Extracting and Ranking Product Features in Opinion Documents
DESCRIPTION
This is a presentation we gave in the Spring 2012 Data Mining class at Tsinghua University. It covers the paper "Extracting and Ranking Product Features in Opinion Documents" by Lei Zhang, Bing Liu, Suk Hwan Lim, and Eamonn O'Brien-Strain.
TRANSCRIPT
Extracting and Ranking Product Features in Opinion Documents
陈欣, 王鹤达, 张文昌
A Story
• Retina Display
• 3-axis gyro & accelerometer
• A4 CPU
• Multitasking
• FaceTime
• iBooks
• Antennagate
Why mining product features?
• Knowing clearly how consumers respond helps a company win more market share.
• Consumers can also make better-informed choices when shopping.
Recent Research
• In recent years, opinion mining has been an active research area in NLP. One of its central problems is extracting product features from a corpus.
• Existing approaches include HMM-, ME-, PMI-, and CRF-based methods.
• Double Propagation is a state-of-the-art unsupervised technique for this problem, though it has significant limitations.
Double Propagation
• Proposed by researchers from the University of Illinois and Zhejiang University.
• Mainly extracts noun features; works well for medium-size corpora.
• Needs no resources other than an initial seed opinion lexicon.
DP Mechanism
Basic assumption: features are nouns/noun phrases, and opinion words are adjectives.
Dependency grammar: describes the dependency relations between words in a sentence, including direct relations (a)(b) and indirect relations (c)(d).
Example: "The camera has a good lens." Here "good" is the opinion word, "lens" is the feature, and "camera" is the class.
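The propagation idea described above can be sketched as a small fixpoint loop. This is an illustrative toy, not the authors' implementation: sentences are assumed to be pre-parsed into (adjective, noun) dependency pairs (e.g., from an "amod" relation), and only the two simplest rules are shown.

```python
# Minimal sketch of the Double Propagation loop (illustrative only).
def double_propagation(adj_noun_pairs, seed_opinions):
    opinions = set(seed_opinions)
    features = set()
    changed = True
    while changed:
        changed = False
        for adj, noun in adj_noun_pairs:
            # Rule: a noun modified by a known opinion word is a feature.
            if adj in opinions and noun not in features:
                features.add(noun)
                changed = True
            # Rule: an adjective modifying a known feature is an opinion word.
            if noun in features and adj not in opinions:
                opinions.add(adj)
                changed = True
    return features, opinions

pairs = [("good", "lens"), ("sharp", "lens"), ("sharp", "screen")]
feats, ops = double_propagation(pairs, {"good"})
# feats == {"lens", "screen"}, ops == {"good", "sharp"}
```

Starting from the single seed "good", the loop discovers "lens" as a feature, then "sharp" as an opinion word, then "screen" as a feature — which is exactly how noise can also snowball, as the next slide shows.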
DP Limitations
Non-opinion adjectives may be extracted as opinion words (e.g., "current", "entire" via the adjective + noun pattern). This introduces more and more noise as the extraction propagates.

Some important features have no opinion words modifying them. Example: "There is a valley on my mattress." — "valley" is a feature, but no opinion word modifies it.
Proposed Methods
Two-step feature mining method:

Feature extraction
• Double Propagation
• Part-whole patterns
• "No" patterns

Feature ranking
• A new angle on the noise problem.
• Uses relevance and frequency to rank features.
Ranking Principles
• Three strong clues indicate a correct feature:
  • It is modified by multiple opinion words.
  • It can be extracted by multiple part-whole patterns.
  • A combination of part-whole patterns, the "no" pattern, and opinion-word modification applies to it.
• Frequent appearance indicates an important feature.
Process
• Feature extraction
  • part-whole relations
  • "no" pattern
• Feature ranking
  • HITS algorithm
  • consider frequency
Part-whole relation
• Ambiguous / unambiguous patterns
• Phrase patterns (NP: noun phrase; CP: class concept phrase; Prep: preposition)
  • NP + Prep + CP
  • CP + with + NP
  • NP CP or CP NP
• Sentence pattern
  • CP Verb NP
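The phrase patterns above can be sketched as a simple matcher over tokenised text. This is a hypothetical simplification (the class-concept set, preposition list, and example tokens are all assumptions; real input would be POS-tagged noun phrases):

```python
# Sketch: extract feature candidates from two part-whole phrase patterns.
CLASS_CONCEPTS = {"car", "mattress", "phone"}   # assumed CP lexicon
PREPS = {"of", "on", "in"}                      # assumed preposition list

def part_whole_candidates(tokens):
    candidates = []
    for i in range(len(tokens) - 2):
        np, prep, cp = tokens[i], tokens[i + 1], tokens[i + 2]
        # "NP + Prep + CP": e.g. "engine of [the] car" -> feature "engine"
        if prep in PREPS and cp in CLASS_CONCEPTS:
            candidates.append(np)
        # "CP + with + NP": e.g. "mattress with [a] cover" -> feature "cover"
        if np in CLASS_CONCEPTS and prep == "with":
            candidates.append(cp)
    return candidates

print(part_whole_candidates(["engine", "of", "car"]))        # ['engine']
print(part_whole_candidates(["mattress", "with", "cover"]))  # ['cover']
```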
“no” Pattern
• no + feature
  • "no noise", "no indentation"
• Exceptions
  • "no problem", "no offense"
  • handled by a manually compiled exception list
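The "no" pattern plus its exception list amounts to a few lines of code. A minimal sketch (the exception set here is just the two slide examples):

```python
# Sketch of the "no" pattern: a noun right after "no" is a feature
# candidate, unless it appears on a hand-built exception list.
EXCEPTIONS = {"problem", "offense"}

def no_pattern_features(tokens):
    feats = []
    for i, tok in enumerate(tokens[:-1]):
        if tok.lower() == "no" and tokens[i + 1] not in EXCEPTIONS:
            feats.append(tokens[i + 1])
    return feats

print(no_pattern_features("there is no noise and no problem".split()))
# ['noise']
```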
Apply HITS Algorithm
• HITS algorithm
  • hub score / authority score
  • iterated to convergence
• Applying HITS here
  • split features and feature indicators into two node sets
  • build a directed graph between them
  • compute feature relevance as the authority score
Feature Ranking
• Utilizes feature relevance and frequency.
• Step 1. Compute the authority score A(f) of each feature using power iteration.
• Step 2. Compute the final score by

S(f) = A(f) · log(freq(f))

where freq(f) is the frequency of feature f and A(f) is the authority score of feature f.
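The two steps can be sketched together. This is an illustrative toy under these assumptions: features act as authorities, feature indicators (opinion words / patterns) act as hubs, edges come from co-occurrence, and the final score multiplies the authority score by log frequency; the edge list and frequencies below are made up.

```python
import math

def rank_features(links, freq, iters=50):
    # links: list of (indicator, feature) edges in a bipartite graph
    indicators = {i for i, _ in links}
    features = {f for _, f in links}
    hub = {i: 1.0 for i in indicators}
    auth = {f: 1.0 for f in features}
    for _ in range(iters):
        # authority score: sum of hub scores pointing at the feature
        auth = {f: sum(hub[i] for i, g in links if g == f) for f in features}
        # hub score: sum of authority scores the indicator points at
        hub = {i: sum(auth[f] for j, f in links if j == i) for i in indicators}
        # normalise so scores do not grow without bound
        na = math.sqrt(sum(v * v for v in auth.values()))
        nh = math.sqrt(sum(v * v for v in hub.values()))
        auth = {f: v / na for f, v in auth.items()}
        hub = {i: v / nh for i, v in hub.items()}
    # final score S(f) = A(f) * log(freq(f))
    return {f: auth[f] * math.log(freq[f]) for f in features}

links = [("good", "lens"), ("sharp", "lens"), ("good", "screen")]
scores = rank_features(links, {"lens": 20, "screen": 5})
# "lens" outranks "screen": more indicators and higher frequency
```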
Data Sets & Evaluation Metrics
| Data set | Cars | Mattress | Phone | LCD |
|---|---|---|---|---|
| # of sentences | 2223 | 13233 | 15168 | 1783 |

"Cars" and "Mattress" come from product review sites; "Phone" and "LCD" from forum sites.
Precision@N metric: the percentage of correct features among the top N candidates in a ranked list.
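The metric is straightforward to compute; a minimal sketch (the ranked list and gold set are made-up examples):

```python
# Precision@N: fraction of the top-N ranked candidates that are
# genuine features, judged against a gold-standard set.
def precision_at_n(ranked, gold, n):
    top = ranked[:n]
    return sum(1 for f in top if f in gold) / len(top)

ranked = ["lens", "screen", "thing", "battery"]
gold = {"lens", "screen", "battery"}
print(precision_at_n(ranked, gold, 2))  # 1.0
print(precision_at_n(ranked, gold, 4))  # 0.75
```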
Recall & Precision Comparison
(Chart: recall and precision of our method vs. DP on the four data sets; individual values not recoverable from the export.)
Results of 1000 sentences
Recall & Precision Comparison
(Chart: recall and precision of our method vs. DP on the four data sets; individual values not recoverable from the export.)
Results of 2000 sentences
Recall & Precision Comparison

(Chart: recall and precision of our method vs. DP on the Mattress and Phone data sets; individual values not recoverable from the export.)

Results of 3000 sentences
Ranking Comparison
(Chart: precision of our ranking vs. DP on Cars, Mattress, Phone, and LCD; individual values not recoverable from the export.)
Precision at top 50
Ranking Comparison

(Chart: precision of our ranking vs. DP on Cars, Mattress, Phone, and LCD; individual values not recoverable from the export.)

Precision at top 100
Ranking Comparison

(Chart: precision of our ranking vs. DP on Cars, Mattress, and Phone; individual values not recoverable from the export.)

Precision at top 200
Conclusion
• Use part-whole and "no" patterns to increase recall.
• Rank extracted feature candidates by importance, determined by two factors:
  • feature relevance (computed with the HITS algorithm)
  • feature frequency
Thank you