sentiment analysis
DESCRIPTION
Slides for my college project on Sentiment Analysis of cellphone reviews. The system is to be written in Python and uses NLTK.
TRANSCRIPT
Sentiment Analysis from Cellphone Reviews
Sagar Ahire | 155
Preeti Singh | 178
What is Sentiment Analysis?
• Takes a block of text as input
• Determines the sentiment expressed in it
• “Sentiment” refers to whether the author’s opinion is positive or negative
Disciplines Involved
• Natural Language Processing
• Data Mining
• Artificial Intelligence
What Sentiment Analysis is NOT
• Does NOT use images anywhere (that is “emotion detection”)
• Does NOT aim at evaluating the product itself, just the sentiment expressed by the reviewer
Why Sentiment Analysis is challenging
• Sentiment is often not carried by direct keywords
“This phone is as modern as the one owned by Alexander Graham Bell”
• Opinions expressed may belong to other people
“Many people say iPhones are better than Androids”
• Order effects
“This could have revolutionized phones forever, but the bundled OS makes it an ultimate letdown”
• Colloquial and domain-specific phrases
“The phone runs a 1.2 GHz dual-core processor”
Project Overview
• Aims to perform sentiment analysis on cellphone reviews
• Rates the sentiment on a scale of 1 to 5 stars
Inner Workings
• Uses a corpus of several cellphone reviews (currently 33)
• Trains a classifier using features, which may be:
– Unigrams (occurrences of single words)
– Bigrams (occurrences of pairs of adjacent words)
– Adjectives only, etc.
• Uses the classifier to classify unknown reviews
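To make the feature choices above concrete, here is a minimal sketch of unigram and bigram presence features. The function names and the `has(...)` key format are illustrative assumptions, not taken from the project; only the standard library is used.

```python
def unigram_features(tokens):
    """Mark each distinct word in the review as present."""
    return {f"has({word})": True for word in tokens}


def bigram_features(tokens):
    """Mark each distinct pair of adjacent words as present."""
    return {f"has({a} {b})": True for a, b in zip(tokens, tokens[1:])}


# Hypothetical review text, split on whitespace for simplicity.
review = "battery life is not good".split()
print(unigram_features(review))
print(bigram_features(review))
```

Bigrams keep pairs such as "not good" together, which unigrams alone would split. An adjectives-only variant would additionally need a part-of-speech tagger, which is omitted here.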
Steps
Why Python?
• Less code, more productivity
• Flexible paradigms (functional, procedural, object-oriented, all in one)
• Fast development cycle
• Wide range of modules
Diving In…
• Modules used:
– Python Standard Library (random, sys, etc.)
– nltk
• Classifiers used:
– Naïve Bayes
Diving In… The Algorithm (Unigram Occurrences)
1. Take the entire corpus as input
2. Create a list ‘l’ of all documents, each labeled by its category (i.e., number of stars)
3. Extract the ‘n’ most frequent words in the entire corpus, cleaning up duplicates and non-alphabetic words
Diving In… The Algorithm (Unigram Occurrences)
4. For every document in l:
i. Create a feature dictionary d for that document
ii. For each of the n frequent words, put a value in d indicating its presence or absence in the document
5. Divide the resulting list of labeled dictionaries into a training set and a testing set
Diving In… The Algorithm (Unigram Occurrences)
6. Train a Naïve Bayes Classifier using the training set
7. Test the classifier using the testing set and report the accuracy
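Steps 1 to 7 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the function names, the `contains(...)` key format, whitespace tokenization, the 75/25 split, and the fixed seed are all choices made here for the example. It uses `nltk.NaiveBayesClassifier.train` and `nltk.classify.accuracy` from NLTK, plus `random` and `collections.Counter` from the standard library.

```python
import random
from collections import Counter

import nltk  # third-party: pip install nltk


def build_feature_sets(corpus, n):
    """Steps 1-4: corpus is a list of (review_text, stars) pairs."""
    # Step 2: a list of (tokens, label) pairs; the label is the star rating.
    documents = [(text.lower().split(), stars) for text, stars in corpus]
    # Step 3: the n most frequent alphabetic words across the whole corpus.
    counts = Counter(w for tokens, _ in documents for w in tokens if w.isalpha())
    frequent = [word for word, _ in counts.most_common(n)]
    # Step 4: one dictionary per document marking presence or absence.
    feature_sets = []
    for tokens, stars in documents:
        present = set(tokens)
        features = {f"contains({w})": (w in present) for w in frequent}
        feature_sets.append((features, stars))
    return feature_sets


def train_and_test(feature_sets, train_fraction=0.75, seed=0):
    """Steps 5-7: split, train a Naive Bayes classifier, report accuracy."""
    # Step 5: shuffle a copy, then cut it into training and testing sets.
    shuffled = list(feature_sets)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    train_set, test_set = shuffled[:cut], shuffled[cut:]
    # Step 6: train on the labeled feature dictionaries.
    classifier = nltk.NaiveBayesClassifier.train(train_set)
    # Step 7: fraction of test documents labeled correctly.
    return classifier, nltk.classify.accuracy(classifier, test_set)


# Hypothetical toy corpus; the real project uses actual cellphone reviews.
toy_corpus = [("great phone", 5), ("bad phone", 1)] * 4
classifier, accuracy = train_and_test(build_feature_sets(toy_corpus, n=3))
print("accuracy:", accuracy)
```

With a corpus as small as the one described in these slides, the reported accuracy will vary noticeably with the particular split, so the seed is fixed here only to make the sketch repeatable.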
Next Steps
• Investigating the Maximum Entropy Classifier
• Refining feature choice
– Negation Tagging
– Synonyms
• Investigating Regression techniques
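As an illustration of the negation tagging idea listed above, one common approach is to append a marker to every word between a negation word and the next punctuation mark, so that "good" and "not ... good" become different features. The sketch below assumes that approach; the `_NEG` suffix, the `NEGATIONS` word list, and the function name are illustrative choices, not part of the project.

```python
import re

NEGATIONS = {"not", "no", "never", "cannot"}  # illustrative, not exhaustive
PUNCTUATION = set(".,;:!?")


def tag_negation(text):
    """Append _NEG to each word between a negation word and the next punctuation mark."""
    tagged, negating = [], False
    for token in re.findall(r"[a-z']+|[.,;:!?]", text.lower()):
        if token in PUNCTUATION:
            negating = False  # punctuation ends the negated span
            tagged.append(token)
        elif token in NEGATIONS or token.endswith("n't"):
            negating = True  # the negation word itself is left unmarked
            tagged.append(token)
        else:
            tagged.append(token + "_NEG" if negating else token)
    return tagged


print(tag_negation("The camera is not good, but the screen is great."))
```

The tagged tokens can then be fed to the same unigram feature extraction, letting the classifier learn separate weights for negated and non-negated words.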
Additional Applications of Sentiment Analysis
• Filtering of spam or abusive e-mails
• Gauging the mood of people in a particular network
• Government intelligence
• Psychological evaluation
• Recommendation systems
• Display of ads on webpages
“Sentiment is the poetry of the imagination.”
- Alphonse de Lamartine