sentiment analysis

Sentiment Analysis from Cellphone Reviews

Sagar Ahire | 155
Preeti Singh | 178

What is Sentiment Analysis?

• Takes a block of text as input
• Determines the sentiment expressed in it
• “Sentiment” refers to whether the author’s opinion is positive or negative

Disciplines Involved

• Natural Language Processing
• Data Mining
• Artificial Intelligence

What Sentiment Analysis is NOT

• Does NOT use images anywhere (that would be “emotion detection”)

• Does NOT evaluate the product itself, only the sentiment expressed by the reviewer

Why Sentiment Analysis is challenging

• Sentiment keywords are often indirect (here, sarcasm): “This phone is as modern as the one owned by Alexander Graham Bell”

• Opinions expressed may belong to other people: “Many people say iPhones are better than Androids”

• Order effects (a later clause can override an earlier one): “This could have revolutionized phones forever, but the bundled OS makes it an ultimate letdown”

• Colloquial and domain-specific phrases (judging them needs domain knowledge): “The phone runs a 1.2 GHz dual core processor”
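To see why these cases are hard, here is a deliberately naive keyword counter. It is a minimal sketch with a hypothetical, hand-picked lexicon (not part of the project): it scores the sarcastic and second-hand examples as positive, and the order-effect example as neutral even though the reviewer is clearly disappointed.

```python
# Hypothetical mini-lexicon for illustration only; not taken from the project.
POSITIVE = {"modern", "better", "revolutionized", "great", "good"}
NEGATIVE = {"letdown", "bad", "poor", "slow"}

def naive_score(text: str) -> int:
    """Count positive words minus negative words, ignoring all context."""
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

examples = [
    "This phone is as modern as the one owned by Alexander Graham Bell",
    "Many people say iPhones are better than Androids",
    "This could have revolutionized phones forever, but the bundled OS makes it an ultimate letdown",
    "The phone runs a 1.2 GHz dual core processor",
]

for text in examples:
    print(naive_score(text), text)
```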

Project Overview

• Aims to perform sentiment analysis on cellphone reviews

• Rates the sentiment on a scale of 1 to 5 stars

Inner Workings

• Uses a corpus of several cellphone reviews (currently 33)

• Trains a classifier using features, which may be:
  – Unigrams (occurrences of single words)
  – Bigrams (occurrences of word pairs)
  – Adjectives only, etc.

• Uses the classifier to classify unknown reviews
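A minimal sketch of the first two feature types, assuming a simple lowercase, letters-only tokenizer (the function names are hypothetical). Each extractor returns a dictionary of features, the shape nltk's classifiers take as input. An adjectives-only extractor would additionally need a part-of-speech tagger such as `nltk.pos_tag`, which is omitted here.

```python
import re

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())

def unigram_features(text):
    """One feature per distinct word in the review."""
    return {f"has({w})": True for w in tokenize(text)}

def bigram_features(text):
    """One feature per distinct pair of adjacent words."""
    words = tokenize(text)
    return {f"has({a} {b})": True for a, b in zip(words, words[1:])}

review = "Battery life is not good"
print(unigram_features(review))
print(bigram_features(review))
```

Bigrams keep some local context that unigrams lose: “not good” survives as a single feature instead of contributing a stray “good”.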

Why Python?

• Less code, more productivity
• Flexible paradigms (functional, procedural, and object-oriented, all in one)
• Fast development cycle
• Wide range of modules

Diving In…

• Modules used:
  – Python Standard Library (random, sys, etc.)
  – nltk

• Classifiers used:
  – Naïve Bayes

Diving In… The Algorithm (Unigram Occurrences)

1. Take the entire corpus as input
2. Create a list ‘l’ of all documents, each labeled by its category (i.e., its number of stars)
3. Extract the ‘n’ most frequent words in the entire corpus, removing duplicates and non-alphabetic words
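Steps 1 to 3 can be sketched with the standard library alone. The three-review corpus and the choice n = 5 are hypothetical placeholders; the real corpus holds 33 labeled reviews.

```python
import re
from collections import Counter

# Hypothetical toy corpus of (review text, star rating) pairs.
corpus = [
    ("Great phone, great camera and great battery", 5),
    ("Good screen but the battery is poor", 3),
    ("Terrible phone, poor battery, poor support", 1),
]

def words_of(text):
    """Lowercase the text and keep alphabetic tokens only (drops numbers and punctuation)."""
    return re.findall(r"[a-z]+", text.lower())

# Steps 1-2: read the corpus and label each tokenized document with its star rating.
documents = [(words_of(text), stars) for text, stars in corpus]

# Step 3: the n most frequent words; most_common() yields each distinct word once.
n = 5
frequency = Counter(word for words, _ in documents for word in words)
top_words = [word for word, _ in frequency.most_common(n)]
print(top_words)
```

On a real corpus, the most frequent words will include function words such as “the” and “and”; filtering them out is a common refinement.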

Diving In… The Algorithm (Unigram Occurrences)

4. For every document in l:
  i. Create a feature dictionary d for that document
  ii. For each of the n frequent words, put a value in d indicating its presence or absence in the document
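Step 4 as a minimal sketch: one True/False entry per frequent word, paired with the document's star rating. The function name, the `contains(...)` key format, and the sample data are hypothetical; the (features, label) pair format is what nltk's classifiers accept for training.

```python
def document_features(document_words, top_words):
    """Map each frequent word to True/False: present or absent in this document."""
    present = set(document_words)  # set membership is a constant-time lookup
    return {f"contains({w})": (w in present) for w in top_words}

# Hypothetical inputs standing in for the output of steps 2 and 3.
top_words = ["great", "battery", "poor", "phone", "camera"]
documents = [(["great", "phone", "great", "camera"], 5), (["poor", "battery"], 1)]

featuresets = [(document_features(words, top_words), stars) for words, stars in documents]
print(featuresets[1])
```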

5. Divide the labeled feature dictionaries into a training set and a testing set
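A minimal sketch of step 5 using the standard `random` module. The 25% test fraction, the fixed seed, and the synthetic 33-item input are assumptions for illustration. Shuffling first matters because a corpus is often stored grouped by rating, and an unshuffled split could leave whole star categories out of training.

```python
import random

def split_featuresets(featuresets, test_fraction=0.25, seed=0):
    """Shuffle labeled feature sets, then split them into (train, test)."""
    shuffled = list(featuresets)           # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

# Synthetic stand-in for 33 labeled reviews.
featuresets = [({"contains(good)": i % 2 == 0}, 5 if i % 2 == 0 else 1) for i in range(33)]
train_set, test_set = split_featuresets(featuresets)
print(len(train_set), len(test_set))
```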

Diving In… The Algorithm (Unigram Occurrences)

6. Train a Naïve Bayes Classifier using the training set

7. Test the classifier using the testing set and report the accuracy
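In nltk, steps 6 and 7 correspond to `nltk.NaiveBayesClassifier.train(train_set)` followed by `nltk.classify.accuracy(classifier, test_set)`, given a list of (feature dictionary, star rating) pairs. To show what training actually computes, here is a standard-library sketch of Naive Bayes over True/False features with add-one smoothing. The function names and toy data are hypothetical, and the smoothing details differ from NLTK's implementation.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(train_set):
    """Count labels and, per label, how often each feature takes each value."""
    label_counts = Counter(label for _, label in train_set)
    value_counts = defaultdict(Counter)  # (label, feature name) -> Counter of values
    for features, label in train_set:
        for name, value in features.items():
            value_counts[(label, name)][value] += 1
    return label_counts, value_counts

def classify(model, features):
    """Return the label maximizing log P(label) + sum of log P(feature value | label)."""
    label_counts, value_counts = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)
        for name, value in features.items():
            seen = value_counts.get((label, name), Counter())
            # Add-one smoothing over the two possible values (True/False).
            score += math.log((seen[value] + 1) / (count + 2))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, test_set):
    """Fraction of test documents whose predicted rating matches the true rating."""
    correct = sum(classify(model, features) == label for features, label in test_set)
    return correct / len(test_set)

# Hypothetical toy data: two 5-star and two 1-star training reviews.
train_set = [
    ({"contains(great)": True, "contains(poor)": False}, 5),
    ({"contains(great)": True, "contains(poor)": False}, 5),
    ({"contains(great)": False, "contains(poor)": True}, 1),
    ({"contains(great)": False, "contains(poor)": True}, 1),
]
test_set = [
    ({"contains(great)": True, "contains(poor)": False}, 5),
    ({"contains(great)": False, "contains(poor)": True}, 1),
]
model = train_naive_bayes(train_set)
print(accuracy(model, test_set))
```

Summing log-probabilities instead of multiplying raw probabilities avoids numeric underflow when there are many features.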

Next Steps

• Investigating the Maximum Entropy Classifier
• Refining feature choice
  – Negation Tagging
  – Synonyms
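Negation tagging has several variants. One common scheme appends a `_NOT` suffix to every word that follows a negation word, up to the next punctuation mark, so that “good” and “good_NOT” become distinct unigram features. This is a minimal sketch of that scheme, not the project's eventual design; the negation list is a small hypothetical one.

```python
import re

NEGATIONS = {"not", "no", "never", "cannot"}  # hypothetical, deliberately small list
CLAUSE_END = {".", ",", "!", "?", ";", ":"}

def negation_tag(text):
    """Append _NOT to each word after a negation word, until the next punctuation mark."""
    tokens = re.findall(r"[a-z']+|[.,!?;:]", text.lower())
    tagged, negating = [], False
    for tok in tokens:
        if tok in CLAUSE_END:
            negating = False  # punctuation ends the negated stretch
            tagged.append(tok)
        elif tok in NEGATIONS or tok.endswith("n't"):
            negating = True   # start tagging the words that follow
            tagged.append(tok)
        else:
            tagged.append(tok + "_NOT" if negating else tok)
    return tagged

print(negation_tag("The camera is not good, but the battery is great."))
```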

• Investigating Regression techniques

Additional Applications of Sentiment Analysis

• Filtering of spam or abusive e-mails
• Gauging the mood of people in a particular network
• Government intelligence
• Psychological evaluation
• Recommendation systems
• Display of ads on webpages

“Sentiment is the poetry of the imagination.” - Alphonse de Lamartine
