Simple Sentiment Analysis Using Solr
Silicon Valley Code Camp, Foothill College – Oct 6, 2013
By: Pradeep Pujari
Sentiment Analysis?
Sentiment Analysis – General Architecture
Little Lucene
Sentiment Analysis and Solr
Applications of Sentiment Analysis
Code Walkthrough
Objectives
Working mostly in Search domain
Search = IR + ML + NLP
Who am I?
Works for
Contributing to SolrSherlock
- Open Source Project
http://solrsherlock.github.io/SolrSherlock/
What is Sentiment Analysis?
A linguistic analysis technique that identifies opinion early in a piece of text.
The movie is great.
The movie stars Mr. X
The movie is horrible.
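The three example sentences above (positive, neutral, negative) can be told apart with a very small lexicon lookup. The following is a minimal sketch, not the method used in the talk: the class name `LexiconPolarity` and both word lists are assumptions made up for illustration.

```java
import java.util.Locale;
import java.util.Set;

/** Toy lexicon-based polarity classifier; the word lists are illustrative assumptions. */
final class LexiconPolarity {
    enum Polarity { POSITIVE, NEGATIVE, NEUTRAL }

    // Hypothetical mini-lexicons, not taken from any published resource.
    private static final Set<String> POSITIVE_WORDS = Set.of("great", "good", "excellent");
    private static final Set<String> NEGATIVE_WORDS = Set.of("horrible", "bad", "awful");

    /** Counts positive and negative lexicon hits and returns the sign of the difference. */
    static Polarity classify(String sentence) {
        int score = 0;
        // Lowercase, then split on anything that is not a letter.
        for (String token : sentence.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (POSITIVE_WORDS.contains(token)) score++;
            if (NEGATIVE_WORDS.contains(token)) score--;
        }
        if (score > 0) return Polarity.POSITIVE;
        if (score < 0) return Polarity.NEGATIVE;
        return Polarity.NEUTRAL;
    }

    public static void main(String[] args) {
        String[] examples = {
            "The movie is great.", "The movie stars Mr. X", "The movie is horrible."
        };
        for (String s : examples) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

A lookup like this ignores negation and sarcasm, which is part of why the task is considered challenging.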
Challenging
[Figure: misclassification versus difficulty, on a scale from too easy to too hard]
[Figure: Sentiment Analysis in relation to NLP and Cognitive Science]
Humans can easily understand emotions.
Can a machine be trained to do it?
Sentiment analysis (SA) offers organizations the ability to monitor sentiment in real time and act accordingly
Marketing managers, PR firms, campaign managers, politicians, equity investors and online shoppers are direct beneficiaries
http://www.tweetfeel.com
http://www.nytimes.com/interactive/us/politics/2010-twitter-candidates.html
Key Insights
Generic Sentiment Analysis System
Document-Level: supervised/unsupervised learning
Sentence-Level: supervised learning
Feature-Based Sentiment Analysis: all noun phrases (NP) in corpus and polarity
Sentiment Lexicon Acquisition: WordNet
Complexity
Open-source Java based search engine
Provides document indexing w/ arbitrary fields and fast search
Several relevance and ranking algorithms
Apache Lucene
1. Create an index
2. Add ‘document’ representations of items
3. Construct queries
4. Ask for results (will be scored)
Using Lucene
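To make the four steps concrete without depending on the Lucene jars, here is a toy in-memory index in plain Java. It is only a sketch of the idea: the class `ToyIndex`, its methods and its term-frequency scoring are assumptions for illustration and are not how Lucene is implemented or scored.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy in-memory inverted index illustrating the four steps (hypothetical, not Lucene). */
final class ToyIndex {
    // Step 1: the index itself, term -> (document id -> term frequency).
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();
    private int nextDocId = 0;

    /** Step 2: add a document representation and return its id. */
    int addDocument(String text) {
        int docId = nextDocId++;
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, t -> new HashMap<>()).merge(docId, 1, Integer::sum);
        }
        return docId;
    }

    /** Steps 3 and 4: treat the query as a bag of terms; return up to n doc ids, best score first. */
    List<Integer> search(String query, int n) {
        Map<Integer, Integer> scores = new HashMap<>();
        for (String term : tokenize(query)) {
            Map<Integer, Integer> hitsForTerm = postings.getOrDefault(term, Map.of());
            hitsForTerm.forEach((doc, tf) -> scores.merge(doc, tf, Integer::sum));
        }
        List<Integer> hits = new ArrayList<>(scores.keySet());
        // Descending score; ties broken by ascending document id.
        hits.sort(Comparator.<Integer>comparingInt(d -> -scores.get(d)).thenComparingInt(d -> d));
        return hits.subList(0, Math.min(n, hits.size()));
    }

    private static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }
}
```

Lucene adds analyzers, persistent storage and far better relevance models on top of this basic structure.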
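// Sketch assuming the Lucene 4.x API
IndexWriterConfig config = /* configure */ ;
Directory dir = FSDirectory.open(indexFile);
IndexWriter w = new IndexWriter(dir, config);
for (ItemInfo item : getItems()) {
    Document doc = new Document();
    // TextField is analyzed (tokenized); Store.YES also keeps the raw value
    doc.add(new TextField("title", item.title, Field.Store.YES));
    doc.add(new TextField("tags", item.tags, Field.Store.YES));
    w.addDocument(doc);  // the IndexWriter method is addDocument(), not add()
}
w.close();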
Building the Index
IndexSearcher idx = getIndexSearcher();
IndexReader reader = idx.getIndexReader();
TopDocs results = idx.search(q, n + 1);
Finding Items
PyLucene provides Python bindings for Java Lucene
Lucy is written in C w/ bindings for other langs
Lucene.NET is a C# port for .NET
Solr provides a search server (with REST API) on top of Lucene
Related Projects
Solr ?
[Architecture diagram: HTTP Request Servlet, Admin Interface, Update Servlet, Standard Request Handler, Custom Request Handler, Response Writer, Solr Core (config, caching, analysis, UIMA, Update Handler), Lucene]
Linguistics module: stems, lemmas and synonyms
Multi-language capability: CJKAnalyzer, UIMA analyzers
UIMA integration: UpdateProcessorChain
Why Solr ?
Why Solr ?
Extract domain-specific entities and concepts
Time and Cost
Solr Set Up – 5 mins
UIMA Annotators - 5 days
Enrich text, write to dedicated field
Tagging entities in review text
Applications:
I wasn't really in the market for another tablet, but my girlfriend ended up getting one for me so she got me on this one. I would like to say that this tablet reminds me of the first Motorola Droid smartphone that came out several years back. The phone jam packed a ton of bells & whistles into its hardware and software to give a lot of bang for your buck. This is what it feels like amazon has done with the Kindle Fire 8.9. They have put a lot of advanced hardware and innovative software, so for the average user, specially someone who absorbs a lot of media, you get a lot for the price. But just because you get a lot for the price, doesn't mean it is without its flaws.
Applications: Consumer feedback about products
Which product features are more relevant
Polarity
Digital SLR with Full 1080p HD Video
There are many preprogrammed scene modes that make this a very easy camera to use. The picture quality is beyond belief, and even better for the price.
Price:
Use case
Why UIMA ?
UIMA Framework manages components and data flow – no coding
Deploy pipeline of analysis engines
AEs wrap NLP algorithms
[Pipeline diagram: aggregate analysis engine containing Language Detection, SentenceAnnotator, POSAnnotator and NER (Person, Place, Organization), with output going to the Lucene index]
Solr+UIMA
[Diagram labels: Data, Solr UpdateRequestProcessor, UIMA AE, Solr QParser]
NLP+UIMA
Use POS in query understanding
Boosting terms
Synonym expansion
Extract concepts/entities
Faceting using entities
Identify places in query and use spatial queries
Ideas: Sentiment Analysis App
Identify Subjective Sentences from text
Remove noisy sentences – Regex, conditional probability
Graph min cut – LingPipe
Subjectivity Lexicons
Discard Facts and Objective Sentences
[Diagram: a subjectivity detector splits sentences into Subjective and Objective; subjective sentences go to the Polarity Classifier]
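The subjectivity step can be approximated with a lexicon lookup placed in front of the polarity classifier. This is a minimal sketch under stated assumptions: the class `SubjectivityFilter`, its word list and the regex sentence splitter are hypothetical, and the talk itself points to richer options such as LingPipe's graph min cut.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Keeps only sentences that contain a word from a (hypothetical) subjectivity lexicon. */
final class SubjectivityFilter {
    // Made-up lexicon for illustration; a real one would come from a published resource.
    private static final Set<String> SUBJECTIVE_WORDS =
        Set.of("great", "horrible", "love", "hate", "easy", "innovative");

    static boolean isSubjective(String sentence) {
        for (String token : sentence.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (SUBJECTIVE_WORDS.contains(token)) return true;
        }
        return false;
    }

    /** Splits after '.', '!' or '?' followed by whitespace, then discards objective sentences. */
    static List<String> subjectiveSentences(String text) {
        List<String> kept = new ArrayList<>();
        for (String sentence : text.split("(?<=[.!?])\\s+")) {
            String trimmed = sentence.trim();
            if (!trimmed.isEmpty() && isSubjective(trimmed)) kept.add(trimmed);
        }
        return kept;
    }
}
```

The regex splitter breaks on abbreviations such as "Mr.", so a proper sentence detector is preferable for real review text.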
Ideas: Sentiment Analysis App
Sentiment intensity: SentiWordNet
WordNet-Affect: WordNet + annotated concepts
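Intensity goes beyond a positive/negative label by attaching a numeric strength to each word. The sketch below assumes a lexicon that gives every word a positive and a negative score, in the spirit of SentiWordNet; the class `IntensityScorer` and all score values are made up for illustration and are not read from SentiWordNet.

```java
import java.util.Locale;
import java.util.Map;

/** Averages per-word (positive - negative) scores from a hypothetical intensity lexicon. */
final class IntensityScorer {
    // word -> {positive score, negative score}; values are invented for this example.
    private static final Map<String, double[]> SCORES = Map.of(
        "great", new double[] {0.75, 0.0},
        "good", new double[] {0.5, 0.0},
        "horrible", new double[] {0.0, 0.75},
        "flaws", new double[] {0.0, 0.5});

    /** Mean of (positive - negative) over words found in the lexicon; 0.0 if none match. */
    static double intensity(String text) {
        double sum = 0.0;
        int matched = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            double[] s = SCORES.get(token);
            if (s != null) {
                sum += s[0] - s[1];
                matched++;
            }
        }
        return matched == 0 ? 0.0 : sum / matched;
    }
}
```

A score near +1 or -1 indicates strong sentiment, while a score near 0 indicates weak or mixed sentiment.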
Ideas: Sentiment Analysis App
Hybrid model that adds a dictionary
Update Processor Chain
[Diagram: Update Handler with processor chain (Remove Duplicates processor, Logging processor, Custom Transform processor, Index processor), followed by Text Analyzers, Lucene and the Lucene Index]
[Diagram: Product Reviews pass through Sentence Detection processor, Sentiment Classifier, Company Name Annotator and Sentiment Score processor]
Let’s look at the code
UpdateRequestProcessor
Data transformation or post processing
UpdateRequestProcessorFactory implementations: LogUpdateProcessorFactory, UIMAUpdateRequestProcessorFactory
UpdateRequestProcessorChain
◦ Pipeline of UpdateRequestProcessors
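The chain idea can be imitated in plain Java: each processor enriches the document and hands it on to the next. This is a sketch only. The interface `Processor`, the classes below and the field names `sentence_count` and `sentiment_score` are hypothetical and are not Solr's API; only `content_text` comes from the configuration shown in this deck.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Plain-Java imitation of an update processor chain (hypothetical names, not Solr classes). */
final class ChainSketch {
    /** One stage of the pipeline: may read fields and write new ones. */
    interface Processor {
        void process(Map<String, Object> doc);
    }

    /** Counts sentences in "content_text" and writes the count to a dedicated field. */
    static final class SentenceCountProcessor implements Processor {
        @Override
        public void process(Map<String, Object> doc) {
            String text = String.valueOf(doc.getOrDefault("content_text", "")).trim();
            int count = text.isEmpty() ? 0 : text.split("(?<=[.!?])\\s+").length;
            doc.put("sentence_count", count);
        }
    }

    /** Writes (positive hits - negative hits) using made-up word lists. */
    static final class SentimentScoreProcessor implements Processor {
        private static final Set<String> POSITIVE = Set.of("great", "good");
        private static final Set<String> NEGATIVE = Set.of("horrible", "bad");

        @Override
        public void process(Map<String, Object> doc) {
            String text = String.valueOf(doc.getOrDefault("content_text", ""));
            int score = 0;
            for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
                if (POSITIVE.contains(token)) score++;
                if (NEGATIVE.contains(token)) score--;
            }
            doc.put("sentiment_score", score);
        }
    }

    /** Runs every processor in order on the same document and returns it. */
    static Map<String, Object> run(List<Processor> chain, Map<String, Object> doc) {
        for (Processor p : chain) {
            p.process(doc);
        }
        return doc;
    }
}
```

In Solr the same ordering is declared in solrconfig.xml rather than in code, and the final processor in the chain writes the enriched document to the index.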
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler" >
  <lst name="defaults">
    <!-- Solr 3.2 and later name this parameter update.chain; older releases used update.processor -->
    <str name="update.chain">uima</str>
  </lst>
</requestHandler>
How to configure
Stanford NER
Additional Libraries
<updateRequestProcessorChain name="uima">
  <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
    <lst name="uimaConfig">
      <lst name="runtimeParameters">
      </lst>
      <lst name="analysisEngine">
        <str name="defaultanalysisEngine">/org/apache/uima/desc/OverridingParamsExtServicesAE.xml</str>
      </lst>
      <lst name="analyzeFields">
        <bool name="merge">false</bool>
        <arr name="fields">
          <str>content_text</str>
        </arr>
      </lst>
      <lst name="fieldMappings">
        <lst name="type">
          <str name="name">org.apache.uima.DictionaryEntry</str>
          <lst name="mapping">
            <str name="feature">coveredText</str>
            <str name="field">sentiment_keyword,sentiment_type</str>
          </lst>
        </lst>
      </lst>
    </lst>
  </processor>
</updateRequestProcessorChain>
References
http://lucene.apache.org/solr/
http://uima.apache.org/
http://alias-i.com/lingpipe/demos/tutorial/sentiment/read-me.html
http://openie.cs.washington.edu/
http://wiki.apache.org/solr/SolrUIMA
Questions ?
Thank You
Email: [email protected]