Simple Sentiment Analysis Using Solr. Silicon Valley Code Camp, Foothill College – Oct 6, 2013. By: Pradeep Pujari

Upload: pradeep-pujari

Post on 10-May-2015

TRANSCRIPT

Page 1: Sais svcc

Simple Sentiment Analysis Using Solr

Silicon Valley Code Camp, Foothill College – Oct 6, 2013

By: Pradeep Pujari

Page 2: Sais svcc

Sentiment Analysis?

Sentiment Analysis – General Architecture

Little Lucene

Sentiment Analysis and Solr

Applications of Sentiment Analysis

Code Walkthrough

Objectives

Page 3: Sais svcc

Working mostly in Search domain

Search = IR + ML + NLP

Who am I?

Works for

Page 4: Sais svcc

Contributing to SolrSherlock

- Open Source Project

Who am I?

http://solrsherlock.github.io/SolrSherlock/

Page 5: Sais svcc

What is Sentiment Analysis?

A linguistic analysis technique that identifies opinion early in a piece of text.

The movie is great.

The movie stars Mr. X

The movie is horrible.
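The three example sentences can be told apart with a very small word-list classifier. The sketch below is only an illustration under stated assumptions: the class name `PolarityDemo` and both word lists are invented for this example and are not from the talk; a real system would use a proper sentiment lexicon or a trained model.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class PolarityDemo {
    // Illustrative word lists only (assumption); not a real sentiment lexicon.
    static final Set<String> POSITIVE = new HashSet<>(Arrays.asList("great", "good", "excellent"));
    static final Set<String> NEGATIVE = new HashSet<>(Arrays.asList("horrible", "bad", "awful"));

    /** Returns "positive", "negative", or "neutral" by counting lexicon hits. */
    static String classify(String sentence) {
        int score = 0;
        // Lowercase, then split on anything that is not a letter.
        for (String token : sentence.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (POSITIVE.contains(token)) score++;
            if (NEGATIVE.contains(token)) score--;
        }
        if (score > 0) return "positive";
        if (score < 0) return "negative";
        return "neutral";
    }

    public static void main(String[] args) {
        String[] examples = {
            "The movie is great.", "The movie stars Mr. X", "The movie is horrible."
        };
        for (String s : examples) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

With these word lists the first sentence comes out positive, the second neutral (it states a fact and contains no opinion word), and the third negative.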

Page 6: Sais svcc

[Figure: misclassification plotted against difficulty, ranging from "Too easy" to "Too hard", with a region labeled "Challenging"]

What is Sentiment Analysis?

Page 7: Sais svcc

Sentiment Analysis

NLP

Cognitive Science

What is Sentiment Analysis?

Page 8: Sais svcc

Humans can easily understand emotions.

Can a machine be trained to do it?

What is Sentiment Analysis?

Page 9: Sais svcc

SA offers organizations the ability to monitor sentiment in real time and act accordingly

Marketing managers, PR firms, campaign managers, politicians, equity investors, and online shoppers are direct beneficiaries

http://www.tweetfeel.com

http://www.nytimes.com/interactive/us/politics/2010-twitter-candidates.html

Key Insights

Page 10: Sais svcc

Generic Sentiment Analysis System

Page 11: Sais svcc

Document-Level: supervised/unsupervised learning

Sentence-Level: supervised learning

Feature-Based Sentiment Analysis: all NPs in corpus and polarity

Sentiment Lexicon Acquisition: WordNet

Complexity

Page 12: Sais svcc

Open-source, Java-based search engine

Provides document indexing w/ arbitrary fields and fast search

Several relevance and ranking algorithms

Apache Lucene

Page 13: Sais svcc

1. Create an index

2. Add ‘document’ representations of items

3. Construct queries

4. Ask for results (will be scored)

Using Lucene
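To make the four steps concrete without depending on any library, here is a toy in-memory inverted index. It is a sketch only: `ToyIndex` and its methods are invented for this illustration and are not Lucene's API, and the score is a plain sum of term frequencies rather than any of Lucene's ranking algorithms.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class ToyIndex {
    // Step 1: the "index" is a map of term -> (document id -> term frequency).
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();
    private final List<String> docs = new ArrayList<>();

    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    /** Step 2: add a document and record its term frequencies. Returns the new id. */
    int add(String text) {
        int id = docs.size();
        docs.add(text);
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, t -> new HashMap<>()).merge(id, 1, Integer::sum);
        }
        return id;
    }

    /** Steps 3 and 4: run a query and return matching ids, best score first. */
    List<Integer> search(String query) {
        Map<Integer, Integer> scores = new HashMap<>();
        for (String term : tokenize(query)) {
            Map<Integer, Integer> hits = postings.getOrDefault(term, new HashMap<>());
            hits.forEach((id, tf) -> scores.merge(id, tf, Integer::sum));
        }
        List<Integer> ranked = new ArrayList<>(scores.keySet());
        // Higher score first; ties are broken by lower document id.
        ranked.sort((a, b) -> {
            int byScore = Integer.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : Integer.compare(a, b);
        });
        return ranked;
    }

    public static void main(String[] args) {
        ToyIndex idx = new ToyIndex();
        idx.add("great camera, great price");
        idx.add("horrible battery");
        idx.add("camera with a great lens");
        System.out.println(idx.search("great camera"));
    }
}
```

In `main`, the first document matches "great" twice and "camera" once, so it ranks ahead of the third document; the second document matches neither term and is not returned.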

Page 14: Sais svcc

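import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Open the index directory and a writer over it.
IndexWriterConfig config = /* configure */ ;
Directory dir = FSDirectory.open(indexFile);
IndexWriter w = new IndexWriter(dir, config);
for (ItemInfo item : getItems()) {
    // One Lucene Document per item, one field per attribute.
    Document doc = new Document();
    doc.add(new TextField("title", item.title, Field.Store.YES));
    doc.add(new TextField("tags", item.tags, Field.Store.YES));
    w.addDocument(doc);  // IndexWriter exposes addDocument(), not add()
}
w.close();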

Building the Index

Page 15: Sais svcc

IndexSearcher idx = getIndexSearcher();
IndexReader reader = idx.getIndexReader();
TopDocs results = idx.search(q, n + 1);

Finding Items

Page 16: Sais svcc

PyLucene provides Python bindings for Java Lucene

Lucy is a C port with bindings for other languages

Lucene.NET is a port to .NET

Solr provides a search server (with REST API) on top of Lucene

Related Projects

Page 17: Sais svcc

Solr ?

[Architecture diagram. Components shown: HTTP Request Servlet, Admin Interface, Update Servlet, Standard Request Handler, Custom Request Handler, Response Writer, Solr Core, Config, Caching, Analysis, UIMA, Update Handler, Lucene]

Page 18: Sais svcc

Linguistics module: stems, lemmas, and synonyms

Multi-language capability: CJKAnalyzer, UIMA analyzers

UIMA integration: UpdateProcessorChain

Why Solr ?

Page 19: Sais svcc

Why Solr ?

Extract domain-specific entities and concepts

Time and Cost

Solr Set Up – 5 mins

UIMA Annotators - 5 days

Enrich text, write to dedicated field

Page 20: Sais svcc

Tagging entities in review text

Applications:

I wasn't really in the market for another tablet, but my girlfriend ended up getting one for me so she got me on this one. I would like to say that this tablet reminds me of the first Motorola Droid smartphone that came out several years back. The phone jam packed a ton of bells & whistles into its hardware and software to give a lot of bang for your buck. This is what it feels like amazon has done with the Kindle Fire 8.9. They have put a lot of advanced hardware and innovative software, so for the average user, specially someone who absorbs a lot of media, you get a lot for the price. But just because you get a lot for the price, doesn't mean it is without its flaws.

Page 21: Sais svcc

Applications:

Consumer feedback about products

Which product features are more relevant

Polarity

Page 22: Sais svcc

Digital SLR with Full 1080p HD Video

There are many preprogrammed scene modes that make this a very easy camera to use. The picture quality is beyond belief, and even better for the price.

Price:

Usecase

Page 23: Sais svcc

Why UIMA ?

UIMA Framework manages components and data flow – No coding

Deploy pipeline of analysis engines

AEs wrap NLP algorithms

[Diagram: aggregate analysis engine. Components shown: Language Detection, SentenceAnnotator, POSAnnotator, NER. Entity types shown: Person, Place, Organization]

Page 24: Sais svcc

Solr+UIMA

[Diagram labels: Data, Solr, UpdateRequestProcessor, UIMA AE, QParser, Lucene, Index]

Page 25: Sais svcc

NLP+UIMA

Use POS in query understanding

Boosting terms

Synonym expansion

Extract concepts/entities

Faceting using entities

Identify places in query and use spatial queries

Page 26: Sais svcc

Ideas: Sentiment Analysis App

Identify Subjective Sentences from text

Remove noisy sentences – Regex, conditional probability

Graph min cut – LingPipe

Subjectivity Lexicons

Discard Facts and Objective Sentences

Page 27: Sais svcc

[Diagram: a Subjectivity Detector splits sentences into Subjective and Objective; a Polarity Classifier follows]

Ideas: Sentiment Analysis App
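The two-stage idea (first keep only subjective sentences, then classify their polarity) can be sketched in plain Java as below. Everything here is an assumption made for illustration: the class name, the naive sentence splitter, and the tiny cue and polarity word lists are invented and do not come from LingPipe or any published subjectivity lexicon.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class SubjectivityPipeline {
    // Hypothetical mini-lexicons (assumption), kept separate for the two stages.
    static final Set<String> SUBJECTIVE_CUES =
        new HashSet<>(Arrays.asList("great", "horrible", "easy", "better", "love", "hate"));
    static final Set<String> POSITIVE = new HashSet<>(Arrays.asList("great", "easy", "better", "love"));
    static final Set<String> NEGATIVE = new HashSet<>(Arrays.asList("horrible", "hate"));

    static String[] words(String sentence) {
        return sentence.toLowerCase(Locale.ROOT).split("[^a-z]+");
    }

    /** Stage 1: a sentence is treated as subjective if it contains any cue word. */
    static boolean isSubjective(String sentence) {
        for (String w : words(sentence)) {
            if (SUBJECTIVE_CUES.contains(w)) return true;
        }
        return false;
    }

    /** Splits on sentence-ending punctuation (naive: breaks on "Mr. X") and drops objective sentences. */
    static List<String> subjectiveSentences(String text) {
        List<String> kept = new ArrayList<>();
        for (String sentence : text.split("(?<=[.!?])\\s+")) {
            if (isSubjective(sentence)) kept.add(sentence);
        }
        return kept;
    }

    /** Stage 2: label a sentence by counting positive and negative words. */
    static String polarity(String sentence) {
        int score = 0;
        for (String w : words(sentence)) {
            if (POSITIVE.contains(w)) score++;
            if (NEGATIVE.contains(w)) score--;
        }
        return score > 0 ? "positive" : score < 0 ? "negative" : "neutral";
    }

    public static void main(String[] args) {
        String review = "The camera records 1080p HD video. "
            + "The picture quality is great. The menu is horrible.";
        for (String s : subjectiveSentences(review)) {
            System.out.println(polarity(s) + ": " + s);
        }
    }
}
```

Keeping the two stages as separate methods means either one could later be swapped for a trained classifier without touching the other.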

Page 28: Sais svcc

Sentiment intensity: SentiWordNet

WordNet-Affect: WordNet + annotated concepts

Ideas: Sentiment Analysis App

Hybrid model with an added dictionary

Page 29: Sais svcc

Update Processor Chain

[Diagram labels: Product Reviews, Update Handler with processor chain, Remove Duplicates processor, Logging processor, Custom Transform processor, Index processor, Sentence Detection processor, Sentiment Classifier, Company Name Annotator, Sentiment Score processor, Text Analyzers, Lucene, Lucene Index]

Page 30: Sais svcc

Let’s look at the code

Page 31: Sais svcc

Data transformation or post-processing

UpdateRequestProcessorFactory
◦ LogUpdateProcessorFactory
◦ UIMAUpdateRequestProcessorFactory

UpdateRequestProcessorChain
◦ Pipeline of UpdateRequestProcessors

UpdateRequestProcessor
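The pipeline idea can be shown in plain Java: each stage receives the document, may enrich it, and passes it on, with the last stage writing to the index. This is a sketch under stated assumptions and not Solr's API: the `Processor` interface, both processor classes, and the `sentiment_score` field are hypothetical names invented here; only the `content_text` field name appears elsewhere in these slides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class ProcessorChainDemo {
    /** Hypothetical stand-in for one stage; not Solr's UpdateRequestProcessor class. */
    interface Processor {
        void process(Map<String, Object> doc);
    }

    /** Enrichment stage: scores the text and writes the result to a dedicated field. */
    static class SentimentScoreProcessor implements Processor {
        public void process(Map<String, Object> doc) {
            String text = String.valueOf(doc.get("content_text")).toLowerCase(Locale.ROOT);
            int score = 0;
            if (text.contains("great")) score++;      // illustrative cue words only
            if (text.contains("horrible")) score--;
            doc.put("sentiment_score", score);        // hypothetical field name
        }
    }

    /** Final stage: stands in for the processor that writes to the index. */
    static class IndexProcessor implements Processor {
        final List<Map<String, Object>> index = new ArrayList<>();
        public void process(Map<String, Object> doc) {
            index.add(doc);
        }
    }

    /** Runs every processor, in order, on the same document. */
    static void run(List<Processor> chain, Map<String, Object> doc) {
        for (Processor p : chain) {
            p.process(doc);
        }
    }

    public static void main(String[] args) {
        IndexProcessor indexer = new IndexProcessor();
        List<Processor> chain = Arrays.asList(new SentimentScoreProcessor(), indexer);
        Map<String, Object> doc = new HashMap<>();
        doc.put("content_text", "The picture quality is great.");
        run(chain, doc);
        System.out.println(indexer.index.get(0)); // the stored document now carries sentiment_score
    }
}
```

Because the enrichment stage runs before the indexing stage, the stored document already contains the extra field, which is the point of placing sentiment processors inside the update chain.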

Page 32: Sais svcc

<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.processor">uima</str>
  </lst>
</requestHandler>

How to configure

Page 33: Sais svcc

Stanford NER

Additional Libraries

Page 34: Sais svcc

<updateRequestProcessorChain name="uima">
  <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
    <lst name="uimaConfig">
      <lst name="runtimeParameters">
      </lst>
      <lst name="analysisEngine">
        <str name="defaultanalysisEngine">/org/apache/uima/desc/OverridingParamsExtServicesAE.xml</str>
      </lst>
      <lst name="analyzeFields">
        <bool name="merge">false</bool>
        <arr name="fields">
          <str>content_text</str>
        </arr>
      </lst>
      <lst name="fieldMappings">
        <lst name="type">
          <str name="name">org.apache.uima.DictionaryEntry</str>
          <lst name="mapping">
            <str name="feature">coveredText</str>
            <str name="field">sentiment_keyword,sentiment_type</str>
          </lst>
        </lst>

Page 35: Sais svcc
Page 36: Sais svcc

References

http://lucene.apache.org/solr/

http://uima.apache.org/

http://alias-i.com/lingpipe/demos/tutorial/sentiment/read-me.html

http://openie.cs.washington.edu/

http://wiki.apache.org/solr/SolrUIMA

Page 37: Sais svcc

Questions ?

Page 38: Sais svcc

Thank You

Email: [email protected]