Page 1: Integrated real-time social media sentiment analysis ...eecs.csuohio.edu/~sschung/CIS660/Big_data_talk_Revised_v6.pdfBig Data Analytics Methodology: Sentiment Analysis BigData analytics

Integrated real-time social media sentiment analysis service using a big data analytic ecosystem

By Danielle C. Aring

Under the Direction of

Dr. Sun Sunnie Chung

Department of Electrical Engineering and Computer Science

Cleveland State University


Big Data Analytics Methodology: Sentiment Analysis

▶ Big Data analytics is defined as the process of examining massive data sets to discover:

  ▶ Hidden information

  ▶ Hidden correlations

▶ One such methodology is sentiment analysis (opinion mining) of user-generated text and messages, which is very difficult because the messages:

  ▶ Contain human expressions in natural language

  ▶ Are unstructured


Sentiment Analysis (Opinion Mining): Data Sources

▶ Opinion mining is applied per phrase, sentence, or paragraph in:

  ▶ Movie reviews: IMDB movie review site

  ▶ Product reviews: Amazon product review sites

  ▶ Social media dialog: public posts from social network sites such as Twitter and Facebook


Overview of Sentiment Analysis: Approaches

▶ Uses natural language processing and text analysis techniques to extract and quantify subjective information in a text span

▶ Two main types of opinions:

  ▶ Regular: sentiment expressed about a specific target entity. Ex: "The touch screen is really cool."

  ▶ Comparative: sentiment expressed by comparing more than one entity. Ex: "iPhone is better than Blackberry."


Sentiment Analysis Approaches:Defining the Structure of an Opinion

● An opinion is a quadruple

● Opinion mining is difficult!

● How to estimate sentiment: an appropriate model must be chosen
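
As a minimal sketch, an opinion quadruple can be modeled as a small record. The four fields below (target, sentiment, holder, time) follow the formulation commonly attributed to Liu and are an assumption here, since the components are not listed in the text; the holder id and date are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Opinion:
    """One opinion as a quadruple (assumed fields, not taken from the slide)."""
    target: str      # entity or aspect the opinion is about
    sentiment: str   # "positive", "negative", or "neutral"
    holder: str      # who expressed the opinion
    time: date       # when the opinion was expressed


# The regular-opinion example "The touch screen is really cool."
example = Opinion(
    target="touch screen",
    sentiment="positive",
    holder="reviewer_1",        # hypothetical holder id
    time=date(2016, 11, 5),     # hypothetical date
)
```

Making the record immutable (`frozen=True`) lets opinions be used as set members or dictionary keys when counting them.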


Natural Language Processing (NLP) Techniques

Adopt Natural language Processing (NLP) Techniques

▶ Stemming

▶ Lemmatisation

▶ POS Tagging

▶ N-gram analysis

▶ Stop words removal

▶ Chunking
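
Two of the techniques listed above, stop-word removal and n-gram analysis, can be sketched in a few lines. This is an illustration only: the stop-word list is a tiny hypothetical sample, and the function names are not from the presentation.

```python
import re

# Tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"the", "is", "a", "an", "of", "and", "to", "in"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that carry little sentiment information."""
    return [t for t in tokens if t not in STOP_WORDS]


def ngrams(tokens, n):
    """Return all contiguous runs of n tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = tokenize("The touch screen is really cool.")
content = remove_stop_words(tokens)   # ['touch', 'screen', 'really', 'cool']
bigrams = ngrams(content, 2)
```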


Problems in Natural Language Processing

○ Ignores the syntax and semantics of words

■ n-grams, Phrases

■ Synonymy, Polysemy, Grammar, Context

○ Loses Word Order in a Sentence
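
The loss of word order is easy to see with a bag-of-words count vector. The slide does not name the representation it has in mind, so bag-of-words is an assumption here; the two sentences are hypothetical variations on the comparative example used earlier.

```python
from collections import Counter


def bag_of_words(text):
    """Represent a sentence only by its word counts, ignoring order."""
    return Counter(text.lower().split())


a = bag_of_words("iPhone is better than Blackberry")
b = bag_of_words("Blackberry is better than iPhone")

# The two sentences express opposite comparative opinions,
# yet their bag-of-words representations are identical.
same = (a == b)
```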


New Approach in NLP: Word2vec - Continuous Skip-Gram Model

○ The word2vec skip-gram model is a neural network architecture that learns semantically meaningful vector representations for words.

○ Given a target word w_i, the skip-gram model is trained to predict the surrounding context words w_{i-2}, w_{i-1}, w_{i+1}, w_{i+2} within a window of size n = 5, for example.

○ After training, the weights in the embedding layer, learned through backpropagation, capture word semantics as an indirect result of the prediction task and become the word vectors.

  ■ Words with similar meanings are mapped to nearby positions in a high-dimensional vector space (for example, 300 dimensions).
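
The training examples for the skip-gram task can be sketched as (target, context) pairs. This covers only the pair generation, not the neural network; the function name is illustrative, and a window of n = 5 is taken to mean two context words on each side of the target.

```python
def skipgram_pairs(tokens, window=5):
    """Generate (target, context) pairs for a window of the given size.

    A window of size 5 covers the target plus two words on each side.
    """
    half = window // 2
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - half)
        hi = min(len(tokens), i + half + 1)
        for j in range(lo, hi):
            if j != i:                      # skip the target itself
                pairs.append((target, tokens[j]))
    return pairs


sentence = ["the", "touch", "screen", "is", "really", "cool"]
pairs = skipgram_pairs(sentence)
screen_context = [c for t, c in pairs if t == "screen"]
```

For the target "screen", the context words are "the", "touch", "is", and "really"; words near the sentence boundary get fewer pairs.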


Word2vec: Continuous Skip-Gram Model


Sentiment Analysis Types

▶ Classifying word polarity (e.g. positive, negative, neutral) in a document

▶ Beyond Polarity

▶ Advanced Sentiment (Pang 2004)

▶ Lexical Approach (Taboada et al. 2011)

▶ Minimum cut extraction (Pang and Lee 2004)

▶ Topic-based (O'Connor et al. 2010)

▶ Aspect (feature) based opinion mining using a semi-supervised approach (Mukherjee and Liu 2012)

▶ Wordnet probability model (Tomas et al. 2013)

▶ Paragraph Embedding with Vectors (Dai et al. 2015)


Research Goals

▶ Investigate whether a stream-processing big data social media sentiment service with analytics can offer the following, compared to its batch-mode counterparts:

  ▶ Scalability (enormous volume)

  ▶ Efficient near real-time data processing

  ▶ Data analytics (sentiment analysis) with accuracy


System Architecture: Social Media Data Stream Sentiment Analysis Service (SMDSSAS)


SMDSSAS System Architecture


Layer 1: Data Extraction

▶ Created a Spark Configuration

▶ Created a Spark context

▶ Created a Spark Streaming Context

▶ Used a Spark DStream to filter for the messages of interest
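
The filtering step can be imitated in plain Python. The system itself uses a Spark streaming context and a filtered DStream; the sketch below does not use Spark at all, and the keywords and messages are hypothetical.

```python
KEYWORDS = ("trump", "clinton")   # hypothetical filter terms


def matches(message, keywords=KEYWORDS):
    """True if the message mentions any keyword, case-insensitively."""
    text = message.lower()
    return any(k in text for k in keywords)


def filter_stream(messages, keywords=KEYWORDS):
    """Lazily yield only the messages of interest from an incoming stream."""
    for message in messages:
        if matches(message, keywords):
            yield message


incoming = [
    "Clinton rally tonight",
    "Nice weather today",
    "Trump on trade policy",
]
kept = list(filter_stream(incoming))
```

A generator is used so that messages are examined one at a time as they arrive, which loosely mirrors filtering a live stream rather than a stored collection.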


Layer 2: Data Stream Layer


Apache Spark

▶ Started in 2009 as a UC Berkeley research project and open sourced in 2010

▶ Engine for distributed big data processing that can run on top of Hadoop

▶ Why Spark? It overcomes the limitations of MapReduce for multi-stage applications

▶ Uses an in-memory abstraction: Resilient Distributed Datasets (RDDs)

  ▶ RDDs are partitioned across the nodes of a cluster and operated on in parallel

  ▶ RDDs can be persisted in memory

  ▶ Persisted RDDs are reused in other operations across multiple map-reduce stages


Spark Streaming

▶ Internally, Spark Streaming:

  ▶ Receives live input data streams

  ▶ Divides the streams into batches

  ▶ Processes the batches with the Spark engine to generate a final stream of results, also in batches

▶ Each batch is a resilient distributed dataset (RDD), processed using RDD operations (map, reduce)


Spark DStreams

▶ Processed results are pushed out in batches as discretized streams (DStreams)

▶ DStream: an abstraction of a continuous stream of data, represented as a sequence of RDDs

▶ Map-reduce is performed on each batch

▶ An operation applied to a DStream is carried out on each of its underlying RDDs
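
The micro-batch idea can be sketched without Spark: cut the stream into batches, then run the same map and reduce steps on every batch. One simplifying assumption is labeled in the code: batches are cut by message count here, whereas Spark Streaming cuts them by a time interval. All names and messages are illustrative.

```python
from collections import Counter
from functools import reduce
from itertools import islice


def micro_batches(stream, batch_size):
    """Cut a stream into consecutive batches.

    Assumption: batches are cut by count for simplicity; Spark Streaming
    groups incoming records by a time interval instead.
    """
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def word_counts(batch):
    """Apply the same map and reduce steps to one batch."""
    mapped = (Counter(msg.lower().split()) for msg in batch)   # map
    return reduce(lambda a, b: a + b, mapped, Counter())       # reduce


stream = ["good good day", "bad day", "good news", "bad bad news", "fine"]
results = [word_counts(b) for b in micro_batches(stream, batch_size=2)]
```

Each element of `results` plays the role of one output batch: the operation is defined once and carried out on every batch in turn.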


Layer 3: Data pre-processing and Transformation


Layer 3: Data Preprocessing/Transformation

Applying the NLP techniques:

▶ Phase 1: During Spark Streaming (Data Storage Layer)

  ▶ Preprocess messages in the tweet stream to remove characters sensitive to the Hive scanner

  ▶ Ex: "\t", "\", "\n", and "[\\p{C}]" (control characters)

▶ Phase 2: During Real-Time Streaming for Sentiment Analysis

  ▶ A function preprocesses messages in the tweet stream to remove:

    ▶ Twitter ‘@’, "#", image and website URLs

    ▶ The following punctuation: [. , ! “ ‘]

    ▶ Numbers 0-9

    ▶ The following non-alphanumeric characters: $ % & ^ * ( ) + ~
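
The two preprocessing phases can be sketched with regular expressions. Several details are assumptions rather than the presentation's actual rules: whole @mentions are dropped, only the "#" character of a hashtag is removed, removed characters are replaced by a space, and control characters are detected by Unicode category because Python's `re` module has no `\p{C}` class. The function names are illustrative.

```python
import re
import unicodedata

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
# '#', the listed punctuation, digits, and the listed symbols.
DROP_CHARS_RE = re.compile(r"[#.,!\"'“”‘’0-9$%&^*()+~]")


def clean_for_storage(message):
    """Phase 1: replace backslashes and control characters (tab, newline, ...)."""
    text = message.replace("\\", " ")
    return "".join(
        " " if unicodedata.category(ch).startswith("C") else ch
        for ch in text
    )


def clean_for_sentiment(message):
    """Phase 2: strip URLs, mentions, punctuation, digits, and symbols."""
    text = URL_RE.sub(" ", message)
    text = MENTION_RE.sub(" ", text)
    text = DROP_CHARS_RE.sub(" ", text)
    return " ".join(text.split())       # collapse runs of whitespace


stored = clean_for_storage("a\tb\\c\nd")
cleaned = clean_for_sentiment("@user1 Great rally!! #vote2016 http://t.co/abc 100%")
```

URLs are removed before individual characters so that the dots and digits inside a link do not leave fragments behind.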


Layer 4: Feature Extraction


Layer 5: Prediction


Prediction Layer:Sentiment Analysis Methodology


Motivation of Experiment 1

▶ Develop a base sentiment model that is successful in event prediction

  ▶ Using data collected on Nov. 5th, 2016, before the 2016 presidential election

▶ Quantify the level of positive/negative sentiment

▶ Apply refinements to correlate user sentiment with topic

▶ Why?

  ▶ So that the system performs accurate event predictions


Input Data Source

▶ Collected 3 datasets over 3 months from the Spark Streaming API, using a DStream filtered for Hillary Clinton, Donald Trump, and their political policies.

▶ Preprocessed and stored in the Hive system:

  ▶ Pre-election: October 23rd, 2016

  ▶ Pre-election: November 5th, 2016

  ▶ Post-election, pre-inauguration: January 1st, 2017


Naive Sentiment Model


Our Approaches: Quantifying Polarity of Sentiment in User-Generated Tweets


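Quantifying Polarity of Sentiment: Polarity Score Function

▶ Donald Trump: [polarity score formula, shown as an image on the slide], where document d_i is the entire Twitter dataset and m is an individual tweet in d_i

▶ Hillary Clinton: [polarity score formula, shown as an image on the slide]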


Results


System Platform

▶ Components of our system:

1. Oracle VM VirtualBox version 5.1.20

2. CDH VM version 5.8

3. Scala IDE version 4.4.1

4. Apache Spark version 2.0.1


Shifting Sentiment Over the 2016 Presidential Election Cycle

[Figure labels: Hillary, Trump]


Sentiment Model Refinements


Deterministic Topic Sentiment Model

▶ Given the presumption that topic and sentiment can be jointly inferred:

  ▶ Counted instances of positive and negative sentiment in the context of user-provided topic word(s)

  ▶ Likelihood estimated as relative frequencies

  ▶ Tweets categorized by subjectivity and polarity (OpinionFinder lexicon)
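
The counting idea can be sketched as follows: only tweets that mention a topic word are considered, sentiment words in them are counted, and the counts are turned into relative frequencies. This is an illustration under stated assumptions, not the presentation's scoring function: the lexicon is a tiny hypothetical stand-in for the OpinionFinder lexicon, and strong versus weak subjectivity is ignored.

```python
# Hypothetical lexicon entries; the real system uses the OpinionFinder lexicon.
POSITIVE = {"great", "win", "strong"}
NEGATIVE = {"bad", "weak", "corrupt"}


def topic_sentiment(tweets, topic_words):
    """Relative frequency of positive and negative words in on-topic tweets."""
    pos = neg = 0
    for tweet in tweets:
        tokens = tweet.lower().split()
        if not any(t in topic_words for t in tokens):
            continue                    # tweet does not mention the topic
        pos += sum(t in POSITIVE for t in tokens)
        neg += sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    if total == 0:
        return {"positive": 0.0, "negative": 0.0}
    return {"positive": pos / total, "negative": neg / total}


tweets = [
    "trade plan is great and strong",
    "trade deal looks bad",
    "great weather today",              # no topic word, so it is ignored
]
scores = topic_sentiment(tweets, topic_words={"trade"})
```

With these three tweets, two positive words and one negative word occur in the context of "trade", giving relative frequencies of 2/3 and 1/3.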


Deterministic Approach

[Flowchart elements: Topic Words List; If Twitter Message Contains Topic; Subjectivity Lexicon; Strong or Weak Subjectivity; Positive, Negative, or Neutral Polarity?; Topic Sentiment Scoring Function]


Probabilistic Model

▶ Per-word log-based scoring function

▶ Beyond frequency based measure

▶ Used a modified log-likelihood

▶ Models probability of positive and negative tweets per user-provided topic word
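
A log-based score of this kind can be sketched as the log of the ratio of positive to negative counts for a topic word. The exact modified log-likelihood used in the presentation is not given in the text, so the form below, including the add-one smoothing, is an assumption; the counts are hypothetical.

```python
import math


def log_ratio_score(pos_count, neg_count, smoothing=1.0):
    """Log of the smoothed ratio of positive to negative counts.

    Assumption: add-one smoothing keeps the score finite when a count is zero.
    Positive scores lean positive, negative scores lean negative.
    """
    return math.log((pos_count + smoothing) / (neg_count + smoothing))


score_a = log_ratio_score(30, 10)   # mostly positive tweets for the topic
score_b = log_ratio_score(10, 30)   # mostly negative tweets for the topic
score_c = log_ratio_score(5, 5)     # balanced
```

Unlike a raw frequency, the log ratio is symmetric around zero: swapping the positive and negative counts flips the sign of the score.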


Probabilistic Model


Results


Deterministic Model, Positive Polarity Measure of Sentiment, Donald Trump vs. Hillary Clinton

Candidate    Deterministic Model    Base Model
Trump        0.60                   0.26
Hillary      0.06                   0.016


Contribution

• Development of a real-time streaming framework for multiphase sentiment analysis: Social Media Data Stream Sentiment Analysis Service (SMDSSAS).

• Development of three sentiment models:

  1. Polarity Score Function

  2. Deterministic Topic Model: counts instances of positive and negative sentiment in the context of user-provided topic word(s).

  3. Probabilistic Model: identifies instances of positive and negative sentiment by the log of the ratio of sentiment counts per topic-correlated tweet.


Conclusions

▶ Successful event prediction on the pre-election data stream

  ▶ Candidate Donald Trump predicted as the winner, given a positive polarity of 0.6 vs. 0.06 for Clinton

▶ Improvements in accuracy compared to the existing literature

  ▶ Existing literature: topic sentiment analysis accuracy 70-79%; real-time sentiment analysis on the previous presidential election 59%

  ▶ Deterministic model accuracy: 81%

  ▶ Probabilistic model accuracy: 74%

▶ Our sentiment model design is the first in the existing literature to combine:

  ▶ Topic sentiment analysis models and

  ▶ Real-time streaming sentiment analysis

▶ System performed scalable sentiment analysis


References

▶ Apache Flume. 2017. Accessed at: https://flume.apache.org/

▶ Dai, A., Olah, C., and Le, Q. "Document Embedding with Paragraph Vectors". (Google) arXiv:1507.07998v1, 2015.

▶ Apache Spark 2.1.0. "Evaluation Metrics - RDD-based API". 2017. Accessed at: https://spark.apache.org/docs/latest/ml-features.html

▶ Apache Spark 2.1.0. "Evaluation Metrics - RDD-based API". 2017. Accessed at: https://spark.apache.org/docs/2.1.0/mllib-linear-methods.html#linear-support-vector-machines-svms

▶ Apache Spark 2.1.0. "Evaluation Metrics - RDD-based API". 2017. Accessed at: https://spark.apache.org/docs/2.1.0/ml-clustering.html

▶ Armbrust, M., Xin, R.S., Lian, C., Huai, Y., Liu, D., Bradley, J.K., Meng, X., Kaftan, T., Franklin, M.J., Ghodsi, A., and Zaharia, M. "Spark SQL: Relational data processing in Spark". In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD '15), 2015.

▶ Bifet, A., Maniu, S., Qian, J., Tian, G., He, C., and Fan, W. "StreamDM: Advanced data mining in Spark Streaming". In 2015 IEEE International Conference on Data Mining Workshop (ICDMW), pp. 1608–1611. 2015.

▶ Blanas, S., Patel, J., Ercegovac, V., and Rao, J. "A comparison of join algorithms for log processing in MapReduce". In Proceedings of the 2010 ACM SIGMOD, pp. 975–986, 2010.

▶ Blei, D., Ng, A., and Jordan, M. "Latent Dirichlet allocation". Journal of Machine Learning Research, 3:993–1022, 2003.

▶ Borthakur, D. "Petabyte Scale Data at Facebook". 2012. Accessed 04/24/2017. http://www.infoq.com/presentations/Data-Facebook

▶ Cloudera. Accessed at: https://www.cloudera.com/products/enterprise-data-hub.html?src=GoogleAdWords&gclid=Cj0KEQjwioHIBRCes6nP56Ti1IsBEiQAxxb5G-by2R6GGduAVi-dVs087kNR89c-4AyUnj-cNf9OMrEaAvAX8P8HAQ

▶ Dean, J., and Ghemawat, S. "MapReduce: simplified data processing on large clusters". In OSDI'04: Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation (Berkeley, CA, USA, 2004), USENIX Association, pp. 10–10.

▶ Deng, L., Gao, J., and Vuppalapati, C. "Building a Big Data Analytics Service Framework for Mobile Advertising and Marketing". In Proc. IEEE 1st Int. Conf. Big Data Comput. Service Appl. (BigDataService), pp. 256-266. 2015.

▶ "CDH Overview". Cloudera. Cloudera Inc, 2017. Web. 14 Apr. 2017.

▶ Ewe, Lars. "What's the Best Way to Manage Big Data for Healthcare: Batch vs. Stream Processing?". Ask Eva Blog. Evariant Inc, 10 Dec. 2015. Web. 11 Apr. 2017.

▶ "Introduction to Big Data With Apache Spark". KDnuggets Home. KDnuggets, 2017. Web. 11 Apr. 2017.

▶ Cheng, O.K.M., and Lau, R. "Big Data Stream Analytics for Near Real-Time Sentiment Analysis". Journal of Computer and Communications, 3:189-195. 2015.


References (continued)

▶ Gundecha, P., Ranganath, S., Feng, Z., and Liu, H. "A Tool for Collecting Provenance Data in Social Media". In Proceedings of the 19th ACM SIGKDD Demonstration, 30, 61. 2013.

▶ Hsu, C.-W., Chang, C.-C., and Lin, C.-J. "A Practical Guide to Support Vector Classification". Tech. Rep. Taipei. 2003.

▶ Hu, D. "Latent Dirichlet allocation for text, images, and music". University of California, San Diego. 2009.

▶ Hu, M., and Liu, B. "Mining and summarizing customer reviews". KDD'04, 2004.

▶ Jindal, N., and Liu, B. "Mining comparative sentences and relations". Proceedings of AAAI, 2006.

▶ Katal, A., Wazid, M., and Goudar, R.H. "Big data: issues, challenges, tools and good practices". In Sixth International Conference on Contemporary Computing (IC3), pp. 404–409. doi:10.1109/IC3. 2013.

▶ Kulkarni, S., Bhagat, N., Fu, M., Kedigehalli, V., Kellogg, C., Mittal, S., Patel, J.M., Ramasamy, K., and Taneja, S. "Twitter Heron: Stream processing at scale". In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 239–250. ACM, 2015.

▶ Nair, L.R., and Shetty, S.D. "Streaming Twitter Data Analysis Using Spark For Effective Job Search". Journal of Theoretical and Applied Information Technology, Vol. 80, No. 2. 2015.

▶ Liu, B. "Sentiment Analysis and Subjectivity". Invited chapter for the Handbook of Natural Language Processing, Second Edition. 2010.

▶ MapR. 2014. Accessed at: https://mapr.com/

▶ Mishne, G., and de Rijke, M. "A study of blog search". In Proceedings of ECIR. 2006.

▶ Liu, B., Hu, M., and Cheng, J. "Opinion Observer: Analyzing and comparing opinions on the web". Proceedings of International Conference on World Wide Web (WWW). 2005.

▶ Mukherjee, A., and Liu, B. "Aspect Extraction through Semi-Supervised Modeling". In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL-2012). 2012.

▶ O'Connor, B., Balasubramanyan, R., Routledge, B.R., and Smith, N.A. "From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series". In Proceedings of the International AAAI Conference on Weblogs and Social Media (ICWSM 2010). 2010.

▶ "Overview of Cloudera and the Cloudera Documentation Set". Cloudera. Cloudera Inc, 2017. Web. 14 Apr. 2017.

▶ Pang, B., and Lee, L. "A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts". In Proceedings of the Association for Computational Linguistics, pp. 271–278. 2004.

▶ Pang, B., and Lee, L. "Opinion Mining and Sentiment Analysis". Foundations and Trends in Information Retrieval series. Now Publishers. 2008.

▶ Rennie, J., Shih, L., Teevan, J., and Karger, D. "Tackling the Poor Assumptions of Naive Bayes Text Classifiers". Proc. of ICML. 2003.


References (continued)

▶ Sagiroglu, S., and Sinanc, D. "Big data: A review". In Collaboration Technologies and Systems (CTS), 2013 International Conference on, pp. 42–47. 2013.

▶ Silva, Y.N. "NoSQL". Arizona State University. Accessed at: http://cis.csuohio.edu/~sschung/cis612/LectureNotes_NoSQL_1.pdf

▶ Srivastava, D., and Bhambhu, L. "Data Classification Using Support Vector Machine". Journal of Theoretical and Applied Information Technology, from www.jait.org. 2005-2009.

▶ Stanford. "Scoring, term weighting and the vector space model". Cambridge University Press, pp. 109-133. 2009. Accessed at: https://nlp.stanford.edu/IR-book/pdf/06vect.pdf

▶ Taboada, M., Brooke, J., Tofiloski, M., Voll, K., and Stede, M. "Lexicon-based methods for sentiment analysis". Computational Linguistics, 37(2), pp. 267-307. 2011.

▶ Thusoo, A., Sarma, J.S., Jain, N., Shao, Z., Chakka, P., Anthony, S., Liu, H., Wyckoff, P., and Murthy, R. "Hive - a warehousing solution over a Map-Reduce framework". In VLDB. 2009.

▶ Thusoo, A., Borthakur, D., Murthy, R., Shao, Z., Jain, N., Liu, H., Anthony, S., and Sarma, J.S. "Data warehousing and analytics infrastructure at Facebook". In SIGMOD, 2010.

▶ Thusoo, A., Sarma, J.S., Jain, N., Shao, Z., Chakka, P., Zhang, N., Antony, S., Liu, H., and Murthy, R. "Hive - A petabyte scale data warehouse using Hadoop". In Proceedings of the International Conference on Data Engineering, pp. 996–1005. 2010.

▶ Tilve, A., and Jain, S. "A Survey on Machine Learning Techniques For Text Classification". International Journal of Engineering Sciences and Research, 6(2), pp. 513-520. 2017.

▶ Yi, J., Nasukawa, T., Bunescu, R., and Niblack, W. "Sentiment analyzer: Extracting sentiments about a given topic using natural language processing techniques". In Proceedings of IEEE International Conference on Data Mining (ICDM). 2003.

▶ Wang, H., Can, D., Kazemzadeh, A., Bar, F., and Narayanan, S. "A System for Real-Time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle". In ACL System Demonstrations, pp. 115–120. 2012.

▶ Wang, S., Chen, Z., and Liu, B. "Mining Aspect-Specific Opinion using a Holistic Lifelong Topic Model". In Proceedings of the 25th International Conference on World Wide Web. ACM, pp. 167-176. 2016.

▶ Zhai, Z., Liu, B., Xu, H., and Jia, P. "Clustering Product Features for Opinion Mining". In Proceedings of ACM International Conference on Web Search and Data Mining (WSDM-2011). 2011.