
Measuring Opinion Credibility in Twitter

Mya Thandar and Sasiporn Usanavasin

School of Information and Communication Technology (ICT), Sirindhorn International Institute of Technology (SIIT), Thammasat University, Thailand

Content

•  Introduction

•  Objective

•  Research Problem

•  Proposed Method

•  Experiment

•  Conclusion


Introduction

•  During the Hong Kong protests, many people, including Hong Kong citizens, government staff, journalists and news channels, shared information and opinions on social media.

•  In such an event, we do not know whose opinions are strong or credible.

•  When we take information from opinion content, the identity of the author may help determine its credibility.

•  For that reason, we propose a new method to determine the credibility of sentiment polarity based on the author's expertise or background knowledge, and we apply it to Twitter.


Objective

•  To calculate the credibility of sentiment polarity based on the author's background knowledge on Twitter.


Research Problem

•  In this case, two tweets express their authors' opinions on the Hong Kong Revolution.

•  Whose opinion is reliable?

Research Problem (Cont.)

•  To address this point, we add an accuracy weight for the author's background knowledge of a given topic, based on the following information:

  ▪  Who posted this tweet?

  ▪  How much does this user know about the topic?

  ▪  What is the user's background knowledge?


Proposed Method

•  We assume that when a tweet's polarity is classified, the result is not fully (100%) positive or negative.

•  We therefore add an accuracy weight to the tweet polarity:

  1)  Find the author's sentiment polarity for a given topic.

  2)  Calculate the accuracy of the author's sentiment polarity for the author's tweet on that specific topic.


Proposed Method

•  The proposed system consists of two main components:

  ▪  sentiment analyzer

  ▪  opinion credibility calculator


Overview of the system for computing the credibility of tweet polarity


Sentiment Analyzer

•  The sentiment analyzer classifies the polarity of each tweet.

•  We used a linear SVM to classify the sentiment polarity of our tweet dataset.

•  SVM is a supervised learning method used for classification.

•  w is the normal vector of the maximum-margin hyperplane that separates the documents of one class from those of the other.

•  c_i ∈ {1, -1} is the class (positive or negative) of document d_i.

•  The α_i are obtained by solving the dual optimization problem.
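The three symbols above match the standard dual-form solution of a linear SVM. As a sketch, assuming that standard form is the one intended (the formula itself is not reproduced in the slide text), the relation would read:

```latex
\vec{w} = \sum_{i} \alpha_i \, c_i \, \vec{d_i},
\qquad \alpha_i \ge 0, \quad c_i \in \{1, -1\}
```

Under this form, the documents d_i with α_i > 0 are the support vectors, and a new tweet is classified by the side of the hyperplane on which it falls.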



Opinion Credibility Calculator

•  This component calculates the credibility of an opinion based on the author's expert knowledge of a given topic.

•  On Twitter, we consider the user bio, user lists and user tweets to determine the accuracy weight of the user's background knowledge.

  ▪  User bio: contains important information that indicates the user's expertise, such as self-summarized interests, career information and links to a personal web page.

  ▪  User lists: allow users to organize the people they follow into labeled groups. Lists act as an expertise indicator, i.e., they reflect followers' judgment about a user's expertise and give other users straightforward cues about that judgment.

  ▪  User tweets: users tweet about almost everything, such as their daily activities, comments on news, and promotion of their company.


Expert Score

•  We compute the expert score for a given topic by adding the weights derived from the author's bio and list features and from the author's tweeting behavior.

wBL = weight of the author's background knowledge (using bio and lists)

wTR = author's topic-related ratio (using the author's tweets)

wRT = ratio of the author's tweets retweeted by other users for a given topic (using the author's tweets)

wF1 = ratio of the author's friends who are related to the given topic (using the author's tweets)

wF2 = ratio of the author's followers who are related to the given topic (using the author's tweets)

wOP = author's opinion ratio for the given topic (using the author's tweets)
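The slide states that the expert score is obtained by adding these six weights, but the exact formula is not given in the text. A minimal sketch, assuming an equal-weight average so that the score stays in [0, 1]; the function name `expert_score` and the averaging step are assumptions, not the authors' formula:

```python
# Feature names follow the slide; combining them by an equal-weight
# average is an ASSUMPTION made only for illustration.
FEATURES = ("wBL", "wTR", "wRT", "wF1", "wF2", "wOP")


def expert_score(weights: dict) -> float:
    """Combine the six per-topic weights (each in [0, 1]) into one score."""
    missing = [name for name in FEATURES if name not in weights]
    if missing:
        raise KeyError(f"missing weights: {missing}")
    for name in FEATURES:
        if not 0.0 <= weights[name] <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    return sum(weights[name] for name in FEATURES) / len(FEATURES)
```

Any other combination, such as a weighted sum with tuned coefficients, would fit the slide's description equally well.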


wBL (using bio & list)

•  We apply an n-gram approach (unigrams and bigrams) to segment the bio and list text.

•  We extract nouns and adjectives using an NLP toolkit.

•  We use an ontology to keep only the conceptual keywords related to the given topic.

•  We calculate the ratio of the number of related keywords to the total number of raw keywords.

•  This strategy produces wBL (using bio and lists), which defines the author's expert knowledge.
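The wBL steps can be sketched in plain Python. This is a simplified illustration under stated assumptions: the part-of-speech filter (nouns and adjectives) is skipped, and a hypothetical set `topic_terms` stands in for the ontology lookup. The function names are illustrative only.

```python
import re


def ngrams(text: str) -> list:
    """Return the lowercase unigrams and bigrams of the text."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


def w_bl(bio: str, list_names: list, topic_terms: set) -> float:
    """Ratio of topic-related keywords to all raw keywords (bio + lists)."""
    keywords = ngrams(bio)
    for name in list_names:
        keywords += ngrams(name)
    if not keywords:
        return 0.0
    # `topic_terms` is a hypothetical stand-in for the ontology filter.
    related = sum(1 for kw in keywords if kw in topic_terms)
    return related / len(keywords)
```

For example, the bio "Hong Kong journalist" yields five raw keywords (three unigrams and two bigrams), and a list named "politics" adds one more.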


Author’s tweets

•  The next weights are based on the author's tweeting behavior and network activities.

•  We focus on the following features, which reflect the user's background knowledge (expert knowledge score) for a given topic:

  ▪  author's topic-related ratio (wTR)

  ▪  ratio of the author's tweets retweeted by other users for a given topic (wRT)

  ▪  ratio of the author's friends and followers who are related to the given topic (wF1, wF2)

  ▪  author's opinion ratio for the given topic (wOP)


wTR (Topic-related ratio)

•  The author's topic-related ratio is the ratio of the number of the author's tweets related to the specific topic to the total number of his/her tweets.
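A minimal sketch of this ratio, assuming a tweet counts as topic-related when it contains any term from a hypothetical `topic_terms` set (the slides do not say how relatedness is decided):

```python
def w_tr(tweets: list, topic_terms: set) -> float:
    """Fraction of the author's tweets that mention any topic term."""
    if not tweets:
        return 0.0
    # ASSUMPTION: simple case-insensitive substring match decides relatedness.
    related = sum(
        1 for text in tweets
        if any(term in text.lower() for term in topic_terms)
    )
    return related / len(tweets)
```

An author with two topic-related tweets out of four would get wTR = 0.5.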


wRT (Author's tweets retweeted by other users)

•  On Twitter, a retweet is a re-posting of someone else's tweet.

•  Using the retweet feature, we compute how many times the author's tweets have been retweeted by others for a particular topic.
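The slides do not spell out how the retweet count is turned into a ratio. One possible reading, sketched here as an assumption, is the share of the author's on-topic tweets that were retweeted at least once; the `Tweet` record and its fields are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    text: str
    retweet_count: int  # how many times other users retweeted it
    on_topic: bool      # assumed to be decided beforehand


def w_rt(tweets: list) -> float:
    """Share of on-topic tweets that were retweeted at least once."""
    topical = [t for t in tweets if t.on_topic]
    if not topical:
        return 0.0
    retweeted = sum(1 for t in topical if t.retweet_count > 0)
    return retweeted / len(topical)
```

A normalization based on raw retweet counts would be another reasonable reading of the same slide.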


wF1 and wF2 (Author's friends and followers who are related to the given topic)

•  wF1 and wF2 are the ratios of the author's friends and followers, respectively, who are related to the given topic.
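Both ratios can be computed with the same helper. The sketch below assumes the set of topic-related accounts (`related_users`) has already been determined by some other step, which the slides do not describe:

```python
def related_ratio(users: list, related_users: set) -> float:
    """Share of `users` that appear in the topic-related set."""
    if not users:
        return 0.0
    return sum(1 for u in users if u in related_users) / len(users)


# Hypothetical account names, for illustration only.
related = {"news_hk", "reporter_a"}
w_f1 = related_ratio(["news_hk", "chef_b", "reporter_a", "fan_c"], related)
w_f2 = related_ratio(["fan_c", "reporter_a"], related)
```

Here `w_f1` is computed over the author's friends and `w_f2` over the author's followers.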


wOP (Author's opinion ratio)

•  For the author's opinion ratio:

  ▪  We measure how often the author has expressed the same opinion, relative to all of his/her past opinions on the given topic.

  ▪  E.g., if the given tweet's opinion is negative, we count the author's negative tweets among all of his/her past opinionated tweets on the topic.
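A minimal sketch of the opinion ratio, assuming the polarity of each past on-topic tweet is already available as a label such as "positive" or "negative" (the function name `w_op` is illustrative):

```python
def w_op(tweet_polarity: str, past_polarities: list) -> float:
    """Share of the author's past on-topic opinions with the same polarity."""
    if not past_polarities:
        return 0.0
    same = sum(1 for p in past_polarities if p == tweet_polarity)
    return same / len(past_polarities)
```

For a negative tweet from an author whose past on-topic opinions were negative three times out of four, the ratio is 0.75.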


Opinion credibility

•  We combine the expert score with the polarity result from the sentiment analyzer.

•  From this, we calculate the credibility of the tweet polarity result (Cop).


Preliminary Experiment

•  We used the Twitter API to crawl Hong Kong revolution data (Sep 30, 2014 to Nov 30, 2014) as the training and testing dataset.

•  We used 1,000 sample tweets as the training dataset to learn sentiment classification with the RapidMiner tool.

•  We labeled this dataset with each tweet's polarity (positive or negative).

•  We tried many sets of SVM parameters to find the best result and applied 10-fold cross-validation.
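RapidMiner performs the cross-validation internally. To illustrate the splitting scheme only, here is a plain-Python sketch of 10-fold index generation; it is not RapidMiner's implementation, and the function name and fixed seed are assumptions:

```python
import random


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx
```

With 1,000 labeled tweets, each of the 10 rounds trains on 900 tweets and tests on the remaining 100, so every tweet is used for testing exactly once.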


Experiment (Sentiment Classification)


Experiment (Credibility Calculation)




Ranges for the credibility value: highest (≥ 70%), lowest (≤ 30%), and middle (between 30% and 70%).
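The three ranges map directly to a small helper. This sketch assumes the credibility value is expressed as a fraction in [0, 1]; the function name is illustrative:

```python
def credibility_level(cop: float) -> str:
    """Map a credibility value in [0, 1] to the three ranges on the slide."""
    if not 0.0 <= cop <= 1.0:
        raise ValueError("cop must lie in [0, 1]")
    if cop >= 0.70:
        return "highest"
    if cop <= 0.30:
        return "lowest"
    return "middle"
```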



Conclusion

•  We proposed a system to measure the credibility of sentiment polarity based on the author's expert knowledge.

•  To find the credibility of an opinion, our system performs two steps:

  ▪  Sentiment classification

  ▪  Opinion credibility calculation

•  We used the Hong Kong Revolution dataset.

•  In future work, we will evaluate the accuracy of the opinion credibility results.


Acknowledgements

•  Center of Excellence in Intelligent Informatics, Speech and Language Technology and Service Innovation (CILS), Thammasat University

•  Intelligent Informatics, and Service Innovation (IISI), SIIT, Thammasat University

•  NRU grant at SIIT, Thammasat University


Thank You! Any Questions?

