data mining in social media

Data Mining in Social Media
By: Anthony Smith & Joey Fazzani

Welcome, Objectives, Data Mining, Social Media, Scale, Mining Social Media, Data Collection, Apriori Algorithm, K-Means Algorithm, Uses and benefits, Summary

Upload: ivy

Post on 09-Feb-2016



TRANSCRIPT

PowerPoint Presentation

Data Mining in Social Media
By: Anthony Smith & Joey Fazzani

Welcome to our presentation on data mining in social media.

Objectives

Things to learn:
Sentiment analysis
Use of algorithms
Uses and benefits of mining social media

Here are just a few things that we hope you will learn within the next 5 minutes:
Sentiment analysis
The use of algorithms
The uses and benefits of mining social media

http://findicons.com/search/data-mining

What is Data Mining?

Generally, data mining (sometimes called data or knowledge discovery) is the process of analysing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both.

http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm

Here is a definition of data mining.

(Pause for 5 seconds to allow users to read)

To summarise: data mining processes and analyses data so that meaningful use can be made of it.

What is Social Media?

Social media is digital content and interaction that is created by and between people.

http://heidicohen.com/social-media-definition/
http://www.417marketing.com/wp-content/uploads/2013/08/Social-Media.jpg

Scale

Interaction:
Users with users
Users with content

http://www.reachsolutions.co.nz/services/social-media-marketing

Social media ranges from YouTube to Facebook to virtual worlds and games such as World of Warcraft.

As well as showing how users interact with one another, it also shows how users interact with the content of the web.

Social media has billions of users on hundreds of sites, leaving huge traces of human activity.

By using data mining and machine learning to reduce the noise, we can make the data meaningful.

Mining Social Media

What data to extract?
Enough to be meaningful
Needs to be scoped
Within system capability

Continuously changing
As data changes, act

http://www.dundas.com/blog-post/the-perils-of-big-data/
http://www.hardcorehockey.co.uk/article/on-the-pitch/warm-up/warm-up-arm-swings

When mining social media we must consider how big and dynamic the data is. The data set must be large enough to be meaningful, scoped to meet requirements, and within the system's capabilities.

Data is constantly changing and being updated so this must also be considered.

Data Collection

Twitter

Unstructured data

Tweets: long string of text
Author
Text of tweet
Hashtag
Etc.

Break down into columns.

As a practical example we will use Twitter.

As with much of the web, the data is unstructured.

We need to extract each tweet as a long string of text. This will include information such as the author, the text of the tweet and the hashtag. The string must then be broken down into these sections.
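As a rough illustration of breaking a tweet string into columns, here is a minimal Python sketch. The raw format "@author: text" and the function name split_tweet are assumptions made for this example; the slides do not say what the extracted string actually looks like.

```python
import re


def split_tweet(raw):
    """Split a raw tweet string into author, text and hashtag columns.

    Assumes a hypothetical "@author: text" layout for the raw string.
    """
    # Everything before the first ": " is treated as the author handle.
    author, _, text = raw.partition(": ")
    # Hashtags are words that follow a '#' in the tweet text.
    hashtags = re.findall(r"#(\w+)", text)
    return {"author": author.lstrip("@"), "text": text, "hashtags": hashtags}


row = split_tweet("@sam: Love my new iPhone #happy")
print(row)
```

Each returned dictionary corresponds to one row, with one key per column.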

Data Collection (2)

Enter keywords to search for
Enter duration

iPhone, iOS
Android
Blackberry

You can scrape websites for chosen keywords and specify how long the search should run.

We can set keywords, for example iPhone, Android and Blackberry.

Data Collection (3)

Sentiment analysis: look in the text field for

Good: love, great, etc.

Bad: hate, doubt, etc.

Love my new iPhone #happy

Hate my new iPhone #broken

We can complete a sentiment analysis which looks for predefined terms that are associated with good or bad sentiments.
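The term lookup described here can be sketched in a few lines of Python. The word lists come from the slide; the function name classify and the "neutral" fallback for tweets with no listed term (or a tie) are assumptions added for the example.

```python
import re

GOOD_TERMS = {"love", "great"}  # "good" terms listed on the slide
BAD_TERMS = {"hate", "doubt"}   # "bad" terms listed on the slide


def classify(text):
    """Label a tweet good, bad or neutral by counting predefined terms."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    good = len(words & GOOD_TERMS)
    bad = len(words & BAD_TERMS)
    if good > bad:
        return "good"
    if bad > good:
        return "bad"
    return "neutral"  # assumption: no listed term, or a tie


print(classify("Love my new iPhone #happy"))
print(classify("Hate my new iPhone #broken"))
```

A plain word lookup like this ignores negation and sarcasm, so it should be read as a starting point only.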

"I love my new iPhone" would be classed as good because it contains the good term "love". "I hate my new iPhone" would be classed as bad because it contains the bad term "hate".

Data Collection (4)

Wealth of information

In 3 min = 398 tweets

398 × 20 = 7,960 per hour
7,960 × 24 = 191,040 per day

Could be GB, TB or PB of information

http://www.youtube.com/watch?v=Jqq66INlQ0U

In 3 minutes, 398 tweets were submitted containing the keywords iPhone, Android or Blackberry.

At the same rate of user input, this is equivalent to 191,040 tweets a day.
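The extrapolation can be checked with a short Python calculation. It assumes, as the slide does, that the rate seen in the 3-minute sample stays constant all day; the variable names are chosen for this example.

```python
tweets_in_sample = 398  # tweets collected in the sample window
sample_minutes = 3      # length of the sample window

samples_per_hour = 60 // sample_minutes          # 20 windows per hour
per_hour = tweets_in_sample * samples_per_hour   # 398 * 20
per_day = per_hour * 24                          # hourly figure * 24

print(per_hour, per_day)
```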

The Apriori Algorithm

Used to find association rules

Especially significant in customer transactions

Customer   Items bought
1          A, D, F, G
2          B, C, D, G
3          D, H
4          A, B, H, D, F, G

The Apriori algorithm is often used to find association patterns in customer spending. For example, a customer who buys Product A may also be likely to buy Product B.

The Apriori Algorithm

With the data from the database:
Association rules
Identifies which operating systems are mentioned with positive or negative comments most frequently.

Using the data collected previously, we apply a set of association rules that let us identify the information we require. In this case we wish to discover which operating systems are mentioned in tweets and what kinds of comments are made about them.

The Apriori Algorithm

http://en.wikipedia.org/wiki/Apriori_algorithm

Here is the pseudocode for the algorithm. Although it may look confusing, everything will become a lot clearer in the next few slides.

The Apriori Algorithm

Database

Tweet   Keywords in the tweet
1       IOS, Iphone, Apple
2       Android, IceCream, JellyBean
3       IOS, IceCream, Android, Iphone, JellyBean
4       Android, JellyBean

Scan database

C1

Keyword     Frequency
IOS         2
IceCream    3
Iphone      3
Apple       1
JellyBean   3

Drop anything under 0.5

L1

Keyword     Frequency
IOS         2
IceCream    3
Iphone      3
JellyBean   3

Here we have a database with tweets and the keywords that were extracted.

We scan the database and output how many times each key word appeared.

We set a threshold so that any keyword whose relative frequency falls under it is disregarded as insignificant. The threshold is 0.5 in this case.

C1 shows the frequencies.

L1 shows the keywords whose relative frequency is at least 0.5. Apple has been dropped because its probability is 0.25, worked out as its frequency divided by the number of tweets (1 / 4).

The Apriori Algorithm

L1

Keyword     Frequency
IOS         2
IceCream    3
Iphone      3
JellyBean   3

C2

Instances               Frequency
[IOS, IceCream]         1
[IOS, Iphone]           2
[IOS, JellyBean]        1
[IceCream, Iphone]      2
[IceCream, JellyBean]   3
[Iphone, JellyBean]     2

C3

Instances               Frequency
[IOS, Iphone]           2
[IceCream, Iphone]      2
[IceCream, JellyBean]   3
[Iphone, JellyBean]     2

C2 lists every possible pair of the keywords in L1. A second scan of the database counts how many times each pair appeared, for example the number of times Iphone and IOS appeared in the same tweet.

We drop the highlighted rows because their value is less than 0.5, worked out as the number of tweets in which both keywords appear divided by the total number of tweets.

The result of the algorithm is that the combinations of items that appear frequently within the database are highlighted.
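The two scans can be sketched in Python as follows. This is a minimal illustration of the first two Apriori passes, not the presenters' implementation. It counts directly from the four tweets as listed in the database table, so its counts need not match every figure shown on the slides. The helper name frequent is an assumption.

```python
from itertools import combinations

# The four tweets as listed in the database table.
tweets = [
    {"IOS", "Iphone", "Apple"},
    {"Android", "IceCream", "JellyBean"},
    {"IOS", "IceCream", "Android", "Iphone", "JellyBean"},
    {"Android", "JellyBean"},
]
MIN_SUPPORT = 0.5  # drop anything under 0.5


def frequent(candidates, transactions, min_support):
    """Return {itemset: count} for candidates that meet the threshold."""
    result = {}
    for cand in candidates:
        # Count the tweets that contain every keyword in the candidate.
        count = sum(1 for t in transactions if cand <= t)
        if count / len(transactions) >= min_support:
            result[cand] = count
    return result


# Pass 1: count single keywords and keep the frequent ones.
items = sorted(set().union(*tweets))
l1 = frequent([frozenset([i]) for i in items], tweets, MIN_SUPPORT)

# Pass 2: build pairs only from surviving keywords, then scan again.
kept = sorted(i for itemset in l1 for i in itemset)
c2 = [frozenset(pair) for pair in combinations(kept, 2)]
l2 = frequent(c2, tweets, MIN_SUPPORT)

for itemset, count in l2.items():
    print(sorted(itemset), count)
```

Building pairs only from keywords that survived the first pass is the pruning idea that keeps the candidate list small.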

This can be extended to ask whether iPhone and bad, or iPhone and good, appear together frequently.

The Apriori Algorithm

iPhone is good
Android is bad

Relationship: X -> Y
X (antecedent): iPhone
Y (consequent): is good

If people who mention X (iPhone or Android) also say Y (good or bad), then we can state that a relationship exists between them. An association rule is an implication of the form X -> Y, where X is the antecedent (the thing that comes first) and Y is the consequent (the thing that follows), for example "iPhone is good" or "Android is bad".
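One common way to quantify a rule X -> Y is its confidence: the share of tweets containing X that also contain Y. The slides do not name this measure, and the six labelled tweets below are made up purely for illustration, so treat this as a sketch under those assumptions.

```python
# Hypothetical tweets, each reduced to an OS keyword and a sentiment label.
tweets = [
    {"iphone", "good"},
    {"iphone", "good"},
    {"iphone", "bad"},
    {"android", "bad"},
    {"android", "good"},
    {"android", "bad"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(x, y, transactions):
    """Confidence of the rule x -> y: support(x and y) / support(x)."""
    return support(x | y, transactions) / support(x, transactions)


print(confidence({"iphone"}, {"good"}, tweets))
print(confidence({"android"}, {"bad"}, tweets))
```

With this made-up data, two of the three iPhone tweets are good and two of the three Android tweets are bad, so both rules come out at about 0.67.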

The K-means Algorithm

With K-means consumer clustering we can:
Identify the most popular OS within each cluster
Identify which users are likely to keep using a given OS and which are likely to change
Collect data that can inform marketing decisions
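The clustering step can be sketched with a small pure-Python K-means. The data is hypothetical: each point stands for one user as (positive iOS mentions, positive Android mentions). The starting centroids are chosen by hand for repeatability, whereas real implementations usually pick them at random.

```python
def kmeans(points, centroids, iterations=10):
    """Cluster points around the given starting centroids."""
    centroids = list(centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: label each point with its nearest centroid.
        labels = [
            min(
                range(len(centroids)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return labels, centroids


# Hypothetical users: (positive iOS mentions, positive Android mentions).
users = [(8.0, 1.0), (9.0, 2.0), (8.5, 1.5), (1.0, 8.0), (2.0, 9.0), (1.5, 8.5)]
labels, centres = kmeans(users, centroids=[users[0], users[3]])
print(labels)
print(centres)
```

Here the first three users fall into one cluster and the last three into the other, which is the kind of grouping a marketing team could then examine per cluster.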

To further our findings we could incorporate clustering through the K-means algorithm, which would allow us to discover information such as brand allegiance and which user groups are more likely to stay with their current systems.

The K-means Algorithm

http://www.imore.com/sites/imore.com/files/styles/large/public/field/image/2013/09/pink_iphone5c.png?itok=WuLq66WY

Article: Nergis Yılmaz and Gülfem Işıklar Alptekin
http://www.iaeng.org/publication/WCE2013/WCE2013_pp1611-1616.pdf

In this article the authors combine the K-means and Apriori algorithms. This helped them identify that female consumers generally use iPhones and name an iPhone as their next choice, which suggests that social media campaigns aimed at female consumers would be a good idea.

New products such as a pink iPhone are often brought to market on the back of information gained through these practices.

Uses and benefits within Social Media

Useful for finding customer groups that share interests.

Marketing campaigns can be altered to target specific areas to generate revenue and to meet consumer needs.

The above algorithms and other data mining and text analytics techniques can

Without social media this information could not be obtained.

What we've learnt today

Sentiment analysis
Apriori algorithm
Uses and benefits of mining social media


Thank you very much for listening. We hope you have learned a few things today. Bye now.