What you Tweet is What You Get: challenges of social media analysis in the real world Dr. Diana Maynard University of Sheffield, UK

Post on 22-Jan-2018

TRANSCRIPT

Page 1: Challenges of social media analysis in the real world

What you Tweet is What You Get: challenges of social media analysis in the real world

Dr. Diana Maynard, University of Sheffield, UK

Page 2: Challenges of social media analysis in the real world

Social Media and Big Data

● Big data is one of the biggest buzzwords of this decade

● If only we can harness all the information contained in social media, we have the Holy Grail to business intelligence

● There are already lots of examples of big data and social media analysis being used in real life...

Page 3: Challenges of social media analysis in the real world

Purchasing Recommendations

● Amazon recommends books (and other products) based on your previous purchases

Page 4: Challenges of social media analysis in the real world

Sensor-driven big data

● Yarra trams in Melbourne, Australia, is using big data, sensor-driven data and the Internet of Things to deliver reports to central operations, and real-time alerts to maintenance techs and passengers.

Page 5: Challenges of social media analysis in the real world

Mood of the Nation

Page 6: Challenges of social media analysis in the real world

Social Media and the Stock Market

• Analysing public mood has proved useful for things like stock market prediction

• Fluctuations are driven mainly by fear rather than by things like happiness or sadness

• Although this could go horribly wrong...

Page 7: Challenges of social media analysis in the real world

The Social Oscars 2013

Brandwatch ran a project to investigate how closely public opinion predicted/mirrored the results of the Oscars

Page 8: Challenges of social media analysis in the real world

What actually is Big Data?

And what does it have to do with

social media analysis?

Page 9: Challenges of social media analysis in the real world

“Big Data is data that breaks Excel.”

Or so the joke goes. But it's come to mean a lot more than that....

“Big data is an umbrella term that means a lot of different things, but to me, it means the possibility of doing extraordinary things using modern machine learning techniques on digital data. Whether it is predicting illness, the weather, the spread of infectious diseases, or what you will buy next, it offers a world of possibilities for improving people’s lives.”

- Shashi Upadhyay, CEO and Founder, Lattice Engines

“Big data is just the ability to gather information and query it in such a way that we are able to learn things about the world that were previously inaccessible to us.”

- Hilary Mason, Founder of  Fast Forward Labs

Page 10: Challenges of social media analysis in the real world

“Big data, which started as a technological innovation in distributed computing, is now a cultural movement by which we continue to discover how humanity interacts with the world — and each other — at large-scale.”

- Drew Conway, Head of Data, Project Florida

“Big data refers to the approach to data of “collect now, sort out later”…meaning you capture and store data on a very large volume of actions and transactions of different types, on a continuous basis, in order to make sense of it later. The low cost of storage and better methods of analysis mean that you generally don’t need to have a specific purpose for the data in mind before you collect it.”

- Rohan Deuskar, CEO and Co-Founder, Stylitics

“Endless possibilities or cradle-to-grave shackles, depending upon the political, ethical, and legal choices we make.”

- Deirdre Mulligan, Associate Professor, UC Berkeley School of Information

Page 11: Challenges of social media analysis in the real world

Big Data is not new!

Staff sorting 4M used tickets from #London Underground to analyse line use in 1939

Page 12: Challenges of social media analysis in the real world

The rise of Big Data

● “Big Data” was added to the OED in 2013.

● According to the Gartner Hype Cycle, it has now passed the “peak of inflated expectations” and is moving into the “trough of disillusionment”.

Page 13: Challenges of social media analysis in the real world
Page 14: Challenges of social media analysis in the real world

Big Data: the Holy Grail or the Bridge of Death?

Page 15: Challenges of social media analysis in the real world

How to Navigate the Bridge of Death

● A search for the Big Data Holy Grail can easily lead you to the “Bridge of Big Data Death”

● When the bridge keeper asks: “What business problem are you trying to solve with the help of Big Data?”

● Too many businesses will then have to answer: “I don’t know… AAAGH!”

● At which point they fall into the Gartner Big Data Trough of Disillusionment!

● Unless you're King Arthur, in which case you will first ensure you understand the essential details behind the question.

Page 16: Challenges of social media analysis in the real world

We need to be King Arthur!

Page 17: Challenges of social media analysis in the real world

Big Data and Social Media

● Social media is not just about teenagers twittering about their favourite pop stars, or what they had for breakfast

● Although even this information can be useful, if you manufacture breakfast cereal or manage boy bands

Page 18: Challenges of social media analysis in the real world

Social Media is a valuable business tool

● business insights

● sharing and receiving important news

● campaigns

● all kinds of collective intelligence

● an alternative to traditional polls

● and much more

Page 19: Challenges of social media analysis in the real world

To get information in an emergency

● 20% have already used social media

● 50% would potentially use it

To post information in an emergency

● 75% would use Facebook

● 22% would use blogs

● 21% would use Twitter

Page 20: Challenges of social media analysis in the real world

NLP tools give us a way to understand the data

● sort the data to separate the drivel from the interesting parts

● extract the relevant pieces of information

● link the extracted information to other sources of information (e.g. from DBpedia)

● aggregate the information according to potential new categories

● query the (aggregated) information

● visualise the results of the query

Page 21: Challenges of social media analysis in the real world

GATE social media analysis toolkit

Page 22: Challenges of social media analysis in the real world

Where in the UK did MPs tweet more about the economy?

Page 23: Challenges of social media analysis in the real world

Public reaction to topics Miliband mentioned

Page 24: Challenges of social media analysis in the real world

Opinion Mining

• Along with NER, opinion mining is a key component in social web analysis

• NER: names of people, organisations, locations

• Opinion mining: what sentiments are being expressed?

Page 25: Challenges of social media analysis in the real world

TripAdvisor Hotel reviews

Page 26: Challenges of social media analysis in the real world

Rotten Tomatoes Film Reviews

Page 27: Challenges of social media analysis in the real world

And taking it a step further

What are the opinions on crucial social events and the key people involved?

How are these opinions distributed in relation to demographic user data?

How have these opinions evolved?

Who are the opinion leaders?

What is their impact and influence?

Page 28: Challenges of social media analysis in the real world

Challenges for social media analysis

Page 29: Challenges of social media analysis in the real world

Be careful with opinions!

Sentiment analysis isn't just about looking at the sentiment words

● “It's a great movie if you have the taste and sensibilities of a 5-year-old boy.”

● “I hate that John did so well in the debate last night.”

● “I'd have liked the film a lot more if it had been a bit shorter.”

Situation is also everything. If you and I are best friends, then my graceful swearing at you is not the same as my swearing at my boss.

Page 30: Challenges of social media analysis in the real world

Old wine or warm beer?

● old wine
● warm pizza
● cold Coke

● old camera
● warm beer
● cold coffee

● long walk
● short book
● hot weather
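The point of these pairs can be sketched in code: the same adjective carries a different polarity depending on the noun it modifies, so a lookup keyed on the adjective alone cannot work. The sketch below keys on the (adjective, noun) pair instead. The polarity labels are one reading of the slide's first two groups, not values given in the talk, and `PAIR_POLARITY` and `phrase_polarity` are hypothetical names.

```python
# Hypothetical polarity table: the labels are an interpretation of the
# slide's examples, not values taken from the original talk.
PAIR_POLARITY = {
    ("old", "wine"): "positive",
    ("old", "camera"): "negative",
    ("warm", "pizza"): "positive",
    ("warm", "beer"): "negative",
    ("cold", "coke"): "positive",
    ("cold", "coffee"): "negative",
}


def phrase_polarity(adjective: str, noun: str) -> str:
    """Look up polarity for an adjective-noun pair; 'unknown' if unseen."""
    return PAIR_POLARITY.get((adjective.lower(), noun.lower()), "unknown")


print(phrase_polarity("old", "wine"))    # same adjective...
print(phrase_polarity("old", "camera"))  # ...different polarity
print(phrase_polarity("long", "walk"))   # depends on context: left unknown
```

Pairs such as "long walk" or "hot weather" are deliberately left out of the table, since their polarity depends on who is speaking and when.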

Page 31: Challenges of social media analysis in the real world

Why are many opinion mining tools unsuccessful?

• They don't work well at more than a very basic level

• They mainly use dictionary lookup for positive and negative words

• They classify the tweets as positive or negative, but not with respect to the keyword you're searching for

• First, the keyword search just retrieves any tweet mentioning it, but not necessarily about it as a topic

• Second, there is no correlation between the keyword and the sentiment: the sentiment refers to the tweet as a whole

• Sometimes this is fine, but it can also go horribly wrong
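To make the dictionary-lookup weakness concrete, here is a minimal sketch of such a tool. The word lists are a tiny made-up lexicon (an assumption, not a real resource), and `lexicon_score` and `label` are hypothetical names. It scores the whole text by counting sentiment words, which is all that many basic tools do.

```python
import re

# Tiny illustrative lexicon (an assumption, not a real sentiment resource).
POSITIVE = {"great", "good", "love", "awesome"}
NEGATIVE = {"hate", "bad", "terrible", "awful"}


def lexicon_score(text: str) -> int:
    """Count positive words minus negative words over the whole text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def label(text: str) -> str:
    """Turn the raw count into a positive/negative/neutral label."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


review = "It's a great movie if you have the taste and sensibilities of a 5-year-old boy."
debate = "I hate that John did so well in the debate last night."

print(label(review))  # driven entirely by the word "great"
print(label(debate))  # driven entirely by the word "hate"
```

Both labels follow from a single sentiment word, and neither says anything about what the writer thinks of the film or of John.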

Page 32: Challenges of social media analysis in the real world

Death confuses opinion mining tools

Opinion mining tools are good for a general overview, but not for some situations

Page 33: Challenges of social media analysis in the real world

Nobody liked Leonard Nimoy

Page 34: Challenges of social media analysis in the real world

Or did they?

Page 35: Challenges of social media analysis in the real world

Challenges imposed by social media

• Language: social media typically exhibits very different language style

– Solution: train specific language processing components

• Relevance: topics and comments can rapidly diverge.

– Solution: train a classifier or use clustering techniques

• Lack of context: hard to disambiguate entities

– Solution: data aggregation, metadata, entity linking techniques

Page 36: Challenges of social media analysis in the real world

“Incorrect” language makes analysis hard

Sumbuddy: Hey, hao es your familie?

Guy: They got crushed by a bus and died.

Sumbuddy: Daz so sad...wanna get iscreem?

OMMMFG!!! JUST HEARD EMINEM'S “RAPGOD”. SMFH!!! these other dudes might as well stop rapping if they not on this level

@adambation Try reading this article , it looks like it would be really helpful and not obvious at all #sarcasm http://t.co/mo3vODoX

● Solutions:

– specific pre-processing for Twitter

– use shallow analysis techniques with back-off strategies

– incorporate specific subcomponents for swear words, sarcasm etc.

Page 37: Challenges of social media analysis in the real world

Short sentences in tweets

• Social media, and especially tweets, can be problematic because sentences are very short and/or incomplete

• Typically, linguistic pre-processing tools such as tokenisers, POS taggers and parsers do badly on such texts

• Even language identification tools can have problems

• Need for special NLP pre-processing tools
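As a rough illustration of what Twitter-specific pre-processing can mean at the tokenisation stage, the sketch below keeps URLs, user mentions and hashtags as single tokens instead of breaking them at punctuation. It is a simplified regular-expression tokeniser written for this example, not the tokeniser of any particular toolkit; `TOKEN_RE` and `tokenise` are hypothetical names.

```python
import re

# Order matters: URLs, mentions and hashtags are tried before plain words.
TOKEN_RE = re.compile(
    r"""
    https?://\S+        # URLs
    | @\w+              # user mentions
    | \#\w+             # hashtags
    | \w+(?:'\w+)*      # words, including contractions
    | [^\w\s]           # any other single non-space symbol
    """,
    re.VERBOSE,
)


def tokenise(tweet: str) -> list[str]:
    """Split a tweet into tokens, keeping Twitter-specific items whole."""
    return TOKEN_RE.findall(tweet)


print(tokenise("@adambation Try reading this article , it's helpful #sarcasm http://t.co/mo3vODoX"))
```

A tokeniser built for newswire would typically split the URL and the hashtag into several pieces, which then confuses every later processing step.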

Page 38: Challenges of social media analysis in the real world

Lack of context causes ambiguity

Branching out from Lincoln park after dark ... Hello Russian Navy, it's like the same thing but with glitter!

??

Page 39: Challenges of social media analysis in the real world

Getting the NEs right is crucial

Branching out from Lincoln park after dark ... Hello Russian Navy, it's like the same thing but with glitter!

Page 40: Challenges of social media analysis in the real world

Tokenisation issues

● Hashtags often contain smushed words

– #SteveJobs

– #CombineAFoodAndABand

– #southamerica

● For NER we want the individual tokens so we can link them to the right entity

● For opinion mining, individual words in the hashtags often indicate sentiment, sarcasm etc.

– #greatidea

– #worstdayever
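One way to recover the individual words is sketched below, under two assumptions: camel-cased hashtags can be split at the capital letters, and all-lowercase hashtags can be segmented against a word list. The vocabulary here is a toy set chosen to cover the slide's examples; a real system would need a large dictionary and some way of ranking competing segmentations. All function names are hypothetical.

```python
import re

# Toy vocabulary covering the slide's examples (an assumption).
VOCAB = {"south", "america", "great", "idea", "worst", "day", "ever"}

# Runs of capitals, capitalised or lowercase words, or digits.
CAMEL_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def segment_lowercase(text, vocab):
    """Split text into vocabulary words, preferring fewer words.

    Returns None when no full segmentation exists.
    """
    best = [None] * (len(text) + 1)
    best[0] = []
    for end in range(1, len(text) + 1):
        for start in range(end):
            piece = text[start:end]
            if best[start] is not None and piece in vocab:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[len(text)]


def split_hashtag(tag, vocab=VOCAB):
    """Split a hashtag into words; fall back to the whole body if stuck."""
    body = tag.lstrip("#")
    if any(ch.isupper() for ch in body):
        return CAMEL_RE.findall(body)
    return segment_lowercase(body, vocab) or [body]


for tag in ["#SteveJobs", "#CombineAFoodAndABand", "#southamerica", "#worstdayever"]:
    print(tag, split_hashtag(tag))
```

The resulting tokens can then be passed to entity linking (for "Steve Jobs") or to a sentiment lexicon (for "worst").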

Page 41: Challenges of social media analysis in the real world

Irony and sarcasm

• I had never seen snow in Holland before but thanks to twitter and facebook I now know what it looks like. Thanks guys, awesome!

• Life's too short, so be sure to read as many articles about celebrity breakups as possible.

• I feel like there aren't enough singing competitions on TV . #sarcasmexplosion

● Want to solve the problem of #ClimateChange? Just #vote for a #politician! Poof! Problem gone! #sarcasm #TVP #99%

Page 42: Challenges of social media analysis in the real world

Sarcasm is a part of British culture

● So much so that the BBC has its own webpage on sarcasm designed to teach non-native English speakers how to be sarcastic successfully in conversation

Page 44: Challenges of social media analysis in the real world

How do you know when someone is being sarcastic?

• Use of hashtags in tweets such as #sarcasm, #irony, #whoknew etc.

• Large collections of tweets based on hashtags can be used to make a training set for machine learning

• But you still have to know what to do with sarcasm once you've found it

• Although sarcasm generally entails saying the opposite of what you mean, it doesn't necessarily just invert the polarity of an opinion

• And it's not always negative

– “Sun, sea, sand...having such a terrible time here on holiday.”
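Building a training set from hashtags might look like the sketch below. It treats the marker hashtags from the slide as noisy labels and strips them from the text, so that a classifier trained on the result has to learn from the remaining words. The function name and the choice to keep other hashtags are assumptions made for this example.

```python
import re

# Marker hashtags taken from the slide; using them as labels is an
# assumption that the author's own tag is a reliable signal of sarcasm.
SARCASM_TAGS = {"#sarcasm", "#irony", "#whoknew"}
HASHTAG_RE = re.compile(r"#\w+")


def to_training_example(tweet: str) -> tuple[str, int]:
    """Return (text without marker hashtags, label); label 1 = sarcastic."""
    tags = {t.lower() for t in HASHTAG_RE.findall(tweet)}
    label = int(bool(tags & SARCASM_TAGS))
    # Remove the marker hashtags so a classifier cannot simply memorise them.
    cleaned = HASHTAG_RE.sub(
        lambda m: "" if m.group().lower() in SARCASM_TAGS else m.group(), tweet
    )
    return " ".join(cleaned.split()), label


print(to_training_example(
    "Eating breakfast food for lunch. Living the dream. #toast #rebel #sarcasm"
))
```

This only finds the sarcasm. Deciding what the sarcasm does to the opinion expressed is a separate problem, as the holiday example shows.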

Page 45: Challenges of social media analysis in the real world

Getting the scope of hashtags right

Eating breakfast food for lunch. Living the dream.

#toast #rebel #sarcasm

Page 49: Challenges of social media analysis in the real world

Going beyond positive and negative sentiment

Page 50: Challenges of social media analysis in the real world

● baconbkk: This pic is not real. It is a photoshop giggle.

mactavish: It's not. http://www.channel.com/news/london-riots-interactive-timeline-map

Oh my God! This can't be happening at London Eye!

Page 51: Challenges of social media analysis in the real world

Problems of veracity in social media

● Most current rumour analysis has to be done manually

● Rumours are challenging: some could take days, weeks or even months to die out

● Ill-meaning humans can currently outsmart computers and appear completely genuine

● It's crucial for e.g. journalists, emergency services and people seeking medical information to know what's really true

● To combat this, we can draw on:

– NLP to understand what's actually being said, resolve ambiguity etc.

– web science: using a priori knowledge from Linked Data

– social science: who spread the rumour, why and how

– information visualisation: visual analytics

Page 52: Challenges of social media analysis in the real world

4 main kinds of rumour

● uncertain information or speculation

– Greece will leave the Eurozone

● disputed information or controversy

– aluminium causes Alzheimer’s

● misinformation

– misrepresentation and quoting out of context

● disinformation

– Obama is a Muslim

Page 53: Challenges of social media analysis in the real world

Using NLP to deal with veracity

● Tweets containing swearing and with poor grammar/spelling and little punctuation are likely to be real in a life-or-death scenario

● During an emergency, carefully worded tweets in journalistic style are less likely to be real tweets by eyewitnesses

● On the other hand, tweets containing valid medical information (as opposed to snake oil) are more likely to be written in good English

● Detection of contradiction and entailment helps understand and resolve conflicting information

TamilNet reported that a second navy vessel had been sunk.

The Sri Lankan military denies that a second navy vessel had been sunk.

Page 54: Challenges of social media analysis in the real world

Evaluation

● How can we evaluate opinion mining performance?

● What kind of results can we expect to get?

● What problems typically occur with evaluation?

● How can we compare existing tools and methods?

Page 55: Challenges of social media analysis in the real world

Comparing different opinion mining tools

● How do you compare different opinion mining tools, when there are so many out there and they all report different kinds of results?

● It is generally accepted that tools will be 50%-70% “accurate” out-of-the box.

● But what does this really mean?

● The following 4 pieces of advice are inspired by a recent article by Seth Grimes

http://www.socialmediaexplorer.com/social-media-marketing/social-media-sentiment-competing-on-accuracy/

Page 56: Challenges of social media analysis in the real world

1. Don't compare apples with oranges

● Not all tools do the same thing, even if they look the same

● Document-level vs topic-level sentiment

● One tool might be good at getting the overall sentiment of a tweet right, but rubbish at finding the sentiment about a particular entity

● e.g. the following tweet is classed as being negative about the Olympics:

skytrain seems to be having problems frequently lately. hope cause is upgraded and they work the kinks out before olympics.

● The tweet is (correctly) negative overall but not specifically about the Olympics
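The difference between document-level and topic-level sentiment can be shown with the skytrain tweet. The sketch below compares a score over the whole tweet with a score restricted to the sentences that mention the target. Sentence-level scoping is a crude stand-in for real target-dependent sentiment analysis, and the one-word lexicon is purely illustrative; all names are hypothetical.

```python
import re

NEGATIVE = {"problems"}  # hypothetical one-word lexicon, for illustration only


def score(text: str) -> int:
    """Return minus the number of negative lexicon words in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return -sum(w in NEGATIVE for w in words)


def document_score(tweet: str) -> int:
    """Score the tweet as a whole, ignoring what the sentiment is about."""
    return score(tweet)


def target_score(tweet: str, target: str) -> int:
    """Score only the sentences that mention the target."""
    sentences = re.split(r"(?<=[.!?])\s+", tweet)
    return sum(score(s) for s in sentences if target.lower() in s.lower())


tweet = ("skytrain seems to be having problems frequently lately. "
         "hope cause is upgraded and they work the kinks out before olympics.")

print(document_score(tweet))            # negative for the tweet overall
print(target_score(tweet, "skytrain"))  # negative about the skytrain
print(target_score(tweet, "olympics"))  # no sentiment found about the Olympics
```

A tool that reports the document-level score for every keyword match would file this tweet as negative about the Olympics.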

Page 57: Challenges of social media analysis in the real world

2. Use the same measurement scale

● Positive/negative/neutral vs scalar measurement (-5 to +5)

● Valence vs mood/orientation (e.g. happy, sad, angry, frustrated)

● Is reasonable emotion classification more useful to you than fantastic valence?

● How will you actually make use of the opinions generated to e.g. make decisions?
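When one tool reports a scalar score and another reports three classes, their outputs have to be put on a common scale before they can be compared. A minimal sketch follows, assuming a symmetric neutral band around zero. The band width of 1.0 is an arbitrary choice for illustration, and it changes which tool looks better, so it needs to be reported alongside any comparison.

```python
def to_three_class(score: float, neutral_band: float = 1.0) -> str:
    """Map a scalar score in [-5, 5] to positive/negative/neutral.

    Scores within +/- neutral_band of zero count as neutral; the default
    band is an assumption, not a standard value.
    """
    if not -5 <= score <= 5:
        raise ValueError("score must lie in [-5, 5]")
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


for s in (3.5, 0.5, -2):
    print(s, to_three_class(s))
```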

Page 58: Challenges of social media analysis in the real world

3. How is accuracy defined?

● NLP tools often use Precision, Recall and F-measure to determine accuracy

● But most opinion mining tools are only measured in terms of accuracy (Precision)

● How important is Recall?

● How important is the tradeoff between Precision and Recall?

● What about contextual relevance that incorporates timeliness, influence, activities, and lots of other still-fuzzy social notions?

● How trustworthy / important are the opinions? Sentiment from a valued customer may be more important than a one-time buyer
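To see why a figure based on Precision alone can mislead, the sketch below computes Precision, Recall and F-measure (the balanced F1 variant) for one class from a small made-up set of gold and predicted labels. The data is invented for illustration: the hypothetical tool is cautious, so everything it calls positive really is positive, but it misses most of the positive tweets.

```python
def precision_recall_f1(gold, predicted, target="positive"):
    """Precision, recall and F1 for one target class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == target and p == target for g, p in pairs)
    fp = sum(g != target and p == target for g, p in pairs)
    fn = sum(g == target and p != target for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Invented labels for six tweets.
gold      = ["positive", "positive", "positive", "positive", "negative", "neutral"]
predicted = ["positive", "neutral",  "neutral",  "neutral",  "negative", "neutral"]

p, r, f = precision_recall_f1(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Here one true positive, no false positives and three false negatives give a Precision of 1.0 and a Recall of 0.25, so a vendor quoting only Precision would report a perfect score.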

Page 59: Challenges of social media analysis in the real world

4. What's the impact of errors?

● Not all inaccuracies have the same impact

● If you're looking at aggregate statistics, a negative rating of a positive opinion has more impact than a neutral rating of a positive opinion

● How do neutral opinions affect aggregation? Are they considered? Should they be?

● In other cases, finding any kind of sentiment (whether with correct polarity or not) might be more important than wrongly detecting no sentiment and missing important information
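The unequal impact of errors on aggregate statistics can be checked with a small calculation. The sketch below assumes one common aggregation, the mean of +1/0/-1 values, which is not necessarily what any given tool uses. It shows how a single misclassified positive opinion shifts the aggregate depending on whether it was rated neutral or negative.

```python
VALUE = {"positive": 1, "neutral": 0, "negative": -1}


def net_sentiment(labels):
    """Mean of +1/0/-1 values; one simple aggregation, assumed here."""
    return sum(VALUE[label] for label in labels) / len(labels)


gold = ["positive"] * 10                      # ten genuinely positive opinions
as_neutral = ["neutral"] + ["positive"] * 9   # one misread as neutral
as_negative = ["negative"] + ["positive"] * 9 # one misread as negative

print(net_sentiment(gold))
print(net_sentiment(as_neutral))
print(net_sentiment(as_negative))
```

Under this aggregation the negative misreading moves the result twice as far as the neutral one, and dropping neutral ratings from the denominator would change the picture again.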

Page 60: Challenges of social media analysis in the real world

Positive or negative tweets? You decide.

RT @ssssab: Mariano: she used to be a very nice girl, before she discovered macdonalds

There was just a fire at work. Today is looking up.

Yesterday my son forgot his jacket at school. Today he remembered to bring home the jacket, but forgot his lunchbox.

I find myself sobbing at John Le Mesurier's beauty of soul. Documentary about him on BBC iPlayer

Page 61: Challenges of social media analysis in the real world

Other challenges of social media

● Strongly temporal and dynamic:

– temporal information (e.g. post timestamp) can be combined with opinion mining, to examine the volatility of attitudes towards topics over time (e.g. gay marriage).

● Exploiting social context: who is the user connected to? How frequently do they interact?

– Automatically derive semantic models of social networks, measure user authority, cluster similar users into groups, as well as model trust and strength of connection

● Implicit information about the user: research on recognising gender, location, and age of Twitter users.

– Helpful for generating opinion summaries by user demographics
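Combining timestamps with opinion scores can be as simple as grouping by day, as in the sketch below. It assumes each post already has a numeric sentiment score and an ISO-8601 timestamp; the sample data is invented and the function name is hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_mean_sentiment(posts):
    """Average sentiment per calendar day.

    posts: iterable of (ISO-8601 timestamp string, numeric score).
    """
    by_day = defaultdict(list)
    for timestamp, score in posts:
        by_day[datetime.fromisoformat(timestamp).date()].append(score)
    return {day: mean(scores) for day, scores in sorted(by_day.items())}


# Invented sample data.
posts = [
    ("2015-03-01T09:00:00", 1),
    ("2015-03-01T18:30:00", -1),
    ("2015-03-02T12:00:00", 1),
]

for day, value in daily_mean_sentiment(posts).items():
    print(day, value)
```

Plotting these daily values, or their variance, gives a first view of how volatile attitudes towards a topic are over time.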

Page 62: Challenges of social media analysis in the real world

Looking into the future

● Typically, opinion mining looks at social media content to analyse people’s explicit opinions about a product or service

● This backwards-looking approach often aims primarily at dealing with problems, e.g. unflattering comments

● A forwards-looking approach aims at looking ahead to understanding potential new needs from consumers

● This is not just about looking at specific comments, e.g. “the product would be better if it had longer battery life”, but also about detecting non-specific sentiment

● Understanding people's needs and interests in a more general way, e.g. drawing conclusions from their opinions

Page 63: Challenges of social media analysis in the real world

The Ultimate Question

● This book recently ranked #1 on the Wall Street Journal's Business Best-Sellers List and #1 on USA TODAY's Money Best-Sellers List.

● It's all about whether a consumer likes a brand enough to recommend it - this is the key to a company's performance.

● General sentiment detection isn't precise enough to answer this kind of question, because all kinds of “like” are treated equally

● Growing need for sentiment analysis that can get to very fine levels of detail, while keeping up with the enormous (and constantly increasing) volume of social media.

Page 64: Challenges of social media analysis in the real world

So where does this leave us?

● Social media is a tricky but interesting medium to analyse

● Social media analysis involves combining aspects from many fields (not just data mining or NLP)

● Opinion mining tools are ubiquitous, but still far from perfect: there are lots of linguistic and social quirks that fool them

● The good news is that this means there are lots of interesting problems for the academics to research!

● And it doesn’t mean we shouldn’t use existing social media analysis tools in the real world

● But we do need to make sure we don't get caught out on the Bridge of Death!

Page 65: Challenges of social media analysis in the real world

Acknowledgements

Research partially supported by

● the European Union/EU under the Information and Communication Technologies (ICT) theme of the 7th Framework Programme for R&D (FP7): DecarboNet (610829) http://www.decarbonet.eu and Pheme (611233) http://www.pheme.eu

● Nesta http://nesta.org.uk

Page 66: Challenges of social media analysis in the real world

Questions?