political innovation social media mining


Page 1: Political Innovation Social Media Mining

WHEN IS SOCIAL MEDIA MINING GOOD ENOUGH?
OR HELP! I THINK I MIGHT BE A SCIENTIST.

Nick Buckley
Social Media Director, GfK NOP

1. What are we talking about?

What exactly are we talking about?

Definition* of social media monitoring: “Social Media Monitoring (SMM) means the identification, observation, and analysis of user-generated social media content for the purpose of market research.”

What they say:

• Newsgroups
• Public communities
• Video sites
• Review sites (professional & consumer)
• Blogs/Microblogs
• Forums
• Client sites
• News sites

* http://www.social-media-monitoring.org

What was that 2.0 thing again?

Before the rise of the internet vs. Web 2.0

The “era of shout marketing” is over*:

* Marshall, 2012

Eh?

Web Mining, Social Media Monitoring or Social Media Mining?

I like “Mining”. User generated content in social media lays down a rich seam of activity, opinion, thought and information… mess, echoes and ‘whimsy’.

For some time marketing and PR professionals have been monitoring Social Media to capture headline ‘buzz’ in real time, and to detect sudden changes requiring a response.

But collecting and counting this content is only the beginning of a process which can add value via many techniques… including integration with other sources such as market research data.

Rapid supply-side evolution. What has driven it?

For the original PR and Marketing Users…

• Boring outputs – flat-lining “buzz share”
• Commoditisation [seeming] of the core process by technology newcomers
• Differentiation by interface… the “Dashboard” – to emphasise use-cases
• Making user self-service easier – for all kinds of reasons
• Increasingly sophisticated users… looking for outputs suggestive of insights
• The ‘social CRM’ branch

http://blog.glennz.com/evolution/

2. What happens when Market Researchers get hold of it?

Sony brand damage was driven by PlayStation breach (2011)

[Charts: Sony buzz and sentiment this year; Sony buzz and sentiment in April; PlayStation buzz and sentiment]

Market Researchers believe that SMM can also give clients a window on other dimensions of online conversations

SMM provides insights into:

• Category dynamics
  – Consumer needs
  – Problems and issues consumers discuss
  – Product usage discussions
  – New product entries
• Corporate
  – Corporate mentions related to reputation
  – Crises
  – Social issues
• Brand
  – Brand/sub-brand mentions, brand “buzz”
  – Number of positive vs. negative sentiments for each brand
  – Brand content analysis: what’s being said about the brand
  – Advertising noticed most, and related discussion
  – Source of mentions (specific sites) and the most influential sites
• Competition
  – All the above for the competition

© 2012 GfK NOP

Inevitably they think about comparison with surveys…

Strengths

• Very immediate
• Unconditioned by participant awareness of a research process
• Often more emotive than considered survey responses
• Spontaneously generated content – unconstrained by research frame
• Offers insight into active social media users
• Potentially global
• You can ‘ask a new question’ without having to issue a new questionnaire*
• Low cost – under certain circumstances

Weaknesses

• Not necessarily representative of the general population
• Difficult to weight back to general population, as demographic data is sparse
• Automated sentiment analysis only as good as the algorithms [and these vary greatly]
• Automated harvesting can capture a lot of ‘noise’ for certain words or brands
• No guarantee of sufficient data
• Costs rise when we use supplementary analysis to overcome some of these issues

*within certain technical limitations

Different client needs indicate different SMM approaches
For example – Precision Extraction vs ‘Trawl & Filter’

• Crude mention & mood tracking
• Quantitative – brand tracking and integration with traditional research
• Indicative Qual – e.g. using trends and volumes to guide focus of analysis
• Exploratory Qual – more complex collection; manually manageable volumes and ‘tuning’

• Higher data volumes from simple search terms
• Lower data volumes from targeted & compound search terms

• More post processing, applied to data by MR agency – to reduce noise and refine sentiment attribution
• Accept raw data output from application

3. Too Abstract?

The raw material - Results from search terms

SMM applications extract results from wholesale supplies of data, conducting searches defined by “search terms”

• These can be anything from a simple and distinctive brand or product name, to a complex expression configured to capture discussions about a category or concept.

• A search term combines words or phrases via logical instructions such as AND, OR, NOT. It may also employ functions such as WITHIN to detect words in a certain proximity to each other. Finally – just as in mathematical equations – brackets can dictate the sequence in which the instructions are applied, e.g.

• “word1” AND ( “word2” OR “word3” )
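
The logic of a search term like this can be sketched in a few lines of Python. This is only an illustration, not any particular vendor's query syntax: the helper names (`contains`, `within`, `matches`), the example term and the sample posts are all hypothetical.

```python
import re

def tokens(text):
    """Lower-case word tokens of a post."""
    return re.findall(r"[a-z0-9']+", text.lower())

def contains(text, phrase):
    """True if the phrase occurs in the post as whole words."""
    words, target = tokens(text), tokens(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

def within(text, word_a, word_b, distance):
    """True if word_a and word_b occur at most `distance` words apart."""
    words = tokens(text)
    pos_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(a - b) <= distance for a in pos_a for b in pos_b)

def matches(text):
    """Hypothetical term: "sony" AND ("playstation" OR "psn") AND NOT "vaio"."""
    return (contains(text, "sony")
            and (contains(text, "playstation") or contains(text, "psn"))
            and not contains(text, "vaio"))

# Made-up posts, for illustration only
posts = [
    "Sony says the PlayStation network is back online",
    "New Sony Vaio laptop review, PlayStation not included",
    "My PlayStation broke again",
]
hits = [p for p in posts if matches(p)]
```

Only the first post satisfies the whole expression: the second is excluded by the NOT clause and the third fails the AND. The brackets in the expression map directly onto the parentheses in `matches`.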

A typical SMM application offers a dashboard view of the data returned by these search terms, and the facility to export the underlying data.

Analyses

Whatever the Search Terms define – here is what can be measured about the results returned… in combination or in isolation

Volume – “how much is it talked about, and how is this changing over time”

Channels – “where on the web is it being talked about… twitter, blogs, forums, comments?”

Location – “where in the world is it being talked about?”

Themes – “what other words and phrases are most regularly associated with it?”

People – “who is talking about it?” That may be by influence – according to various proprietary indices – or by demographics [to be used with caution]

Sentiment: Across all of these variables is superimposed automatically generated “Sentiment” analysis – positive, negative or neutral language associated with the subject of the posts…

Verbatims - drill-down to individual posts, in their own words – “what do people actually say?”
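
Most of these measures are simple counts over the exported data. A minimal sketch, assuming a hypothetical export with one (date, channel, sentiment, text) record per post; the brand "acme", the records and the stop-word list are made up for illustration:

```python
from collections import Counter

# Hypothetical export: (date, channel, sentiment, text) per post
posts = [
    ("day 1", "twitter", "negative", "acme is down again"),
    ("day 1", "forum", "neutral", "anyone else unable to log in to acme"),
    ("day 2", "twitter", "negative", "still no acme, not happy"),
    ("day 2", "blog", "positive", "enjoying offline games while acme is down"),
]

# Volume: how much is it talked about, per day
volume_by_day = Counter(date for date, _, _, _ in posts)

# Channels: where on the web is it being talked about
volume_by_channel = Counter(channel for _, channel, _, _ in posts)

# Sentiment: the automatically assigned label, counted across all posts
sentiment_counts = Counter(sentiment for _, _, sentiment, _ in posts)

# Themes: which other words are most regularly associated with it
STOPWORDS = {"is", "to", "in", "no", "not", "else", "while", "the", "a"}
theme_counts = Counter(
    word
    for _, _, _, text in posts
    for word in text.replace(",", "").split()
    if word not in STOPWORDS
)
```

Here `theme_counts.most_common(3)` would surface the brand itself first, followed by "down", which is the kind of associated word a themes view is meant to reveal.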

Examples of outcomes from SMM studies

FINDING: Focus on the right social media channels at the right time. A manufacturer used a video from a high profile pop star to drive a major campaign. Predictably, when aired, the video generated a ‘spike’ of twitter activity. BUT – looking back down the timeline showed there had also been a burst of activity on forums, and some blogs, from fans of the artist when the video was being shot.

FINDING: Differentiate ‘trade press’ buzz from real engagement. A manufacturer used a novel approach, through Facebook, to support advice and collaboration between users of its product. This appeared to have some success in stimulating social media conversations about the product. However – deeper scrutiny revealed that this traffic was almost exclusively blogging by sector and marketing industry press, attracted by the novel approach, with further blog, forum and link-tweeting activity amongst sector insiders and social media enthusiasts.

Examples of outcomes from SMM studies (2)

FINDING: Consumers don’t always talk about the product features that you highlight. Analysis of conversations about a newly launched electronics product revealed that the functional features most discussed [particularly those with largely positive sentiment attached] were not those which the manufacturer had chosen to highlight. Subsequent marketing was able to adjust to take account of these ‘more loved’ features.

FINDING: ‘The world’ can sometimes throw up more interesting stories about you than you could hope to generate for yourself… but not always with the connotations you would like. An automotive manufacturer which had enjoyed modest online buzz as a result of its own sponsorship activities experienced a ‘spike’ in online mentions which was 10 times the size – as a result of a much-repeated witty comment. A high-profile celebrity had appeared on TV news being interviewed from the driver’s seat of one of the manufacturer’s vehicles. The comment – linking the celebrity to a negative ‘folk image’ of the vehicle – spread rapidly across a range of social media channels. The moral is that spontaneous, and genuinely social, media can currently still outperform marketers.

BUT!

There are many forces* which erode this nice model…

Accuracy?

Reach?

Relevance?

Reach image from titletrack.com

Accuracy

Is the searched-for phrase even in the returned “snippet”?

Is it ‘content’ – or is it

• Navigation?
• Ticker or title content?
• Ad content?
• Various species of spam [overlaps with ‘Relevance’]?

Is meta-data about the poster

• Present?
• Reliable?

Understanding this, apart from making your own manual checks, is about understanding your third party suppliers’ processes and content and, often, that of their ‘wholesale data suppliers’ – each of which may differ from the others.
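
Some of these manual checks can be partly automated on an exported sample. A rough sketch, assuming a hypothetical export in which each result carries a `snippet` and an `author` field (real exports will differ by supplier):

```python
def phrase_in_snippet(phrase, snippet):
    """Case-insensitive check that the searched-for phrase appears in the snippet."""
    return phrase.casefold() in snippet.casefold()

def audit(results, phrase):
    """Split returned results into those that do and do not contain the phrase."""
    found = [r for r in results if phrase_in_snippet(phrase, r["snippet"])]
    missing = [r for r in results if not phrase_in_snippet(phrase, r["snippet"])]
    return found, missing

# Made-up results, for illustration only
results = [
    {"snippet": "Loving my new Acme phone", "author": "user123"},
    {"snippet": "Home | Products | Contact us", "author": None},
    {"snippet": "ACME PHONE deals, click here!!!", "author": ""},
]

found, missing = audit(results, "acme phone")

# Share of results with no usable poster meta-data
share_without_author = sum(1 for r in results if not r["author"]) / len(results)
```

The check only answers the first question on the slide. The third result contains the phrase but is ad content, so deciding whether a snippet is genuine ‘content’ still needs a human eye or further rules.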

Reach

[T]here are known knowns; there are things we know that we know. There are known unknowns; that is to say there are things that we now know we don't know. But there are also unknown unknowns – there are things we do not know we don't know.

Donald Rumsfeld

• Are these results from scrutiny of the entire [English speaking] social web? No
• Are they results from a very large, sometimes stated, number of social sources? Yes
• Could this range be skewed relative to the subject under scrutiny? Yes
• Where it’s Twitter data – is it from the whole of Twitter? Maybe
• Is historical data always on the same basis as current data, or data gathered since the search was defined? Not always
• Do we always have a good idea of what the ‘Reach’ is? No

Relevance

Even when the application has collected exactly what we asked for, and it is legitimate content, with some nice useful data about the poster… it might not be relevant

“Cats are great company.”

“#EMT Bolt one cool cat!”

“Also, the Cat is a great resort”

“I love my aunt Cat!”

“I think Cat Stark is worse than any Lanister.”

“I think this hurricane was a scam cooked up by the fat cats in Big Grocer.”
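
A crude rule-based relevance filter for the posts above might look like the sketch below. The idiom list and the "capitalised mid-sentence means proper noun" rule are assumptions made up for this example, and rules like these are brittle: they happen to work on these six posts and will fail on others.

```python
import re

EXCLUDED_PHRASES = ("fat cat", "cool cat")  # hypothetical idiom list

def is_about_pet_cats(post):
    """Guess whether a post mentioning 'cat' is about the animal."""
    lowered = post.lower()
    if any(phrase in lowered for phrase in EXCLUDED_PHRASES):
        return False
    words = re.findall(r"[A-Za-z]+", post)
    for index, word in enumerate(words):
        if word.lower() in ("cat", "cats"):
            # A capitalised "Cat" in mid-sentence is treated as a proper noun.
            if word[0].isupper() and index > 0:
                continue
            return True
    return False

posts = [
    "Cats are great company.",
    "#EMT Bolt one cool cat!",
    "Also, the Cat is a great resort",
    "I love my aunt Cat!",
    "I think Cat Stark is worse than any Lanister.",
    "I think this hurricane was a scam cooked up by the fat cats in Big Grocer.",
]
relevant = [p for p in posts if is_about_pet_cats(p)]
```

Only the first post survives. Every rule here had to be written after reading the drilled-down text, which is exactly the manual effort the dashboards tend to hide.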

Challenges include

However , commencing too early public smoking facts will just overstress your pet ; quite a fresh pet will not learn everything from services. Just after he has ended up perched for some a few moments, supply him with the particular take care of, plus for instance in advance of, make sure you compliment the pup. When dog house teaching your dog, continue to keep the dog house in the vicinity of the spot where you as well as the canine are usually conversing.

And I haven’t mentioned automated Sentiment Analysis yet!

Irony – really?

Slang/Dialect/Register

Multiple meanings – “50 strong”

Adjacent subjects – “My beautiful FIAT next to a BMW”
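
The ‘adjacent subjects’ problem is easy to reproduce with a toy word-list scorer. In this sketch the mini-lexicon, the function names and the two-word window are all assumptions for illustration; commercial sentiment engines are more elaborate, but the failure mode is the same in kind.

```python
POSITIVE = {"beautiful", "love", "great"}   # hypothetical mini-lexicon
NEGATIVE = {"ugly", "hate", "awful"}

def label(score):
    """Turn a numeric score into a sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def words_of(post):
    """Lower-case words with surrounding punctuation stripped."""
    return [w.strip(".,!?\"'").lower() for w in post.split()]

def naive_sentiment(post):
    """Count sentiment words anywhere in the post, whatever they refer to."""
    words = words_of(post)
    return label(sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words))

def sentiment_near(post, brand, window=2):
    """Count only sentiment words within `window` words of the brand name."""
    words = words_of(post)
    score = 0
    for i, w in enumerate(words):
        if w == brand.lower():
            nearby = words[max(0, i - window): i + window + 1]
            score += sum(n in POSITIVE for n in nearby) - sum(n in NEGATIVE for n in nearby)
    return label(score)

post = "My beautiful FIAT next to a BMW"
```

A search for BMW that applies `naive_sentiment` to this post records a positive mention, although the compliment belongs to the FIAT. `sentiment_near(post, "BMW")` returns neutral and `sentiment_near(post, "FIAT")` returns positive. A proximity window does nothing for irony or slang.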

4. And what is Good, and what is not Good?

To Recap

• SMM tools make it very easy to “Super Google” certain Brands, people, objects and even categories or concepts – quickly generating convincing-looking tables and charts.

• But underneath there’s a complex story about accuracy, reach and relevance… which becomes apparent on scrutiny of drilled-down text samples – and can only fully be understood by getting inside the provider’s systems and sources.

• It doesn’t mean they are misleading users – it just means that they started out somewhere else.

• The conclusion is that you have to carefully consider use cases, or build your own better mouse trap, or wait for proprietary solutions to get better at certain things

• Sentiment analysis is part of this story – but doesn’t define it.

Natural Language Processing [NLP] to the rescue?

Definition

“Specifically, it is the process of a computer extracting meaningful information from natural language input and/or producing natural language output”*

Most SMM applications claim some level of NLP.

Whilst this may be legitimately contrasted with simple vocabulary, combination and probabilistic methods, it can end up meaning little. It may only mean that some rules of language have been ‘attended to’ in what is still essentially a pattern-matching exercise.

*Warschauer, M., & Healey, D. (1998). Computers and language learning: An overview

But clearly sophisticated NLP would make a big difference

• Improved Accuracy – including filtering out of unstructured spam

• More tools available to achieve/check Relevance

• Much-improved Sentiment Analysis

Some commercial tools have become available in the last 12 months which offer an assessment of their confidence in their own NLP analysis – dividing snippets into those with Low, Medium and High confidence.

Significantly, ‘High’ is a minority of the output.
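
Where a tool does expose its confidence bands, it is worth checking how much of the output each band accounts for before trusting the headline sentiment. A small sketch, assuming a hypothetical export of (sentiment, confidence band) pairs; the records are made up:

```python
from collections import Counter

# Hypothetical tool output: (sentiment, confidence band) per snippet
snippets = [
    ("positive", "High"), ("negative", "Low"), ("neutral", "Medium"),
    ("negative", "High"), ("positive", "Low"), ("neutral", "Low"),
    ("positive", "Medium"), ("negative", "Medium"),
]

# Share of all snippets falling into each confidence band
band_counts = Counter(band for _, band in snippets)
band_share = {band: count / len(snippets) for band, count in band_counts.items()}

# Sentiment split using only the snippets the tool is most sure about
high_only = Counter(sentiment for sentiment, band in snippets if band == "High")
```

In this made-up sample ‘High’ covers a quarter of the snippets, so any analysis restricted to it rests on a much smaller base than the dashboard total suggests.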

Barking up the wrong Tree?

The recap assumes that the Market Researcher’s instinct is correct… to make the fuzzy working of the social web itself… the collection mechanisms and enterprises, and the analytical engines… into a familiar data collection process, somehow isomorphic with surveys.

But “what is good” is, as many of the ancient philosophers would tell us, about function and purpose.

I think we’ve now learned enough,

• and experienced enough un-straightforwardness

• and contemplated enough need for manual evaluation or augmentation - dispelling the notion that this is a self-evident labour saving device along the way…

to stop and ask, “what was it we were trying to do?”

What are we trying to do?

• Use the social web as a proxy for the population?

• Understand how the social web is responding – for the benefit of those solely interested in this sub-set of the population as a channel or marketplace?

• Access particular niches which are more concentrated online than off?

• Detect significant events?

• Measure shifts and changes?

• Make rough comparisons?

• Discover new insights, themes and connections?

How useful is extracted Social Media content?

Mechanically extracted content is inevitably imperfect as regards:

• relevance
• comprehensiveness relative to ‘total web’
• accuracy of classification, sentiment etc.
• representativeness of general population

In general web mining is therefore useful for:

• relative measures
• measuring and detecting change or discontinuity
• iterative discovery of related concepts and drivers
• comparing channels
• matching to events and schedules

It’s important to know when this matters, and how much. It is vital to work honestly with the constraints and exploit the strengths…

…and, of course, integration with other sources of data.
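
Detecting change is a good example of a relative measure: the absolute counts may be unreliable, but a day that stands far above its own recent baseline is still informative. A minimal sketch, assuming daily mention counts and a simple trailing-window rule; the window length, the threshold and the counts are all made up for illustration:

```python
from statistics import mean, stdev

def find_spikes(daily_counts, baseline_days=7, threshold=3.0):
    """Return indices of days whose count exceeds the mean of the preceding
    `baseline_days` days by more than `threshold` standard deviations."""
    spikes = []
    for day in range(baseline_days, len(daily_counts)):
        baseline = daily_counts[day - baseline_days:day]
        spread = stdev(baseline) or 1.0  # avoid dividing by zero on a flat baseline
        if (daily_counts[day] - mean(baseline)) / spread > threshold:
            spikes.append(day)
    return spikes

# Made-up daily mention counts with one obvious discontinuity
daily_mentions = [40, 42, 38, 41, 39, 43, 40, 41, 400, 120, 44]
spikes = find_spikes(daily_mentions)
```

Only the day with 400 mentions is flagged. The following day (120) is not, because the spike itself has inflated the trailing baseline; that is a known limitation of this simple rule rather than a finding about the data.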

Different client needs indicate different SMM approaches
For example – Precision Extraction vs ‘Trawl & Filter’

• Crude mention & mood tracking
• Quantitative – brand tracking and integration with traditional research
• Indicative Qual – e.g. using trends and volumes to guide focus of analysis
• Exploratory Qual – more complex collection; manually manageable volumes and ‘tuning’

• Higher data volumes from simple search terms
• Lower data volumes from targeted & compound search terms

• More post processing, applied to data by MR agency – to reduce noise and refine sentiment attribution
• Accept raw data output from application

• Not radical enough!
• Too much like hard work
• Sensible

Rather than wait for NLP utopia…

Settle for:

1. SMM as a powerful and novel Qual exploration tool

2. Do big number crunching on brands but take a “hyena” approach. Accept all* occurrences of a brand or product name in posts as an indication of significance… even the spam and the adverts and the competitions.

Similarly look for pure correlations between words/phrases and other words/phrases

Or between trends in these numbers and classes of offline events – such as sales, complaints and other behaviours… with a view to predicting, explaining or causing such events in the future.

*Except for the most obvious duplication errors such as over-indexing
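
A rough sketch of the “hyena” approach: count every post that names the brand, drop only exact duplicates, and correlate the resulting series with an offline measure. The field names (`url`, `text`), the brand "acme" and all the figures are hypothetical, and Pearson correlation is just one possible choice of measure.

```python
from math import sqrt

def count_mentions(posts, brand):
    """Count every post naming the brand, dropping only exact duplicates
    (same URL and same text), e.g. from over-indexing."""
    unique = {(p["url"], p["text"]) for p in posts}
    return sum(brand.lower() in text.lower() for _, text in unique)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up week of posts: spam and adverts are kept, the exact duplicate is not
posts = [
    {"url": "a", "text": "Win an Acme car!"},
    {"url": "a", "text": "Win an Acme car!"},
    {"url": "b", "text": "acme advert"},
]
mentions_this_week = count_mentions(posts, "acme")

# Made-up weekly series of mentions and sales
weekly_mentions = [120, 150, 90, 200, 170]
weekly_sales = [1000, 1200, 900, 1500, 1300]
r = pearson(weekly_mentions, weekly_sales)
```

A high `r` on a handful of weeks shows only that the two series moved together in this sample; whether mentions predict, explain or cause sales needs far more data and a look at timing.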

5. Some Conclusions

I am not a scientist

OK – I’m a scientist amongst researchers, and possibly amongst programmers

But amongst scientists – and text analysis specialists – I’m a mere researcher.

Because I couldn’t use these tools “as is” with confidence I had to start delving…

… and delving is time consuming in a commercial environment.

Our technology suppliers have become more like partners… increasingly transparent as they’ve understood, but not challenged, what we tried to do. The software and services will now adapt to us – whether they should or not.

PR monitors, real time trackers and ‘social CRM’ folks will carry on using the tools the same way they always have… and may even benefit from changes my industry has now initiated.

But

How will commercial SMM applications and services with the best accuracy, reach and relevance capabilities be recognised, validated and promoted?

Is the ‘bit in the middle’ just a holy grail until such time as the NLP part of the reckoning makes a step change – driven by all its other exploitations, such as ordinary-language-driven IT interfaces?

If you’re a researcher and you want to use this stuff tomorrow… what must be done?

Fortunately – there’s enough to learn by “super-googling”, browsing and crude trend tracking to keep us going… and learning… for some time to come.

Dr Nick Buckley

Social Media Director

GfK NOP

M: 07958 516967 T: @grimbold
