Think like a Scientist!

Lies, damned lies and statistics


"The secret language of statistics, in a fact-minded culture, is employed to sensationalise, inflate, confuse and oversimplify."

(Darrell Huff, 1954, How to Lie with Statistics)

Whether we like it or not, statistics are all around us. From political polls to share prices and the cost of electricity, we interact daily with statistics in numerous ways and forms. For many of us, this kind of information seems trustworthy and unbiased (after all, numbers don't lie, do they?), and it's tempting to take it at face value. But, just like many other types of information, statistics can be skewed to sell a particular story or reinforce a particular argument, or can be unintentionally misrepresented.

This guide aims to offer a user-friendly introduction to five of the most common statistical pitfalls and misconceptions. We'll highlight some of the ways data and statistics can be used to hoodwink us, and give some practical advice on avoiding these traps within your organisation.


1. Beware of misleading averages

Averages (technically known as 'means') can be very persuasive. They are one of a number of different ways we can measure central tendency – a single number that describes an entire set of data by a single central or typical value. Averages are extremely popular and are often the first choice for summary statistics. They can give a reasonably clear and easily digestible picture of what's going on within a dataset, and most people have some familiarity with their meaning. However, sometimes averages are not all they seem. Take a look at the hypothetical team sales table below:

Sales Team     Average Sales – January 2016
A              £50,000
B              £50,000

If we only take the data from this table, it looks like both teams performed identically in January. And if the sales target is £45,000 per month, both are performing very well. But if we look below the surface of these averages and add in individual team member performance, the tables below tell a very different story.

Team A         Individual Sales – January 2016
Kirsty         £51,000
Alan           £54,000
Sophie         £45,000
Average Sales  £50,000

Team B         Individual Sales – January 2016
Jo             £49,000
Paul           £76,000
Steve          £25,000
Average Sales  £50,000

We can now see that everyone in Team A is performing at relatively similar levels, but there's considerable variation in Team B. So while there's little action to take with Team A, we might want to find out why Steve is underperforming compared to his teammates, and reward Paul for his outstanding performance. By only looking at averages, we lost meaningful information about organisational performance.


Measures of central tendency can therefore over-simplify and under-inform us, leading to poor decisions. When making decisions, we need to understand details such as the distribution (or shape) of the data, and know whether there are any outliers (extreme scores) distorting our picture of the data. For example, if your organisation was full of people like Paul and Steve, implementing a company-wide sales training programme would be a waste of time and resources, and demotivating for high performers. Likewise, if a new team member sells nothing in their first month (as you might expect), thereby lowering the team average, the problem needs to be addressed at the individual, rather than the team, level.
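To make this concrete, here's a minimal Python sketch using the figures from the tables above. It shows how two teams with an identical average can have very different spreads, and how the standard deviation and range expose the variation the mean hides.

```python
# Identical means, very different spreads (figures from the tables above).
from statistics import mean, pstdev

team_a = {"Kirsty": 51_000, "Alan": 54_000, "Sophie": 45_000}
team_b = {"Jo": 49_000, "Paul": 76_000, "Steve": 25_000}

for label, team in (("Team A", team_a), ("Team B", team_b)):
    sales = list(team.values())
    print(f"{label}: mean £{mean(sales):,.0f}, "
          f"std dev £{pstdev(sales):,.0f}, "
          f"range £{min(sales):,} to £{max(sales):,}")

# Team A: mean £50,000, std dev £3,742,  range £45,000 to £54,000
# Team B: mean £50,000, std dev £20,833, range £25,000 to £76,000
```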

While averages can be a useful high-level indicator of a dataset, it’s important to make sure you’ve got all the critical information before taking action. If someone’s presenting you with averages and you have an important decision to make, think about whether they’re telling you the full story, and ask for more detailed data if needed. Look at your own organisational data through a closer lens – check for extreme scores and any distinct groups that may need to be treated differently.


2. ‘98% of statistics are misleading’ – the problem with percentages

We've all turned on the television and heard adverts claiming: "98% of UK women agreed that product X dramatically reduced signs of ageing". This sounds great in principle, but statements containing impressive percentages are often flawed.

Firstly, the product has most likely been given away for free (so what's not to like?), but more importantly, do we know how many people actually tested it? If you look at the tiny print at the bottom of the screen, more often than not only tens of people will have been involved in the trial – and 98% sounds a lot less impressive if only 20 people were asked. In other situations, we might not be given any information at all about the numbers the percentages are based on.

Percentages like this can be misleading for a number of reasons:

• They look like absolute figures, so it's difficult to question them

• They can sound like they're based on much more data than they often are

• They're rarely accompanied by enough data to decide whether something is statistically significant

• They can lead us to compare two things unfairly

All of these can skew our understanding of the information we're being presented with, and ultimately stop us from making good decisions. The number of data points needed to draw meaningful percentages will vary depending on what you're trying to find out, but as a rule, more is usually better. If someone is unwilling to give you this information, beware – they may have talked up their data to make it more convincing.
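To put some numbers on this, here's a minimal Python sketch (standard library only) that computes a Wilson score 95% confidence interval for a reported proportion. The trial sizes are hypothetical; the point is that the same 98% headline carries a much wider margin of error when only 20 people were asked than when 2,000 were.

```python
# How much a "98% agreed" claim really tells us depends on the sample size.
from math import sqrt

def wilson_interval(p_hat, n, z=1.96):
    """Approximate 95% Wilson score confidence interval for a proportion."""
    centre = (p_hat + z**2 / (2 * n)) / (1 + z**2 / n)
    half = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / (1 + z**2 / n)
    return centre - half, centre + half

for n in (20, 2_000):                      # hypothetical numbers of respondents
    low, high = wilson_interval(0.98, n)   # the advertised "98% agreed"
    print(f"n = {n}: plausibly anywhere between {low:.0%} and {high:.0%}")

# n = 20:   plausibly anywhere between roughly 81% and 100%
# n = 2000: plausibly anywhere between roughly 97% and 99%
```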



3. Comparing chalk with cheese

As psychologists, our aim is to make observations about groups of people, known as ‘populations’. This could be employees at a particular organisation, in a particular sector, or the working population as a whole. It’s usually impossible to gather data from every member of a population, so instead, we aim for a representative sample – a random selection of people who reflect the population as a whole. Random selection ensures we can be confident that the findings from our research will be true of the whole population, and not caused by a unique characteristic of the group chosen.

Imagine you want to implement a new stress-reduction programme in your organisation of 10,000 workers. It would be time consuming and costly to test it on all 10,000 at once without a guarantee of it working, so instead you decide to pilot the programme with a smaller group first. How do you decide who participates?

• Option 1: You could send out an email asking employees to volunteer on a first-come, first-served basis.

• Option 2: You could select a particular department to participate.

While these approaches would be quick and simple, they could cause problems further down the line. For example, it could be that the first 100 people who reply to your email are feeling the most stressed and are keen to be heard, or they might be the employees who are most open to taking part in new projects. Similarly, employees in the finance department might be very different to those in marketing, in terms of job roles, their experience of stress, and even personality traits. By using these types of selection method, you create the possibility of systematic differences between your participants and the rest of the organisation. This means you cannot be confident that what you find will be the same for other people (in other words, you cannot generalise your findings).


In many cases a completely random selection just isn't practical, but there are easy ways to make your selection more robust. For example, you could take every 3rd, 5th or 8th employee on your payroll, or use a quota system to make sure the right types of employees are represented (the male-to-female ratio, different managerial levels, or different departments).
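As an illustration, here's a minimal Python sketch of both approaches: systematic sampling (every kth employee on the payroll) and a simple quota sample by department. The employee list, department names and quota size are all hypothetical.

```python
# Two practical alternatives to a first-come, first-served pilot group.
import random

random.seed(42)  # reproducible for the sketch
employees = [
    {"name": f"Employee {i}",
     "dept": random.choice(["Finance", "Marketing", "Operations"])}
    for i in range(10_000)
]

# 1. Systematic sampling: take every kth person on the payroll.
k = 100
systematic_sample = employees[::k]               # 100 evenly spaced employees

# 2. Quota sampling: draw a fixed number at random from each department.
quota_per_dept = 30
quota_sample = []
for dept in ("Finance", "Marketing", "Operations"):
    members = [e for e in employees if e["dept"] == dept]
    quota_sample.extend(random.sample(members, quota_per_dept))

print(len(systematic_sample), len(quota_sample))  # 100 90
```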

A lack of generalisability is a key issue in organisational interventions. While it’s often a necessary evil, it also means we should be cautious when buying anything off-the-shelf, or something that hasn’t been applied in a similar way before. We recommend asking yourself: Who are these findings based on? And how sure am I that they’ll be the same for my organisation?


4. Rose-tinted graphs

When used well, graphs are a clear and accessible way of presenting data. Good graphs can bring the story in a dataset to life and enable us to visualise what’s going on. But as you’ll see in the examples below, graphs can also be used to distort information and mislead us.

Imagine a training provider shows you the following graph as evidence of their sales training success:

[Bar chart: Team Sales – sales before training vs. sales after training, with the vertical axis running from £0 to £60,000 in £10,000 increments]

As we can see, the graph shows that the training seems to have led to a moderate improvement in performance, but nothing drastic. You might be tempted to ask yourself whether the sales training was an effective investment.

But what if the provider produced this graph instead?

[Bar chart: the same Team Sales data, but with the vertical axis running from £40,000 to £50,000 in £1,000 increments]


Suddenly, the effects of the training look much more impressive! Exactly the same information is being shown, but it's been distorted to tell a certain story. By not starting the vertical axis at zero and shrinking the increments to £1,000, the gap between the before and after figures looks much bigger, making the training appear more effective. Without closer scrutiny, it'd be easy to accept these apparently impressive results at face value.
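You can reproduce the trick in a few lines of matplotlib. The before/after figures below are hypothetical (the provider's exact numbers aren't given); the point is that the same two bars look very different once the axis is truncated.

```python
# The same data drawn honestly (axis from zero) and flatteringly (truncated axis).
import matplotlib.pyplot as plt

labels = ["Sales before training", "Sales after training"]
sales = [43_000, 48_000]  # hypothetical figures, for illustration only

fig, (honest, flattering) = plt.subplots(1, 2, figsize=(9, 4))

honest.bar(labels, sales)
honest.set_ylim(0, 60_000)           # axis starts at £0: a modest difference
honest.set_title("Team Sales (axis from £0)")

flattering.bar(labels, sales)
flattering.set_ylim(40_000, 50_000)  # axis starts at £40,000: looks dramatic
flattering.set_title("Team Sales (axis from £40,000)")

plt.tight_layout()
plt.show()
```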

Similarly, the graph below was used as proof of global warming:

[Line chart: Average Monthly Temperature in Fahrenheit, January to July, with the vertical axis running from 0.0° to 80.0° in 10° increments]

It shows a trend of rapidly rising temperatures that would be alarming, were it not for the fact that only data from January to July is shown. The creator of the graph has removed information that isn't in line with the message they want to deliver, making the graph highly misleading. If it showed the full year, we'd see a drop-off in temperature towards the winter months, and there would be very little of interest. Better still, a year-on-year comparison of temperatures would have given a much better picture of global warming without being misleading.
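The cherry-picking trick is just as easy to reproduce. This sketch uses hypothetical monthly temperatures: plotting January to July alone produces an "alarming" climb, while the full year shows an ordinary seasonal cycle.

```python
# Cherry-picked months vs. the full year (hypothetical temperatures in °F).
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
temps_f = [30, 33, 42, 52, 62, 71, 76, 74, 66, 54, 43, 33]

fig, (cherry, full) = plt.subplots(1, 2, figsize=(9, 4), sharey=True)

cherry.plot(months[:7], temps_f[:7], marker="o")
cherry.set_title("January to July only: an 'alarming' trend")

full.plot(months, temps_f, marker="o")
full.set_title("The full year: a normal seasonal cycle")

plt.tight_layout()
plt.show()
```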

When you're presented with graphs, it's important to focus in on the detail as well as the broader picture. If someone shows you a pie chart, check whether the segments actually add up to 100%; if it's a bar or line chart, check where the axes start and how big or small the increments are. Think about whether something important might have been left out or manipulated to support an argument or tell a particular story.



5. There are no prizes for eating lots of chocolate!

Did you know that, statistically, the countries that eat the most chocolate also produce the most Nobel Laureates? Or that when iPhone sales are higher, more Americans die from falling down the stairs? Unfortunately, it's unlikely that chocolate is the driving force behind Laureates' achievements, and likewise, iPhones are unlikely to be banned on staircases anytime soon. These are (strange) examples of correlations – statistical relationships between two phenomena at a given point in time. And hopefully, these examples demonstrate that correlations aren't always meaningful.

[Scatter plot: countries' annual per capita chocolate consumption (kg/yr/capita) on the horizontal axis against Nobel Laureates per 10 million population on the vertical axis, labelled by country; r = 0.791, P < 0.0001]

Correlation between countries' annual per capita chocolate consumption and the number of Nobel Laureates per 10 million population.


Correlational relationships can be misleading in two critical ways:

• They can occur purely by chance if there's enough data (especially if we're not looking for anything in particular)

• People wrongly use them to make claims about causality

If you analyse lots of data on lots of different variables, the chances are that you might find a correlational relationship somewhere. Like our previous examples, it doesn’t necessarily mean that the two are actually related. And if we’re not looking for anything specific, it’s a bit like a statistical fishing trip. The chance of finding something is high because we have little control over what might come out. This is particularly concerning given the recent rise in popularity of ‘big data’ and HR analytics, because practitioners may be tempted to do exactly this. Acting on the basis of chance correlations could have considerable consequences for organisations, and any action taken might do more harm than good.
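To see how easily a statistical fishing trip turns something up, here's a minimal numpy sketch: it generates variables that are completely unrelated by construction, then counts how many pairs are nonetheless noticeably correlated purely by chance.

```python
# Purely random, unrelated variables still yield "impressive" correlations.
import numpy as np

rng = np.random.default_rng(seed=1)
data = rng.normal(size=(50, 100))        # 50 observations of 100 unrelated variables

corr = np.corrcoef(data, rowvar=False)   # 100 x 100 correlation matrix
pairs = corr[np.triu_indices(100, k=1)]  # the 4,950 distinct variable pairs

print(f"Pairs with |r| > 0.3: {(np.abs(pairs) > 0.3).sum()}")
print(f"Strongest chance correlation: r = {np.abs(pairs).max():.2f}")
```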

Just as eating chocolate doesn't cause Nobel Laureates to produce their best work, a correlational relationship doesn't necessarily imply causation. Because both things were measured at the same time and there's no before-and-after event, we logically cannot be sure that one caused the other, or which way round any effect might run. After all, it's possible that winning a Nobel Prize encourages people to eat more chocolate! Analysing data for correlations can give an indication that two concepts might be linked, but this isn't enough to build messages around causality.

In practice, correlations are often misused, deliberately or mistakenly, to suggest causality where there isn’t any evidence for it. If someone wants to sell you a new psychometric assessment that claims to select better people, or an engagement solution that will increase productivity, what sort of evidence do they have that it works? Have they done a before and after (longitudinal) study? If not, we cannot be sure if their intervention caused the change, whether it was a coincidence, or if something else entirely caused it to happen.

"There are a lot of small data problems that occur in big data. They don't disappear because you've got lots of stuff, they get worse."

(David Spiegelhalter, Winton Professor of the Public Understanding of Risk, University of Cambridge)


General advice for dealing with statistics

While we don't recommend treating every statistic and piece of data you encounter with suspicion, considering what might be going on underneath the surface may help prevent you from making poor decisions and investments. When you're given statistical information, ask yourself the following questions:

1. Do I understand what the data is telling me? Don’t be afraid to ask for clarification if something is unclear, especially if it feels like you’re being blinded by statistics (it might be on purpose!).

2. Who provided the statistical information? Who is making the claim and what’s in it for them? Do they have a vested interest, or are they trying to sell me something? Are they citing their own (likely biased) research, or is it from a reputable source?

3. How do they know? Have they got good evidence for what they’re saying? What sort of evidence is it and what is it based on? Is it likely to be applicable to your situation (watch out for student samples!)?

4. What’s missing? Is there a piece of data that’s conspicuously absent? Is there something more you need to know before you accept what they’re telling you? Is one of the pitfalls being used? If you have experience in the area, does the information match with your own knowledge and expertise?

Not all statistics are out to trick us, but adopting a cautious mindset and an attitude of healthy scepticism may prevent you falling prey to pitfalls, and help you make well-informed decisions.

We can help

At the Future Work Centre we're passionate about the communication of science and the value of evidence-based practice. We help organisations make sense of their data through the provision of flexible training modules, practical user-friendly resources and professional services.

020 7947 4273

[email protected]

@FW_Centre
