let’s start: why do statistics?suhasini/teaching301/stat301... · 2019-08-26 · misinterpreting...


Post on 29-Jul-2020



Let’s start: Why do statistics?

• You are probably taking this class because you have to.
• To make matters worse, you probably don’t want to.
• Statistics is a very important subject. Here are a few reasons:
  ◦ It is important to properly evaluate the data and claims that bombard you every day (news programs, health magazines, internet claims, word of mouth, your classes, etc.).
  ◦ If you cannot distinguish between good and bad reasoning, you are vulnerable to manipulation and to making decisions that are not in your best interest. Statistics is a means of analyzing information and making informed decisions.
  ◦ Statistics is also used in areas of research as diverse as:
    ▪ Animal sciences.
    ▪ Medicine.
    ▪ Agriculture.
    ▪ Sociology and the political sciences.


Misinterpreting statistics

• You will see statistics everywhere. Examples include:
  ◦ Interracial marriages have increased by 75% over the past 20 years.
  ◦ 20% of road deaths involve a commercial vehicle.
  ◦ In a survey of 200 students, 85% said they owned a smart phone.
• Based on the above statistics, the following claims were made in newspapers. From the data alone, are these claims valid?
  ◦ We see that interracial marriage is now common in society.
    ▪ Yes ▪ No
  ◦ Commercial vehicles make roads more dangerous.
    ▪ Yes ▪ No
  ◦ Over 80% of students own a smart phone.
    ▪ Yes ▪ No


• Suppose 20 years ago interracial marriages were 1% of all marriages and now they are 1.75% of all marriages. The proportion has increased by 75%, but one cannot claim that interracial marriage is common in society when it accounts for only 1.75% of all marriages.

• The 20% figure tells us little on its own, because we are not given the proportion of commercial vehicles on the road. For example, suppose 30% of the vehicles on the road were commercial; then the claim makes no sense. If commercial vehicles were more dangerous, we would expect the proportion of road deaths involving a commercial vehicle to be greater than the proportion of commercial vehicles on the road. Indeed, if 30% of vehicles were commercial, the 20% figure would suggest that commercial vehicles are safer!

• The last claim is based on a sample of the student population. Just because 85% of 200 students own a smart phone, it does not follow that at least 80% of all students own a smart phone. To make statements about the population:
  ◦ We first need to be sure the sample is representative of the student population: certain groups should not be over- or under-represented.
  ◦ We need to use statistical tools to transfer what we see in a sample (a small subset of a population) to the entire population.
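
The arithmetic behind the first two points can be checked with a short sketch. The function names are illustrative and not part of the course material; the 1%, 1.75%, 20% and 30% figures are the hypothetical values used in the text above.

```python
def relative_increase(old, new):
    """Return the change from old to new as a fraction of old."""
    return (new - old) / old


def deaths_to_traffic_ratio(share_of_deaths, share_of_vehicles):
    """Compare a group's share of road deaths with its share of vehicles.

    A ratio above 1 means the group is over-represented among deaths;
    a ratio below 1 means it is under-represented.
    """
    return share_of_deaths / share_of_vehicles


# Hypothetical marriage shares from the text: 1% then, 1.75% now.
increase = relative_increase(0.01, 0.0175)
print(f"Relative increase: {increase:.0%}")
print(f"Share of all marriages now: {0.0175:.2%}")

# 20% of road deaths; 30% is the supposed share of commercial vehicles.
ratio = deaths_to_traffic_ratio(0.20, 0.30)
print(f"Deaths-to-traffic ratio: {ratio:.2f}")
```

A large relative increase can sit alongside a small absolute share, and a ratio below 1 points in the opposite direction to the newspaper claim.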


All these examples illustrate that to intelligently use statistics we must question the statistics we encounter. We should think about where the numbers came from, their sources and the procedures used to generate them.


Mathematical background

• This section is for self-reading.
• This is NOT a mathematics class.
• But some basic algebra will help. A summary of what is required is given below.
  ◦ Intervals. If a value is said to lie in the interval [8,10] (smallest number first), then this value can be any number between 8 and 10. The same idea generalizes to algebraic intervals: if a value lies in the interval [a,b], then it can be any number between a and b. Sometimes we will use algebra instead of numbers; don’t get anxious about this. Plug numbers into the equation to get a better understanding.
  ◦ Length. The length of the interval [8,10] is 2. The length of the interval [a,b] is (b-a).
  ◦ Solving equations 1: You should be able to solve the linear equation ax + b = c for x. The solution is x = (c-b)/a.
  ◦ Solving equations 2: You should be able to solve the equation a/√n = b for n. The solution is n = (a/b)^2.
  ◦ You should be able to evaluate an average. It is the sum of all the numbers in a group divided by how many numbers are in the group.
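
As suggested above, plugging in numbers is a good way to check the algebra. The following is a minimal sketch of the four skills listed; the function names and the example numbers are made up for illustration.

```python
import math


def interval_length(a, b):
    """Length of the interval [a, b]."""
    return b - a


def solve_linear(a, b, c):
    """Solve a*x + b = c for x (assumes a is not zero)."""
    return (c - b) / a


def solve_root_equation(a, b):
    """Solve a / sqrt(n) = b for n (assumes b is not zero)."""
    return (a / b) ** 2


def average(values):
    """Sum of the numbers divided by how many numbers there are."""
    return sum(values) / len(values)


length = interval_length(8, 10)
x = solve_linear(2, 3, 11)       # 2x + 3 = 11
n = solve_root_equation(4, 2)    # 4 / sqrt(n) = 2
mean = average([8, 9, 10])

# Substitute the solutions back into the original equations as a check.
print(2 * x + 3, 4 / math.sqrt(n), length, mean)
```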


• Word problem 1. Suppose in a time and effort study it was found that the number of people involved in a job influences the time it takes to complete the job through the following formula:

  Time to complete job = 4 / (Number of people on job), that is,

  \[ t = \frac{4}{n}, \]

  where t is the original time and n is the original number of people.

• Question: How many more people do you need to reduce the length of time by half?
• Answer: The original time to complete the job is not given in the question, so we use algebra instead.
  ◦ To understand how the formula works, we replace t and n with numbers:
  ◦ Suppose that n = 4, then t = 1. If we increase the number of people to 8, then the time reduces to 0.5 (time halves).
  ◦ Suppose n = 8, then t = 0.5. If we increase the number of people to 16, then the time reduces to 0.25 (time halves).


  ◦ You will notice that we have to double the number of people in order to decrease the time by a half:

  \[ \frac{t}{2} = \frac{4}{2n} \]


• Word problem 2. Suppose in a time and effort study it was found that the number of people involved in a job influences the time it takes to complete the job through the following formula:

  \[ t = \frac{4}{\sqrt{n}}, \]

  where t is the original time and n is the number of people.

• Question: How many more people do you need to reduce the length of time by half?
• Answer: Again we replace algebra with numbers.
  ◦ If n = 4, then t = 2. If we increase n to 16, then t = 1 (time halves).
  ◦ If n = 16, then t = 1. If we increase n to 64, then t = 0.5 (time halves).
  ◦ By what factor do we need to increase the number of people to decrease the time by half?


• We see from these examples that if we quadruple the number of people (increase it from n to 4n), then the time decreases by a half.

• Comparing problems 1 and 2, we see that the square root in the equation means we have to increase the number of people by a larger factor to get the same time reduction.

• This is not statistics, but basic algebra. On occasion we will use it, but not often.
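
The comparison between the two word problems can be reproduced numerically. This is a sketch only: the helper `factor_to_halve` is a hypothetical name, and it simply tries whole-number multiples of n until the time has at least halved.

```python
import math


def time_problem_1(n):
    """Word problem 1: t = 4 / n."""
    return 4 / n


def time_problem_2(n):
    """Word problem 2: t = 4 / sqrt(n)."""
    return 4 / math.sqrt(n)


def factor_to_halve(time_fn, n, max_factor=16):
    """Smallest whole-number k with time_fn(k * n) <= time_fn(n) / 2."""
    target = time_fn(n) / 2
    for k in range(1, max_factor + 1):
        # Small tolerance guards against floating-point rounding.
        if time_fn(k * n) <= target + 1e-12:
            return k
    return None


factor_1 = factor_to_halve(time_problem_1, 4)
factor_2 = factor_to_halve(time_problem_2, 4)
print(factor_1, factor_2)
```

Starting from n = 4 as in the worked numbers, the first formula needs the number of people doubled and the second needs it quadrupled.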


Definition: probabilities

• The interpretation of probabilities is fundamental to statistics.
• We need to understand how they are calculated.

• DEFINITION: The probability of an “event” is the number of times it occurs in a population divided by the size of the population.

• Often it is impossible to calculate the probability from the whole population, and we need to estimate it from data.
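
The definition and the idea of estimating from data can be illustrated with a toy population. Everything here is invented for illustration: the population of 1,000 people, the 850 "yes" values, the sample size of 200 and the seed are assumptions, not data from the course.

```python
import random


def probability(group):
    """Number of times the event occurs divided by the size of the group."""
    return sum(group) / len(group)


# Hypothetical population: True means the event occurs for that person.
population = [True] * 850 + [False] * 150

# Exact probability, available only because we can see the whole population.
p_population = probability(population)

# In practice we usually see only a sample and estimate the probability from it.
rng = random.Random(301)          # fixed seed so the sketch is repeatable
sample = rng.sample(population, 200)
p_estimate = probability(sample)

print(p_population, p_estimate)
```

The estimate will typically be close to, but not exactly equal to, the population value.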


Example: Calculation

• A stent is a device placed in blood vessels to aid recovery after a cardiac arrest and to reduce the risk of another cardiac arrest. It was thought that stents may also reduce the risk of stroke in at-risk patients.

• A study was done to investigate this claim.


Definition: Percentiles

• Definition: A percentile is the value below which a given percentage of observations (in a group of observations) falls.
• Example: Sam is 67 inches tall and is in the 60th percentile amongst 20 year olds. Or: Sam’s height is in the 60th percentile.
• The top percentile is the proportion of the population greater than a specified value.
• Example: Sam’s height is in the top 40th percentile.
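
A small sketch of how the two percentile statements relate. The ten heights below are invented so that the numbers match the Sam example; they are not real data, and the function names are illustrative.

```python
def percent_below(values, x):
    """Percentage of observations that fall below x."""
    return 100 * sum(1 for v in values if v < x) / len(values)


def percent_above(values, x):
    """Percentage of observations that are greater than x."""
    return 100 * sum(1 for v in values if v > x) / len(values)


# Hypothetical heights (inches) of ten other 20 year olds.
heights = [61, 62, 63, 64, 65, 66, 68, 69, 70, 72]
sam = 67

below = percent_below(heights, sam)   # Sam's percentile
above = percent_above(heights, sam)   # Sam's "top" percentile
print(below, above)
```

When nobody has exactly the same height as Sam, the two percentages add up to 100.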


Examples: Interpreting probabilities to make “inference”

• Statistics is about making confident statements (intelligent guesses) based on only partial information (usually a sample).

• This is usually done by interpreting probabilities and chances.


How to interpret random?

We come across the word random all the time.

Example: Sam went on holiday, where Sam sees their best friend Jon at the same place. Sam thinks “OMG, I saw Jon on holiday, how random is that!”.
• Random refers to the chance of Sam and Jon having a chance encounter.
• The OMG means that Sam thinks this chance is small. There are two explanations for this encounter:
  ◦ It was by chance.
  ◦ It was not by chance; that is, Jon did not choose their destination independently of Sam.


• The probability of a chance encounter helps us to make an intelligent guess at the explanation.

• What if the probability is 10%?

• What if the probability is 0.01%?


Example 1: Stents in the heart

• We see that in this sample, the proportion of people who had a stroke was higher among those given stents. Clearly there is nothing in the data to suggest that stents reduce the risk of strokes.

• What do the numbers tell us?
  ◦ Stents give worse outcomes?
  ◦ The differences seen can be explained by natural variation in the data.


Example 2

• On a balmy summer afternoon in Cambridge during the 1920s, a group of Cambridge professors and their spouses were enjoying afternoon tea. Traditionally, in the UK, tea is hot and drunk with milk.

• One lady insisted that the tea tasted different depending on whether the milk was poured into the cup first and then the tea, or the tea was poured first and then the milk. The men-folk scoffed: how could there be a difference? The chemistry of the mixture must be the same, regardless of the order.

• One gentleman, called Ronald Fisher, suggested that the claim could be tested with an experiment.

• He suggested the following experiment:
  ◦ 8 cups of tea should be prepared, 4 where the milk was poured first and 4 where the tea was poured first.
  ◦ The cups should be randomized (this means the cups are shuffled around).
  ◦ The lady should be given each cup to taste. She should be told that 4 are with the milk poured first and 4 with the tea poured first.
  ◦ The lady should identify which cups had milk poured first.


• The lady could have got all correct just by guessing (without really knowing which cup was which).

• There is a probability associated with this eventuality. Fisher calculated the probability of getting all correct, 7 out of 8 correct, 6 out of 8 correct, etc., all by mere chance (coincidence).

• He calculated that the probability of identifying all correctly by chance was 1 in 70 (1/70 = 1.4%).

• Returning to the experiment: The lady identified all the teas correctly.

• Fisher deemed the 1.4% probability of it happening by chance sufficiently small to reject the notion that it was due to chance. In other words, the data suggest that the lady did know which came first (milk or tea).

• However, based on data alone we cannot prove that she knew. We can only make intelligent guesses (this is called statistical inference).
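
The 1 in 70 figure can be reproduced by counting. The sketch below assumes the lady names exactly 4 of the 8 cups as "milk first" and that, when guessing, every such choice of 4 cups is equally likely; the function names are illustrative.

```python
from math import comb


def prob_all_correct(cups=8, milk_first=4):
    """Chance of picking exactly the right milk-first cups by pure guessing."""
    return 1 / comb(cups, milk_first)


def prob_k_milk_first_correct(k, cups=8, milk_first=4):
    """Chance that exactly k of the cups she names really were milk first."""
    tea_first = cups - milk_first
    ways = comb(milk_first, k) * comb(tea_first, milk_first - k)
    return ways / comb(cups, milk_first)


p_all = prob_all_correct()
distribution = [prob_k_milk_first_correct(k) for k in range(5)]

print(f"Ways to choose 4 cups from 8: {comb(8, 4)}")
print(f"P(all correct by chance) = {p_all:.3f}")
```

There are 70 equally likely ways to choose 4 cups out of 8 and only one of them is entirely right, which gives the 1/70 quoted above.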


Question Time

Sam scores a C in a multiple choice exam. He brags to his friends that he just “guessed” every answer. In a multiple choice exam the chance of getting a C by simply guessing every answer is 0.05%. What can we say about Sam’s claim?

• A: Sam was definitely guessing.

• B: Since 0.05% is relatively small, it is impossible that Sam guessed his way through the exam. We conclude that Sam definitely knew some of the material; he was lying when he said he guessed.

• C: 0.05% is a relatively small probability, so it seems unlikely that Sam guessed all the answers. But 5 out of 10,000 students can get a C by only guessing.


Example 3: Sparkling water

• See the article by Eweis, Abed and Stiban (2017): https://www.stat.tamu.edu/~suhasini/CO2beverage.pdf
• Their objective was to study the impact that carbonation in water had on the weight gain and hunger of rats (and humans).
• 16 rats (born on the same day) were randomly assigned to one of four groups. Each group was given one of four drinks:
  ◦ Regular water (Water).
  ◦ Regular carbonated beverage made with sugar (CB).
  ◦ Diet carbonated beverage (DGB).
  ◦ Degassed carbonated beverage made with sugar (DgCB).

• They were not given any drink other than the one prescribed for their group.

• The rats had unlimited access to rodent food.
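
Random assignment of this kind can be sketched in a few lines. This is not the procedure the authors used; the rat labels, the seed and the helper `assign_groups` are assumptions made for illustration, and the group labels are copied from the list above.

```python
import random


def assign_groups(units, groups, seed=None):
    """Shuffle the units and deal them into equally sized groups."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    size = len(shuffled) // len(groups)
    return {
        group: shuffled[i * size:(i + 1) * size]
        for i, group in enumerate(groups)
    }


rats = [f"rat_{i:02d}" for i in range(1, 17)]     # 16 hypothetical rat labels
drinks = ["Water", "CB", "DGB", "DgCB"]
assignment = assign_groups(rats, drinks, seed=2017)

for drink, members in assignment.items():
    print(drink, members)
```

Because the shuffle decides who goes where, no characteristic of a rat can systematically influence which drink it receives.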


• The average weight of the rats in each group is given below (after 110 days).

• The height of each bar corresponds to the average weight of the rats in each group.

• One half of the black bar corresponds to the standard deviation for that group; we define this later.


• We see differences.
  ◦ The weight of rats consuming regular soda is higher than that of those on water.
  ◦ The weight of those consuming diet soda is also higher than that of the water group.
  ◦ The weight of those consuming the flat regular soda was less than that of the carbonated diet group.
• These averages are each based on 4 rats.
• The bigger the difference, the less believable it is that it is due to chance variation. But the spread of the data also plays a role.
  ◦ It could be that the difference is due to natural variation.
  ◦ It could be that the difference is due to the different drinks consumed.
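
To see how much group averages can differ through natural variation alone, one can simulate four groups of 4 rats that all come from the same population. The mean of 400 and standard deviation of 30 below are invented numbers, not values from the study, and the normal distribution is an assumption.

```python
import random


def simulated_group_means(n_groups=4, group_size=4, mean=400.0, sd=30.0, seed=0):
    """Average weight in each group when every rat comes from one population."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_groups):
        weights = [rng.gauss(mean, sd) for _ in range(group_size)]
        means.append(sum(weights) / group_size)
    return means


means = simulated_group_means()
gap = max(means) - min(means)

print([round(m, 1) for m in means])
print(f"Largest gap between group averages: {gap:.1f}")
```

Even though no drink effect is built into the simulation, the four averages are not identical. Re-running with a larger `sd` shows why the spread of the data matters when judging a difference.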


Example 4: Shopping

• Which product do we choose?


Example 5

• Summary of an article from July 2015.

Looks can Kill

“The American justice system is built on the idea that it is blind to all but the objective facts, as exemplified by the great lengths we go to make sure the jurors enter the courts unbiased and protected from outside influence during their service. This idea does not always match reality,” say psychologists John Paul Wilson and Nicholas Rule, co-authors of the study. They conducted the following study:
• The researchers compared inmates on death row and those who received a life sentence in Florida. They chose Florida because it has many people on death row and it also keeps a database of photos of all convicts.

• The researchers obtained the photos of 371 convicts on death row convicted of first degree murder (226 white and 145 black).


• To make their comparison, they obtained the photos of 371 convicts convicted of first degree murder who were given a life sentence. To “control for race” they used 226 white convicts and 145 black convicts (the same ratio of races in both groups).

• The photos of all prisoners were turned into black and white photos and placed on sheets of paper.

• 208 adult Americans (who did not know that any of the men were convicts) were asked to rate the trustworthiness of each convict using just their photo. The scale was from 1 to 8, 8 being very trustworthy and 1 being not at all trustworthy.

• The researchers found that the average trustworthiness score given to convicts on death row was 2.76, compared with 2.87 for those given a life sentence. In other words, for this sample, those on death row were given a lower trustworthiness rating.


Question Time

Q4) Based on the difference between 2.87 and 2.76, can we immediately draw the conclusion that in general people on death row look less trustworthy than people who are given a life sentence?
• Yes
• No

Q5) Is the difference between 2.87 and 2.76 so small it hardly matters?
• Yes
• No


• Based on the difference between 2.87 and 2.76, can we immediately draw the conclusion that people on death row in general look less trustworthy than people who are given a life sentence?
  ◦ The answer is no.
  ◦ If you draw two subsets (we call each a sample) of numbers from the same population and evaluate the average in each subset, these two averages will usually be different. This could be one possible explanation for the difference seen in these two averages.

• Could you say the difference between 2.87 and 2.76 is so small it hardly matters?
  ◦ The answer is no.
  ◦ The difference is small, but that difference may indicate a hidden prejudice.
  ◦ We use statistics and the analysis of probabilities to disentangle differences due to chance from differences due to an underlying prejudice.


Conclusion of study

• This difference is not huge, but the authors showed that it was statistically significant, as the p-value is less than 1% (technical jargon, which we learn later in the course).

• What this means is that the probability of a difference of 2.87-2.76=0.11 or larger occurring between the two groups by chance (due to sampling differences) is less than 1%.

• In other words, suppose we focused on only people given a life sentence and split them into two groups, each with 371 people. Suppose the average honesty rating was evaluated for both groups and the difference was taken. There will be a difference because different samples yield different averages.

• If this was done 100 times, we would see a difference of 2.87-2.76=0.11 (or more) less than one time out of a hundred.

• This proportion (called a p-value) is so small that it suggests the difference between the groups cannot be explained by differences in sampling, and that, in fact, untrustworthy-looking people are more likely to face the death penalty than trustworthy-looking people.
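
The "split one population into two groups many times" idea can be imitated on a computer. This sketch is not the analysis the authors ran: the 742 ratings are simulated from a normal distribution with an invented mean of 2.8 and standard deviation of 0.5, and the function name is hypothetical. Only the group size of 371 and the threshold of 0.11 come from the text.

```python
import random


def chance_of_difference(ratings, group_size, threshold, repeats=2000, seed=0):
    """Proportion of random splits whose group averages differ by >= threshold."""
    rng = random.Random(seed)
    pool = list(ratings)
    hits = 0
    for _ in range(repeats):
        rng.shuffle(pool)
        first = pool[:group_size]
        second = pool[group_size:2 * group_size]
        diff = abs(sum(first) / group_size - sum(second) / group_size)
        if diff >= threshold:
            hits += 1
    return hits / repeats


# Hypothetical ratings, all drawn from ONE population (no real group effect).
data_rng = random.Random(1)
ratings = [data_rng.gauss(2.8, 0.5) for _ in range(742)]

proportion = chance_of_difference(ratings, group_size=371, threshold=0.11)
print(f"Proportion of splits with a difference of 0.11 or more: {proportion:.4f}")
```

Under these assumed numbers, a gap as large as 0.11 between two groups of 371 is rare when the groups differ only through sampling, which is the sense in which a small p-value makes chance an unconvincing explanation.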


• Based on the analysis so far, we could conclude that people who look more dishonest commit more terrible crimes than more honest-looking people.

• The researchers took this into account by comparing the honesty ratings of death sentencers who were acquitted (usually on DNA evidence) with those of life sentencers who were acquitted.

• The men in both groups were innocent. Again a statistically significant difference in the honesty ratings was seen, suggesting that looking dishonest does not mean you are more dishonest.

• Conclusion: The differences are small, but the data suggest that looks have an impact on the severity of a sentence.


Accompanying problems associated with this Chapter

• Quiz 1 (Q3 and Q4)