statistics quick overview

Copyright by Michael S. Watson, 2012 Statistics Quick Overview Class #3

Upload: lacy-wiley

Post on 31-Dec-2015


TRANSCRIPT

Page 1: Statistics Quick Overview

Copyright by Michael S. Watson, 2012

Statistics Quick Overview

Class #3

Page 2: Statistics Quick Overview

Copyright by Michael S. Watson, 2012; Slides from Managerial Statistics book

A/B Testing in Obama’s 2008 Campaign

Objective: Maximize Sign-Up Rate

Source: http://www.youtube.com/watch?v=7xV7dlwMChc

Page 3: Statistics Quick Overview

Page 4: Statistics Quick Overview

Page 5: Statistics Quick Overview

Page 6: Statistics Quick Overview

Page 7: Statistics Quick Overview

Page 8: Statistics Quick Overview

Page 9: Statistics Quick Overview

Page 10: Statistics Quick Overview

So, What is Your Guess?

Page 11: Statistics Quick Overview

Page 12: Statistics Quick Overview

Page 13: Statistics Quick Overview

Page 14: Statistics Quick Overview

A/B Testing for On-Line Businesses

What is it?
− Develop two versions of a page
− Randomly show users different versions
− Track how they do
− Uses statistics to decide which is better
− Answers yes/no questions

Why?
− You have the data to do it
− Web sites convert a small number of users
− Some see a 40% increase in conversion

Source: Ben Tilly
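The "uses statistics to decide which is better" step can be sketched in a few lines. The example below is one common approach (a pooled two-proportion z-test), not something taken from the slides, and the visitor and sign-up counts are invented purely for illustration.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: both pages convert at the same rate."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - normal_cdf(abs(z)))
    return z, p_value


# Hypothetical counts, invented for illustration only.
z, p_value = two_proportion_z_test(conv_a=400, n_a=5000, conv_b=460, n_b=5000)
print(f"z = {z:.2f}, two-sided p-value = {p_value:.3f}")
```

A small p-value here would suggest that the difference between the two page versions is unlikely to be due to chance alone.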

Page 15: Statistics Quick Overview

Some Lessons from A/B Testing

Explore before you refine

Example: ABC Family
− Existing website: promotions for upcoming shows
− Radical idea: people come to the website looking for old episodes

+600% engagement

Page 16: Statistics Quick Overview

Some Lessons from A/B Testing

Words Matter, Call to action

Which button led to the biggest increase in donations?

Page 17: Statistics Quick Overview

Some Lessons from A/B Testing

Words Matter, Call to action

Which button led to the biggest increase in donations?

Trick question: the answer depended on what the campaign knew!

Page 18: Statistics Quick Overview

Thought Exercise with Our Packaging Example

Original Case (mean = 290, sd = 53)

Less Variability (mean = 290, sd = 5)

More Variability (mean = 290, sd = 186)

If a store manager came to you and said, “what will my sales be?” how would you answer?

If the CEO came to you and said, “what will average sales be?” how would you answer?

Page 19: Statistics Quick Overview

Thought Exercise II: We Doubled the Samples

If a store manager came to you and said, “what will my sales be?” how would you answer?

If the CEO came to you and said, “what will average sales be?” how would you answer?

(mean = 290, sd = 53) (mean = 290, sd = 53)

What do you think of these questions now?

Page 20: Statistics Quick Overview

Sampling Distribution

Many times we are sampling a population and need to find the true mean

The mean of the sample is denoted by $\bar{X}$

$\bar{X}$ estimates the true mean, µ

Is it a ‘good’ estimator?

It depends on a few things:
− The standard deviation of the population
− The sample size
− The distribution of the population (sometimes)
− A good random sample and maybe a little luck

Page 21: Statistics Quick Overview

Sampling Distribution

$\bar{X}$ is approximately normally distributed with a mean of µ and st dev of $\sigma/\sqrt{n}$

Since we never know the actual σ, we approximate it with the sample standard deviation, s.

$s_{\bar{X}} = \frac{s}{\sqrt{n}}$ is commonly used in statistics

We call this term the standard error of the mean

Let’s see how this applies to our examples
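As a quick sketch, the standard error of the mean can be computed for the three packaging cases from the thought exercise. The sample size n = 36 is an assumption here, inferred from the 35 degrees of freedom that appear later in the deck; the function name is just illustrative.

```python
import math


def standard_error(sample_sd, n):
    """Standard error of the mean: s / sqrt(n)."""
    return sample_sd / math.sqrt(n)


n = 36  # assumption: n - 1 = 35 degrees of freedom, as used later in the deck

cases = [
    ("Original case", 53),
    ("Less variability", 5),
    ("More variability", 186),
]
for label, sd in cases:
    print(f"{label}: sd = {sd}, standard error = {standard_error(sd, n):.2f}")
```

The store manager's question is about a single store, so the standard deviation is the relevant spread; the CEO's question is about the average, so the much smaller standard error applies.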

Page 22: Statistics Quick Overview

Central Limit Theorem: General Idea

$\bar{X}$ is approximately normally distributed with a mean of µ and st dev of $\sigma/\sqrt{n}$

In other words, as you take various samples, the collection of the sample means will be approximately normally distributed

The larger the value of n, the closer to normally distributed

The population data does not have to be normally distributed
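A small simulation can make the idea concrete. The sketch below assumes a deliberately non-normal population (exponential with mean 1 and standard deviation 1, chosen only for illustration) and checks that the sample means cluster around µ with a spread close to σ/√n.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

n = 36              # observations per sample (illustrative choice)
num_samples = 5000  # how many samples we draw

# Population: exponential with rate 1, so mu = 1 and sigma = 1 (skewed, not normal).
sample_means = []
for _ in range(num_samples):
    sample = [rng.expovariate(1.0) for _ in range(n)]
    sample_means.append(sum(sample) / n)

mean_of_means = sum(sample_means) / num_samples
sd_of_means = statistics.stdev(sample_means)
predicted_sd = 1.0 / math.sqrt(n)  # sigma / sqrt(n)

print(f"mean of sample means: {mean_of_means:.3f} (population mean is 1)")
print(f"st dev of sample means: {sd_of_means:.3f} (sigma/sqrt(n) = {predicted_sd:.3f})")
```

Plotting a histogram of `sample_means` would show the bell shape even though the underlying population is strongly skewed.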

Page 23: Statistics Quick Overview

We Have 3 Measures for a Sample of Data

Mean (average)

Standard Deviation (sample standard deviation)

Standard Error of the Mean

Let’s build a confidence interval…

Page 24: Statistics Quick Overview

The t-distribution

The t-distribution resembles a standard normal but with thicker ‘tails’

t-distributions are characterized by a feature called degrees of freedom

t-distributions with higher degrees of freedom more closely resemble the standard normal

Page 25: Statistics Quick Overview

t-distributions with various Degrees of Freedom
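To see the thicker tails numerically, the sketch below compares the area to the right of 2 under t-distributions with a few degrees of freedom against the standard normal. It integrates the t density with Simpson's rule rather than calling a statistics library; the helper names are illustrative, not part of any package.

```python
import math


def t_pdf(t, df):
    """Density of the Student t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(t * t / df))


def t_upper_tail(x, df, steps=2000):
    """P(T > x) for x >= 0, using Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def normal_upper_tail(x):
    """P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


x = 2.0
for df in (2, 5, 35):
    print(f"df = {df:>2}: area to the right of {x} = {t_upper_tail(x, df):.4f}")
print(f"normal : area to the right of {x} = {normal_upper_tail(x):.4f}")
```

The tail area shrinks toward the normal value as the degrees of freedom grow.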

Page 26: Statistics Quick Overview

Excel: The t-distribution

The TDIST function requires three inputs:
− X (the function finds the area to the right of X)
− Deg_freedom
− Tails (inputting 1 tail finds the area to the right of X, 2 tails reports twice the area)

X must be a positive number

Page 27: Statistics Quick Overview

Excel: The inverse t-distribution

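The TINV function requires two inputs:
− Probability
− Deg_freedom

The function reports the positive value, t, for which the two tails beyond −t and +t together contain the required probability for a t-dist with the specified d.f. For a one-tail area, input twice that area as the probability.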
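For readers working outside Excel, the behavior of TDIST and TINV can be mimicked roughly as follows. The Python functions `tdist` and `tinv` are hypothetical stand-ins written for this sketch (numerical integration plus bisection), not calls into any real library.

```python
import math


def t_pdf(t, df):
    """Density of the Student t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(t * t / df))


def t_upper_tail(x, df, steps=2000):
    """P(T > x) for x >= 0, using Simpson's rule on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def tdist(x, df, tails):
    """Mimics TDIST: right-tail area for tails=1, twice that for tails=2."""
    if x < 0 or tails not in (1, 2):
        raise ValueError("x must be non-negative and tails must be 1 or 2")
    return tails * t_upper_tail(x, df)


def tinv(probability, df):
    """Mimics TINV: the t whose two tails together hold `probability`."""
    lo, hi = 0.0, 100.0
    for _ in range(60):  # bisection: the two-tail area decreases as t grows
        mid = (lo + hi) / 2
        if tdist(mid, df, 2) > probability:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(f"tdist(1.13, 35, 2) = {tdist(1.13, 35, 2):.4f}")
print(f"tinv(0.05, 35)     = {tinv(0.05, 35):.4f}")
```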

Page 28: Statistics Quick Overview

Sampling Distribution

$\bar{X}$ is approximately normally distributed with a mean of µ and st dev of $\sigma/\sqrt{n}$

Since we never know the actual σ, we approximate it with the sample standard deviation, s.

$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$ follows a t-distribution with n-1 d.f.

Page 29: Statistics Quick Overview

Notation

$s_{\bar{X}} = \frac{s}{\sqrt{n}}$ is commonly used in statistics

We call this term the standard error of the mean

Page 30: Statistics Quick Overview

Interval Estimates

Our estimate of the true mean sales per store is 290.5

The standard error of the mean is 8.8

What proportion of samples like ours would be within 10 units of the true mean?

We can use the t-distribution to find out

Page 31: Statistics Quick Overview

The Computations

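$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$, with $s_{\bar{X}} = \frac{s}{\sqrt{n}}$

$\text{Prob}(-10 \le \bar{X} - \mu \le 10)$

$= \text{Prob}(-10/s_{\bar{X}} \le (\bar{X} - \mu)/s_{\bar{X}} \le 10/s_{\bar{X}})$

$= \text{Prob}(-10/s_{\bar{X}} \le t \le 10/s_{\bar{X}})$

$= \text{Prob}(-10/8.8 \le t \le 10/8.8)$

$= \text{Prob}(-1.13 \le t \le 1.13)$

= Area between -1.13 and 1.13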

Page 32: Statistics Quick Overview

Where does this fall on t-distribution?

[Figure: t-distribution with 35 degrees of freedom, centered at 0, with -1.13 and 1.13 marked; not to scale]

Page 33: Statistics Quick Overview

Let’s Do This in Excel

Find the probability of +/- 10 units
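As a cross-check on the Excel calculation, the same proportion can be estimated by simulation. This is a Monte Carlo sketch rather than the deck's method: it assumes a normal population with mean 290 and standard deviation 53 and a sample size of 36 (from the 35 degrees of freedom), draws many samples, and counts how often the t statistic lands between -1.13 and 1.13.

```python
import math
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

n = 36                   # assumption: 35 degrees of freedom implies n = 36
mu, sigma = 290.0, 53.0  # assumed population values, for illustration only
cutoff = 1.13            # 10 units divided by the standard error, as on the slide
trials = 20000

inside = 0
for _ in range(trials):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    mean = sum(sample) / n
    variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
    t = (mean - mu) / math.sqrt(variance / n)
    if abs(t) <= cutoff:
        inside += 1

share = inside / trials
print(f"share of samples with |t| <= {cutoff}: {share:.3f}")
```

The share should come out near the area between -1.13 and 1.13 under a t-distribution with 35 degrees of freedom.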

Page 34: Statistics Quick Overview

Confidence

In this example, we say that we are 73% confident that the true mean lies within 10 units of our estimate.

We must use the word confidence instead of probability because the randomness is associated with our estimator, not with the true mean, which is not random at all.

Usually, we work backwards from a desired level of confidence and then find the range of the interval necessary to achieve that level.

Page 35: Statistics Quick Overview

95% Confidence Intervals

A 95% confidence interval takes on the form:

$\bar{X} \pm t_{\alpha/2,\,n-1}\, s_{\bar{X}}$

where $t_{\alpha/2,\,n-1}$ is the value needed to generate an area of α/2 in each tail of a t-distribution with n-1 degrees of freedom

Use the Excel formula CONFIDENCE.T for the margin $t_{\alpha/2,\,n-1}\, s_{\bar{X}}$

CONFIDENCE.T uses the following:
− Alpha = 1 – Confidence you want
− Std Dev = Std Deviation (not the std error of the mean)
− Sample = sample size
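The interval formula can be sketched in Python using the packaging numbers quoted elsewhere in the deck (sample mean 290.54, standard error 8.8475, 35 degrees of freedom). The critical value is found here by numerical integration and bisection; `t_critical` is an illustrative helper, not a library function.

```python
import math


def t_pdf(t, df):
    """Density of the Student t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(t * t / df))


def t_upper_tail(x, df, steps=2000):
    """P(T > x) for x >= 0, using Simpson's rule on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def t_critical(alpha, df):
    """Value t with an area of alpha/2 in each tail (found by bisection)."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


sample_mean = 290.54  # from the packaging example
std_err = 8.8475      # standard error of the mean, from the packaging example
df = 35               # n - 1, with n = 36

margin = t_critical(0.05, df) * std_err
lower, upper = sample_mean - margin, sample_mean + margin
print(f"95% confidence interval: {lower:.2f} to {upper:.2f} (margin {margin:.2f})")
```

The `margin` value plays the role of what CONFIDENCE.T returns when it is given the sample standard deviation and sample size.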

Page 36: Statistics Quick Overview

Test With Sample Data

Divide into groups

Work on one of the data sets

Find the Mean, Std Dev, Std Error of the Mean, and the 95% Confidence Intervals

Page 37: Statistics Quick Overview

Hypothesis Testing

Source for Hypothesis Testing: Dr Nicola Ward Petty and Creative Heuristics

Page 38: Statistics Quick Overview

Hypothesis Testing

Source for Hypothesis Testing: Dr Nicola Ward Petty and Creative Heuristics

We can say things about a population from a sample taken from the population

Page 39: Statistics Quick Overview

Steps of Hypothesis Testing

Hypotheses

Significance

Sample

P-value

Decide

Page 40: Statistics Quick Overview

Hypothesis Testing: Step 1: The Hypothesis

We are testing something about the underlying population parameters

Null includes the equality sign (=, ≥, or ≤)

H0: Null Hypothesis (everything else or the status quo)

Ha: Alternative Hypothesis (what you want to prove)

Page 41: Statistics Quick Overview

Test Marketing (Formally)

µ: average sales per week.

H0: µ is equal to or smaller than 275.

Ha: µ is greater than 275.

Page 42: Statistics Quick Overview

Hypothesis Testing, Step 2: Significance

Significance, or alpha (α), is generally set to 5%

It is the probability that the Null is rejected when it is really correct, i.e., the probability of a Type I Error

Page 43: Statistics Quick Overview

Hypothesis Testing: Step 3: Sample

Take a sample and gather the statistics about the sample (like the mean, std dev, std error of the mean, etc)

Page 44: Statistics Quick Overview

Hypothesis Testing, Step 4: P-Value

There are different ways to calculate the p-value depending on whether we are testing one mean or two

One mean: Will the new packaging have sales greater than 275?

Two means: Is the Blue Package better than the Green Package?

We will start with one mean.

To start, we calculate the test statistic:

$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$

The value for μ is the value in our Null hypothesis (we are testing to see if this is the true population value)

Page 45: Statistics Quick Overview

Hypothesis Testing: P-Value: Example with Packaging

Page 46: Statistics Quick Overview

Let’s Not Lose Track of the Intuition…

Is 290 larger than 275?

What if sales had to be more than 400, more than 500, more than 320, would you be comfortable about our hypothesis?

How much larger is 290 than 275 relative to the statistics we have calculated?

Hint: think about the standard deviation and the standard error of the mean

How do you feel about our test?

Page 47: Statistics Quick Overview

Hypothesis Testing: P-Value:

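[Figure: sampling distribution centered at the hypothesized mean of 275, with St. Dev of the sample mean (standard error) = 8.8475 and our sample mean of 290.54 marked]

If 275 is the true mean (our Null Hypothesis), what is the chance we drew a sample with an average of 290.54 or higher?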

Page 48: Statistics Quick Overview

Hypothesis Testing: P-Value: Formal Statement of Problem

µ: average sales per store

H0: µ is less than or equal to 275.

Ha: µ is greater than 275.

Page 49: Statistics Quick Overview

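Hypothesis Testing: P-Value: Computations

Test Statistic = $t = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{290.54 - 275}{8.8475} \approx 1.76$

Case: When the Null is ≤ and the sample mean is higher than the null value:

p equals 1 − T.DIST (with the cumulative argument set to TRUE), or equivalently the T.DIST.RT function

Let’s test in Excel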
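The same right-tail calculation can be sketched in Python. The inputs (sample mean 290.54, standard error 8.8475, null value 275, 35 degrees of freedom, 5% significance) come from the packaging example; the integration helper is an illustrative stand-in for T.DIST.RT, not a library call.

```python
import math


def t_pdf(t, df):
    """Density of the Student t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(t * t / df))


def t_upper_tail(x, df, steps=2000):
    """P(T > x) for x >= 0, using Simpson's rule on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


sample_mean = 290.54
std_err = 8.8475   # s / sqrt(n)
null_mean = 275.0  # value of mu under H0
df = 35
alpha = 0.05       # significance level

t_stat = (sample_mean - null_mean) / std_err
p_value = t_upper_tail(t_stat, df)  # right-tail area, since Ha is "greater than"
reject_null = p_value < alpha

print(f"t = {t_stat:.2f}, p-value = {p_value:.3f}, reject H0: {reject_null}")
```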

Page 50: Statistics Quick Overview

Hypothesis Testing Step 5: Decide: How to Use the P-Value

Significance

If p < Significance Level, Reject the Null

If p > Significance Level, Do Not Reject the Null

Page 51: Statistics Quick Overview

Hypothesis Testing: Decide: How to Use the P-Value

Low p-value (e.g. 4.4%) means reject the null.

1 minus the p-value is the maximum confidence in the alternative hypothesis.

Average Weekly Sales will exceed 275

Page 52: Statistics Quick Overview

Sales Distribution– How far away is 290 if the real mean is 275?

[Figure: t-distribution centered at 0, test statistic at 1.7575, area to the right = 4.4%; not drawn to scale]

H0: µ is less than or equal to 275.

Ha: µ is greater than 275.

Page 53: Statistics Quick Overview

Sales Distribution– How far away is 290 if the real mean is 285?

[Figure: t-distribution centered at 0, test statistic at 0.6278, area to the right = 26.8%; not drawn to scale]

H0: µ is less than or equal to 285.

Ha: µ is greater than 285.

Page 54: Statistics Quick Overview

Sales Distribution– How far away is 290 if the real mean is 265?

[Figure: t-distribution centered at 0, test statistic at 2.89, area to the right = 0.3%; not drawn to scale]

H0: µ is less than or equal to 265.

Ha: µ is greater than 265.
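The three scenarios can be compared side by side with a short loop. This sketch recomputes each test statistic from the rounded sample mean (290.54) and standard error (8.8475), so the results may differ from the slide values in the third decimal place; the helper functions are illustrative, not library calls.

```python
import math


def t_pdf(t, df):
    """Density of the Student t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(t * t / df))


def t_upper_tail(x, df, steps=2000):
    """P(T > x) for x >= 0, using Simpson's rule on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


sample_mean, std_err, df = 290.54, 8.8475, 35

p_values = {}
for null_mean in (265, 275, 285):
    t_stat = (sample_mean - null_mean) / std_err
    p_values[null_mean] = t_upper_tail(t_stat, df)
    print(f"H0: mu <= {null_mean}: t = {t_stat:.2f}, "
          f"area to the right = {p_values[null_mean]:.1%}")
```

The further the hypothesized mean sits below the sample mean, measured in standard errors, the smaller the p-value and the stronger the case for rejecting the null.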