A Broad Overview of Key Statistical Concepts

Upload: fleur-mcconnell

Post on 02-Jan-2016


Page 1: A Broad Overview of Key Statistical Concepts

A Broad Overview of Key Statistical Concepts

Page 2: A Broad Overview of Key Statistical Concepts

An Overview of Our Review

• Populations and samples

• Parameters and statistics

• Confidence intervals

• Hypothesis testing

Page 3: A Broad Overview of Key Statistical Concepts

Populations and Samples

… and Parameters and Statistics

Page 4: A Broad Overview of Key Statistical Concepts

Populations and Parameters

• A population is any large collection of objects or individuals, such as people, students, or trees about which information is desired.

• A parameter is any summary number, like an average or percentage, that describes the entire population.

Page 5: A Broad Overview of Key Statistical Concepts

Parameters

• Examples include the population mean μ, the population variance σ², and the population proportion p.

• 99.999999999999….% of the time, we don’t (...or can’t) know the real value of a population parameter.

• Best we can do is estimate the parameter!

Page 6: A Broad Overview of Key Statistical Concepts

Samples and Statistics

• A sample is a representative group drawn from the population.

• A statistic is any summary number, like an average or percentage, that describes the sample.

Page 7: A Broad Overview of Key Statistical Concepts

Statistics

• Examples include the sample mean x̄, the sample variance s², and the sample proportion p̂ (“p-hat”).

• Because samples are manageable in size, we can determine the value of statistics.

• We use the known statistic to learn about the unknown parameter.

Page 8: A Broad Overview of Key Statistical Concepts

Example: Smoking at PSU?

Population of 42,000 PSU students

What proportion smoke regularly?

Sample of 987 PSU students

43% reported smoking regularly
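The difference between the parameter and the statistic in this example can be sketched with a simulation. The population below is made up: the true smoking rate of 0.40 is a hypothetical value chosen for illustration (in practice it is unknown), and only the sizes 42,000 and 987 come from the slide.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

POPULATION_SIZE = 42_000   # from the slide
SAMPLE_SIZE = 987          # from the slide
ASSUMED_TRUE_RATE = 0.40   # hypothetical: the real parameter is unknown

# Build a made-up population: 1 = smokes regularly, 0 = does not.
smokers = round(POPULATION_SIZE * ASSUMED_TRUE_RATE)
population = [1] * smokers + [0] * (POPULATION_SIZE - smokers)

# Parameter: describes the whole population (normally out of reach).
p = sum(population) / POPULATION_SIZE

# Statistic: describes only the sample we actually drew.
sample = random.sample(population, SAMPLE_SIZE)
p_hat = sum(sample) / SAMPLE_SIZE

print(f"parameter p     = {p:.3f}")
print(f"statistic p-hat = {p_hat:.3f}")
```

The statistic lands near the parameter but rarely on it, which is why an estimate is usually reported together with a margin of error.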

Page 9: A Broad Overview of Key Statistical Concepts

Example: Grade inflation?

Population of 5 million college students

Is the average GPA 2.7?

Sample of 100 college students

How likely is it that 100 students would have an average GPA as large as 2.9 if the population average was 2.7?

Page 10: A Broad Overview of Key Statistical Concepts

Two ways to learn about a population parameter

• Confidence intervals estimate parameters.

– We can be 95% confident that the proportion of Penn State students who have a tattoo is between 5.1% and 15.3%.

• Hypothesis tests test the value of parameters.

– There is enough statistical evidence to conclude that the mean normal body temperature of adults is lower than 98.6 degrees F.

Page 11: A Broad Overview of Key Statistical Concepts

Confidence Intervals

A Review of Concepts

Page 12: A Broad Overview of Key Statistical Concepts

The situation

• Want to estimate the actual population mean μ.

• But can only get x̄, the sample mean.

• Find a range of values, L < μ < U, that we can be really confident contains μ.

• This range of values is called a “confidence interval.”

Page 13: A Broad Overview of Key Statistical Concepts

Confidence Intervals for Proportions in Newspapers

• ABC News Poll, May 16-20, 2001

• 69% of 1,027 U.S. adults think using a hand-held cell phone while driving a car should be illegal.

• The “margin of error” is 3%.

• The “confidence interval” is 69% ± 3%.

• We can be really confident that between 66% and 72% of all U.S. adults think using a hand-held cell phone while driving a car should be illegal.
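The slide does not say how the 3% was computed. A minimal sketch, assuming a 95% confidence level and the usual large-sample formula for a proportion (multiplier × √(p̂(1 − p̂)/n)), gives a value that rounds to the reported margin:

```python
import math

p_hat = 0.69   # sample proportion from the poll
n = 1027       # number of adults surveyed
z = 1.96       # assumed multiplier for 95% confidence

# Estimated standard error of a sample proportion.
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
margin_of_error = z * standard_error

lower = p_hat - margin_of_error
upper = p_hat + margin_of_error

print(f"margin of error: {margin_of_error:.3f}")
print(f"interval: {lower:.3f} to {upper:.3f}")
```

The computed margin is a little under 0.03, consistent with the rounded “3%” in the poll.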

Page 14: A Broad Overview of Key Statistical Concepts

General Form of Most Confidence Intervals

• Sample estimate ± margin of error

• Lower limit L = estimate - margin of error

• Upper limit U = estimate + margin of error

• Then, we’re confident that the population value is somewhere between L and U.

Page 15: A Broad Overview of Key Statistical Concepts

T-interval for Mean

Formula in notation:

x̄ ± t × (s / √n)

Formula in English:

Sample mean ± (t × estimated standard error)

where “t” comes from the t distribution, and depends on the confidence level and the sample size through the degrees of freedom “n-1”.

Page 16: A Broad Overview of Key Statistical Concepts

Length of Confidence Interval

• Want confidence interval to be as narrow as possible.

• Length = Upper Limit - Lower Limit

Page 17: A Broad Overview of Key Statistical Concepts

How is the length of the CI affected?

• As sample mean increases…

• As the standard deviation decreases…

• As we decrease the confidence level…

• As we increase sample size …

x̄ ± t × (s / √n)
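The questions on this slide can be explored numerically. The sketch below computes the interval length as 2 × multiplier × s / √n. It uses a normal multiplier as a stand-in for t, an assumption that is reasonable only for large samples, and the values of s and n are illustrative. The sample mean does not appear in the length at all, so changing it only shifts the interval.

```python
import math
from statistics import NormalDist

def ci_length(s, n, confidence=0.95):
    """Approximate CI length: 2 * multiplier * s / sqrt(n).

    The multiplier comes from the normal distribution, used here as a
    large-sample stand-in for the t multiplier (an assumption).
    """
    multiplier = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * multiplier * s / math.sqrt(n)

baseline = ci_length(s=0.778, n=130, confidence=0.95)
smaller_sd = ci_length(s=0.400, n=130, confidence=0.95)        # less spread
lower_confidence = ci_length(s=0.778, n=130, confidence=0.90)  # less confident
larger_sample = ci_length(s=0.778, n=520, confidence=0.95)     # 4x the data

for label, value in [
    ("baseline", baseline),
    ("smaller standard deviation", smaller_sd),
    ("lower confidence level", lower_confidence),
    ("larger sample size", larger_sample),
]:
    print(f"{label:28s} length = {value:.4f}")
```

Each of the three changes shortens the interval, and quadrupling the sample size cuts the length in half because n enters through a square root.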

Page 18: A Broad Overview of Key Statistical Concepts

T-Interval for Mean in Minitab

One-Sample T: TEMP

Variable    N   Mean    StDev   SE Mean   95.0% CI
TEMP      130   98.27   0.778   0.0682    (98.14, 98.41)

We can be 95% confident that the average normal body temperature of adults is between 98.1 and 98.4 degrees Fahrenheit.
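The Minitab interval can be approximately reproduced from the printed summary values. In this sketch the t multiplier 1.9785 is an assumption taken from a t table for 95% confidence with 129 degrees of freedom. The other inputs are the rounded figures from the output, so the last digit may differ slightly from Minitab's.

```python
import math

n = 130
mean = 98.27
stdev = 0.778
t_multiplier = 1.9785  # assumed: approx. t value for 95% confidence, 129 df

# Estimated standard error of the mean, s / sqrt(n).
standard_error = stdev / math.sqrt(n)

# Sample mean +/- (t x estimated standard error).
margin = t_multiplier * standard_error
lower, upper = mean - margin, mean + margin

print(f"SE mean = {standard_error:.4f}")
print(f"95% CI  = ({lower:.2f}, {upper:.2f})")
```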

Page 19: A Broad Overview of Key Statistical Concepts

Hypothesis Testing

A Review of Concepts

Page 20: A Broad Overview of Key Statistical Concepts

General Idea of Hypothesis Testing

• Make an initial assumption.

• Collect evidence (data).

• Based on the available evidence, decide whether or not the initial assumption is reasonable.

Page 21: A Broad Overview of Key Statistical Concepts

Example: Normal Body Temperature

Population of many, many adults

Is average adult body temperature 98.6 degrees? Or is it lower?

Sample of 130 adults

Average body temperature of 130 sampled adults is 98.27 degrees.

Page 22: A Broad Overview of Key Statistical Concepts

Making the Decision

• It is either likely or unlikely that we would collect the evidence we did given the initial assumption.

• (Note: “Likely” or “unlikely” is measured by calculating a probability!)

• If it is likely, then we “do not reject” our initial assumption. There is not enough evidence to do otherwise.

Page 23: A Broad Overview of Key Statistical Concepts

Making the Decision (cont’d)

• If it is unlikely, then:

– either our initial assumption is correct and we experienced an unusual event

– or our initial assumption is incorrect

• In statistics, if it is unlikely, we decide to “reject” our initial assumption.

Page 24: A Broad Overview of Key Statistical Concepts

Idea of Hypothesis Testing: Criminal Trial Analogy

• First, state 2 hypotheses, the null hypothesis (“H0”) and the alternative hypothesis (“HA”)

– H0: Defendant is not guilty.

– HA: Defendant is guilty.

Page 25: A Broad Overview of Key Statistical Concepts

Criminal Trial Analogy (continued)

• Then, collect evidence, such as finger prints, blood spots, hair samples, carpet fibers, shoe prints, ransom notes, handwriting samples, etc.

• In statistics, the data are the evidence.

Page 26: A Broad Overview of Key Statistical Concepts

Criminal Trial Analogy (continued)

• Then, make initial assumption.

– Defendant is innocent until proven guilty.

• In statistics, we always assume the null hypothesis is true.

Page 27: A Broad Overview of Key Statistical Concepts

Criminal Trial Analogy (continued)

• Then, make a decision based on the available evidence.

– If there is sufficient evidence (“beyond a reasonable doubt”), reject the null hypothesis. (Behave as if defendant is guilty.)

– If there is not enough evidence, do not reject the null hypothesis. (Behave as if defendant is not guilty.)

Page 28: A Broad Overview of Key Statistical Concepts

Very Important Point

• Neither decision entails proving the null hypothesis or the alternative hypothesis.

• We merely state there is enough evidence to behave one way or the other.

• This is also always true in statistics! No matter what decision we make, there is always a chance we made an error.

Page 29: A Broad Overview of Key Statistical Concepts

Errors in Criminal Trials

                        Truth
Jury decision    Not guilty    Guilty
Not guilty       OK            ERROR
Guilty           ERROR         OK

Page 30: A Broad Overview of Key Statistical Concepts

Errors in Hypothesis Testing

                       Truth
Decision               Null hypothesis    Alternative hypothesis
Do not reject null     OK                 TYPE II ERROR
Reject null            TYPE I ERROR       OK

Page 31: A Broad Overview of Key Statistical Concepts

Definitions: Types of Errors

• Type I error: The null hypothesis is rejected when it is true.

• Type II error: The null hypothesis is not rejected when it is false.

• There is always a chance of making one of these errors. But, a good scientific study will minimize the chance of doing so!
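The chance of a Type I error can be made concrete with a small simulation. This is only a sketch under stated assumptions: the population standard deviation (0.8), the sample size (30), and the use of a z-test with a known standard deviation are hypothetical choices for illustration, not taken from the slides. Every simulated sample is drawn with the null hypothesis true, so every rejection is a Type I error.

```python
import math
import random
from statistics import NormalDist, mean

random.seed(1)  # fixed seed so the sketch is repeatable

MU_0 = 98.6    # value stated by the null hypothesis
SIGMA = 0.8    # hypothetical population standard deviation, treated as known
N = 30         # hypothetical sample size
ALPHA = 0.05   # cutoff for a "small" p-value
TRIALS = 2000  # number of simulated studies

def left_tailed_p_value(sample):
    """P-value for H0: mu = MU_0 versus HA: mu < MU_0 (z-test)."""
    z = (mean(sample) - MU_0) / (SIGMA / math.sqrt(N))
    return NormalDist().cdf(z)

rejections = 0
for _ in range(TRIALS):
    # The null hypothesis is true for every sample drawn here.
    sample = [random.gauss(MU_0, SIGMA) for _ in range(N)]
    if left_tailed_p_value(sample) < ALPHA:
        rejections += 1  # rejecting a true null: a Type I error

type_i_rate = rejections / TRIALS
print(f"share of true nulls rejected: {type_i_rate:.3f}")
```

The share of rejections should come out close to the chosen cutoff of 0.05, which is what the cutoff controls.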

Page 32: A Broad Overview of Key Statistical Concepts

Example: Normal Body Temperature

• Specify hypotheses.

– H0: μ = 98.6 degrees

– HA: μ < 98.6 degrees

• Make initial assumption: μ = 98.6 degrees

• Collect data: Average body temp of 130 sampled adults is 98.27 degrees. How likely is it that a sample of 130 adults would have an average body temp as low as 98.27 if the average body temp of population was 98.6?

Page 33: A Broad Overview of Key Statistical Concepts

Using the p-value to make the decision

• The p-value is the probability of observing a sample at least as extreme as ours if the null hypothesis were true.

• The p-value is a probability, so it is a number between 0 and 1.

• Close to 0 means “unlikely.”

• So if the p-value is “small” (typically, less than 0.05), then reject the null hypothesis.

Page 34: A Broad Overview of Key Statistical Concepts

Example (continued)

One-Sample T: TEMP

Test of mu = 98.6 vs mu < 98.6

Var     N   Mean    StDev   SE Mean   T       P
TEMP  130   98.27   0.778   0.0682    -4.79   0.000

The p-value can easily be obtained from statistical software like MINITAB.

(Generally, the p-value is labeled as “P”)

Page 35: A Broad Overview of Key Statistical Concepts

Example (continued)

• The p-value, <0.0001, indicates that, if the average body temperature in the population is 98.6 degrees, it is unlikely that a sample of 130 adults would have an average body temperature as extreme as 98.27 degrees.

• Decision: Reject the null hypothesis.

• Conclude that the average body temperature is lower than 98.6 degrees.
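The test statistic and decision in this example can be sketched from the printed summary values. Two caveats apply. First, the inputs are rounded, so the statistic comes out near -4.84 rather than Minitab's -4.79, presumably because Minitab used the unrounded mean. Second, the p-value here uses a normal approximation to the t distribution, an assumption that is reasonable with 130 observations but is not what Minitab computes.

```python
import math
from statistics import NormalDist

n = 130
sample_mean = 98.27
stdev = 0.778
mu_0 = 98.6  # value stated by the null hypothesis

# Test statistic: how many standard errors the sample mean is from mu_0.
standard_error = stdev / math.sqrt(n)
t_stat = (sample_mean - mu_0) / standard_error

# Left-tail probability, using a normal approximation to the t distribution.
p_value = NormalDist().cdf(t_stat)

decision = "reject H0" if p_value < 0.05 else "do not reject H0"

print(f"t = {t_stat:.2f}")
print(f"approximate p-value = {p_value:.2e}")
print(f"decision: {decision}")
```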

Page 36: A Broad Overview of Key Statistical Concepts

What type of error might we have made?

• Type I error here is claiming that average body temp is lower than 98.6 when in fact it really isn’t.

• Type II error here is failing to claim that the average body temp is lower than 98.6 when it is.

• We rejected the null hypothesis, i.e. claimed body temp is lower than 98.6, so we may have made a Type I error.