normal distribution and sampling dr. burton graduate school approach to problem solving

Normal Distribution And Sampling Dr. Burton

Post on 19-Dec-2015


Page 1: Normal Distribution And Sampling Dr. Burton Graduate school approach to problem solving

Normal Distribution And Sampling

Dr. Burton

Page 2

Graduate school approach to problem solving.

Pages 3 to 9

Progression of a histogram into a continuous distribution

[Figure, built up over seven slides: a histogram of z scores (axis from -4 to 4, density from 0.0 to 0.4) progressing into a continuous normal curve]

Page 10

Area under the curve

[Figure: standard normal curve over z from -4 to 4; the area on each side of z = 0 is 50%]

Page 11

Areas under the curve relating to z scores

[Figure: 34.1% of the area lies between z = 0 and z = -1, and 34.1% between z = 0 and z = +1]

Page 12

Areas under the curve relating to z scores

[Figure: 68.2% of the area lies between z = -1 and z = +1; a further 13.6% lies between -1 and -2, and 13.6% between +1 and +2]

Page 13

Measures of Central Tendency

• Mean = $\bar{x} = \sum x_i / n$ = the sum of the observed values divided by the number of observations

• Median = the middle value of all observations collected = the 50th percentile

• Mode = the most frequently occurring value among the observations

• In a normal (Gaussian) distribution, the mean, median, and mode are all equal
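A minimal sketch using Python's standard `statistics` module. The data are hypothetical and deliberately right-skewed, so the three measures come apart; for normally distributed data they would coincide.

```python
import statistics

# Hypothetical, right-skewed data chosen only for illustration
data = [2, 3, 3, 4, 5, 6, 10]

mean = statistics.mean(data)      # sum of the values divided by the number of values
median = statistics.median(data)  # middle value of the sorted data (50th percentile)
mode = statistics.mode(data)      # most frequently occurring value

print(mean, median, mode)
```

Here the long right tail pulls the mean (about 4.71) above the median (4), which in turn sits above the mode (3).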

Page 14

Central limit theorem

• In reasonably large samples (25 or more), the distribution of the means of many samples is approximately normal, even though the data in individual samples may show skewness, kurtosis, or unevenness.

• Therefore, a t-test may be computed on almost any set of continuous data, provided the observations can be considered random and the sample size is reasonably large.
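The theorem can be checked empirically. This sketch assumes an exponential population with mean 1 (a strongly right-skewed choice made only for illustration), draws 5,000 samples of size 25, and examines the distribution of their means. The seed, the number of samples, and the helper name `sample_mean` are arbitrary.

```python
import random
import statistics

rng = random.Random(1)   # seeded so the run is repeatable

def sample_mean(n):
    """Mean of n draws from an exponential population with mean 1 (strongly right-skewed)."""
    return sum(rng.expovariate(1.0) for _ in range(n)) / n

n = 25                                   # sample size from the "25 or more" rule of thumb
means = [sample_mean(n) for _ in range(5000)]

center = statistics.mean(means)          # theory: close to the population mean, 1
spread = statistics.stdev(means)         # theory: close to 1 / sqrt(25) = 0.2

# Rough normality check: a normal curve puts about 95% of values within 2 SDs
within_2sd = sum(abs(m - center) <= 2 * spread for m in means) / len(means)

print(round(center, 3), round(spread, 3), round(within_2sd, 3))
```

The raw exponential data are far from normal, yet the sample means center near 1 with a spread near 0.2, and roughly 95% of them fall within two standard deviations, as a normal curve predicts.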

Page 15

Areas under the curve relating to z scores

[Figure: 68.2% + 13.6% + 13.6% = 95.4% of the area lies between z = -2 and z = +2]

Page 16

Areas under the curve relating to z scores

[Figure: 95.4% of the area lies between z = -2 and z = +2; a further 2.1% lies between -2 and -3, and 2.1% between +2 and +3]

Page 17

Areas under the curve relating to z scores

[Figure: about 99.6% of the area lies between z = -3 and z = +3 when the rounded strips are summed (34.1% + 13.6% + 2.1% on each side); the exact figure is 99.73%]
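The percentages on these slides can be recomputed from the standard normal cumulative distribution function. This sketch builds it from `math.erf`; `phi` and `area_between` are illustrative helper names, not part of any library.

```python
import math

def phi(z):
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def area_between(a, b):
    """Area under the standard normal curve between z = a and z = b."""
    return phi(b) - phi(a)

# Central areas: -1 to +1, -2 to +2, -3 to +3
for k in (1, 2, 3):
    print(f"-{k} to +{k}: {area_between(-k, k):.4f}")

# Individual strips on the positive side
for a, b in ((0, 1), (1, 2), (2, 3)):
    print(f"{a} to {b}: {area_between(a, b):.4f}")
```

The unrounded strips explain why the exact central areas are 68.27%, 95.45%, and 99.73% rather than the rounded sums shown on the slides.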

Page 18

Areas under the curve relating to +z scores (one-tailed tests)

[Figure: cutoff at z = +1. Acceptance area = 84.1%; critical area = 15.9%]

Page 19

Areas under the curve relating to +z scores (one-tailed tests)

[Figure: cutoff at z = +2. Acceptance area = 97.7%; critical area = 2.3%]

Page 20

Areas under the curve relating to +z scores (one-tailed tests)

[Figure: cutoff at z = +3. Acceptance area = 99.87%; critical area = 0.13%]
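One-tailed acceptance and critical areas can be read directly off the normal CDF. A sketch using `statistics.NormalDist` (available in Python 3.8 and later), assuming the cutoffs z = +1, +2, and +3 that match these three slides:

```python
from statistics import NormalDist

z_dist = NormalDist()   # standard normal curve: mean 0, SD 1

for cutoff in (1, 2, 3):
    acceptance = z_dist.cdf(cutoff)   # area to the left of +z
    critical = 1 - acceptance         # one-tailed (upper) critical area beyond +z
    print(f"z = +{cutoff}: acceptance {acceptance:.2%}, critical {critical:.2%}")
```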

Page 21

Asymmetric Distributions

[Figure: a positively skewed curve (skewed right) and a negatively skewed curve (skewed left)]

Page 22

Distributions (Kurtosis)

[Figure: curves of differing peakedness over z from -4 to 4]

Flat curve = higher level of deviation from the mean

High (peaked) curve = smaller deviation from the mean

Page 23

Distributions (Bimodal Curve)

[Figure: a bimodal curve with two peaks]

Page 24

Theoretical normal distribution with standard deviations

[Figure: normal curve with z scores marked from -3 to +3 standard deviations]

Probability [proportion of area in the tail(s)]

z score:       1        2        3
Upper tail:    .1587    .0228    .0013
Two-tailed:    .3173    .0455    .0027

Page 25

What is the z score for 0.05 probability? (one-tailed test) 1.645

What is the z score for 0.05 probability? (two-tailed test) 1.96

What is the z score for 0.01 probability? (one-tailed test) 2.326

What is the z score for 0.01 probability? (two-tailed test) 2.576
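These critical values come from the inverse of the normal CDF: a one-tailed test puts all of α in one tail, while a two-tailed test splits it between both. A sketch with `statistics.NormalDist.inv_cdf`; `critical_z` is an illustrative helper name.

```python
from statistics import NormalDist

def critical_z(alpha, tails):
    """z cutoff that leaves a total probability `alpha` in the tail(s); tails is 1 or 2."""
    return NormalDist().inv_cdf(1 - alpha / tails)

for alpha in (0.05, 0.01):
    one, two = critical_z(alpha, 1), critical_z(alpha, 2)
    print(f"alpha = {alpha}: one-tailed {one:.3f}, two-tailed {two:.3f}")
```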

Page 26

The Relationship Between Z and X

[Figure: an X axis (55, 70, 85, 100, 115, 130, 145) aligned with a Z axis (-3 to +3), marking the area P(X < 130)]

$\mu = 100$ (population mean), $\sigma = 15$ (standard deviation), $X = 130$

$$Z = \frac{X - \mu}{\sigma} = \frac{130 - 100}{15} = 2$$
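The same standardization in code, using the slide's μ = 100, σ = 15, and X = 130. The probability line goes one step beyond the slide by evaluating P(X < 130) with `statistics.NormalDist`.

```python
from statistics import NormalDist

mu, sigma = 100, 15    # population mean and standard deviation from the slide
x = 130

z = (x - mu) / sigma                    # Z = (X - mu) / sigma
p_below = NormalDist(mu, sigma).cdf(x)  # P(X < 130), the same area as P(Z < 2)

print(z, round(p_below, 4))
```

The result, about 0.977, is the same 97.7% acceptance area shown for z = +2 in the one-tailed slides.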


Page 28

MEASURES OF VARIATION

• C. Standard Deviation

– 1. by far the most widely used measure of variation

– 2. the square root of the variance of the observations

– 3. computed by: squaring each deviation from the mean, adding them up, dividing their sum by one less than the sample size (n - 1) to obtain the variance, and taking the square root

Page 29

MEASURES OF VARIATION

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} \quad \text{or} \quad s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

Page 30

x_i    (x_i - x̄)    (x_i - x̄)²
1      -5           25
2      -4           16
4      -2           4
7      +1           1
10     +4           16
12     +6           36
Σx_i = 36    Σ(x_i - x̄) = 0    Σ(x_i - x̄)² = 98

n = 6, Mean = 36 / 6 = 6

$s^2 = \sum (x_i - \bar{x})^2 / (n - 1) = 98 / (6 - 1) = 98 / 5 = 19.6$

Variance = 19.6

Standard Deviation = $\sqrt{19.6}$ = 4.43
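The worked example can be reproduced step by step, with a cross-check against the standard library's sample `variance` and `stdev`, which also divide by n - 1.

```python
import math
import statistics

x = [1, 2, 4, 7, 10, 12]                 # data from the worked example

n = len(x)
mean = sum(x) / n                         # 36 / 6 = 6
deviations = [xi - mean for xi in x]      # deviations from the mean always sum to 0
ss = sum(d ** 2 for d in deviations)      # sum of squared deviations = 98
variance = ss / (n - 1)                   # divide by n - 1 for a sample: 98 / 5
sd = math.sqrt(variance)                  # standard deviation = square root of variance

# Cross-check against the standard library's sample variance and SD
assert math.isclose(variance, statistics.variance(x))
assert math.isclose(sd, statistics.stdev(x))

print(mean, sum(deviations), ss, variance, round(sd, 2))
```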

Page 31

Variance and Standard Deviation

Variance = (Standard Deviation)²

$\sqrt{\text{Variance}}$ = Standard Deviation

Page 32

Standard deviation

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

Student's t distribution

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
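A sketch of the one-sample t statistic, reusing the six observations from the variance example. The hypothesized mean `mu_0 = 5` is an assumption chosen only for illustration; the slides do not test any particular value.

```python
import math
import statistics

x = [1, 2, 4, 7, 10, 12]   # the six observations from the variance example
mu_0 = 5                   # hypothetical population mean under test (illustration only)

n = len(x)
x_bar = statistics.mean(x)               # sample mean
s = statistics.stdev(x)                  # sample SD, n - 1 in the denominator
t = (x_bar - mu_0) / (s / math.sqrt(n))  # t = (x-bar - mu) / (s / sqrt(n))
df = n - 1                               # degrees of freedom for looking up t

print(f"t = {t:.3f} with {df} degrees of freedom")
```

Python's standard library has no t distribution, so a p-value would come from a t table with 5 degrees of freedom or from a package such as SciPy (`scipy.stats.ttest_1samp`).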

Page 33

Confidence Intervals

• The sample mean is a point estimate of the population mean. With the additional information provided by the standard error of the mean, we can estimate the limits (interval) within which the true population mean probably lies.

Source: Osborn

Page 34

Confidence Intervals

• This is called the confidence interval, which gives a range of values that might reasonably contain the true population mean

• The confidence interval is represented as an interval from a to b, with a certain degree of confidence, usually 95% or 99%

Source: Osborn

Page 35

Confidence Intervals

• Before calculating the range of the interval, one must specify the desired probability that the interval will include the unknown population parameter, usually 95% or 99%.

• After the values for a and b are determined, probability becomes confidence. The process has generated an interval that either does or does not contain the unknown population parameter; this is a confidence interval.

Source: Osborn

Page 36

Confidence Intervals

• To calculate the Confidence Interval (CI):

$$CI = \bar{X} \pm z\,(s / \sqrt{n})$$

Source: Osborn

Page 37

Confidence Intervals

• In the formula, z is equal to 1.96 or 2.58 (from the standard normal distribution), depending on the level of confidence required:

– CI95: z = 1.96

– CI99: z = 2.58

Source: Osborn

Page 38

Population Data: 68, 72, 76, 85, 87, 90, 93, 94, 94, 95, 97, 98, 103, 105, 105, 107, 114, 117, 118, 119, 123, 124, 127, 151, 159, 217

Sample 1: 76, 85, 87, 93, 98, 103, 105, 105, 117, 118, 119, 123, 127, 151, 217

Sample 1 statistics: $\bar{X}$ = 114.9, Standard deviation = 34.1, Standard error of the mean = 8.8

Page 39

Confidence Intervals

• Given a mean of 114.9 and a standard error of 8.8, the CI95 is calculated:

$$CI_{95} = \bar{X} \pm 1.96\,(s / \sqrt{n}) = 114.9 \pm 1.96\,(8.8) = 114.9 \pm 17.248 = 97.7,\ 132.1$$

Source: Osborn

Based on this sample, we can be 95% confident that the population mean lies between 97.7 and 132.1. The interval estimates the mean; it does not say that 95% of individual population values fall in this range.
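The interval can be recomputed from the raw Sample 1 values. This sketch follows the slide's z-based formula with 1.96; because the slide rounds the mean and standard error before multiplying, the unrounded upper limit comes out near 132.2 rather than 132.1. Since the slide also lists the full population, the last lines check whether this particular interval captured the population mean.

```python
import math
import statistics

# Data from the slide
population = [68, 72, 76, 85, 87, 90, 93, 94, 94, 95, 97, 98, 103,
              105, 105, 107, 114, 117, 118, 119, 123, 124, 127, 151, 159, 217]
sample = [76, 85, 87, 93, 98, 103, 105, 105, 117, 118, 119, 123, 127, 151, 217]

n = len(sample)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)       # sample SD (n - 1 in the denominator)
sem = s / math.sqrt(n)             # standard error of the mean
z = 1.96                           # 95% multiplier, as in the slide's formula

lower, upper = x_bar - z * sem, x_bar + z * sem
print(f"mean {x_bar:.1f}, SD {s:.1f}, SEM {sem:.1f}, CI95 {lower:.1f} to {upper:.1f}")

# The population is known here, so check whether this interval captured its mean
mu = statistics.mean(population)
print(f"population mean {mu:.1f}; inside interval: {lower <= mu <= upper}")
```

With only 15 observations, many texts would use a t multiplier instead (about 2.145 for 14 degrees of freedom), which widens the interval somewhat.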

Page 40

OUTLINE

2.1 Selecting Appropriate Samples
Explains why the selection of an appropriate sample has an important bearing on the reliability of inferences about a population

2.2 Why Sample?
Gives a number of reasons sampling is often preferable to census taking

2.3 How Samples Are Selected
Explains how samples are selected

2.4 How to Select a Random Sample
Illustrates with a specific example the method of selecting a random sample using a computer statistical package

2.5 Effectiveness of a Random Sample
Demonstrates the credibility of the random sampling process

2.6 Missing and Incomplete Data
Explains the problem of missing or incomplete data and offers suggestions on how to minimize this problem

Page 41

LEARNING OBJECTIVES

1. Distinguish between
a. populations and samples
b. parameters and statistics
c. various methods of sampling

2. Explain why the method of sampling is important

3. State why samples are used

4. Define random sample

5. Explain why it is important to use random sampling

6. Select a random sample using a computer statistical program

7. Suggest methods for dealing with missing data

Page 42

SELECTING APPROPRIATE SAMPLES

A. Population – a set of persons (or objects) having a common observable characteristic

B. Sample – a subset of a population

C. The WAY a sample is selected is more important than the size of the sample

D. An appropriate sample should be representative of the population

E. A set of observations may be summarized by a descriptive measure: a parameter if it describes a population, or a statistic if it describes a sample

Page 43

SELECTING APPROPRIATE SAMPLES

F. Random sample

1. Every subject has an equal chance of being selected

2. Technique most likely to yield a representative sample

3. Obstacles

a. Response rate – how many will respond

b. Sampling bias – some segment of the population may be over- or underrepresented

c. May be too costly

Page 44

WHY SAMPLE?

A. Random sampling – each subject in the population has an equal chance of being selected
1. Avoids known and unknown biases, on average
2. Helps convince others that the trial was conducted properly
3. Is the basis for the statistical theory that underlies hypothesis tests and confidence intervals

B. Convenience samples
1. Selected at will or in a particular program
2. Seldom representative of the underlying population
3. Used when random samples are virtually impossible to select

Page 45

WHY SAMPLE?

C. Systematic sampling
1. Used when a sampling frame (a complete, nonoverlapping list of the persons or objects constituting the population) is available
2. Randomly select a first case, then proceed by selecting every kth case

D. Stratified sampling – used when we wish the sample to represent the various strata (subgroups) of the population proportionately, or to increase the precision of the estimate

E. Cluster sampling
1. Select a simple random sample of clusters (for example, a number of city blocks)
2. More economical than random selection of persons throughout the city
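A sketch of the systematic and proportionate stratified schemes described above. The frame of 100 numbered persons, the two strata named "clinic A" and "clinic B", and the helper names `systematic_sample` and `stratified_sample` are all hypothetical.

```python
import random

def systematic_sample(frame, n, rng=random):
    """Systematic sample of n items from an ordered sampling frame.

    Uses the interval k = len(frame) // n, picks a random start among the
    first k cases, then takes every kth case after it.
    """
    k = len(frame) // n              # sampling interval
    start = rng.randrange(k)         # random first case, index 0 .. k-1
    return frame[start::k][:n]

def stratified_sample(strata, fraction, rng=random):
    """Proportionate stratified sample: the same fraction drawn from each stratum."""
    chosen = {}
    for name, members in strata.items():
        size = round(len(members) * fraction)
        chosen[name] = rng.sample(members, size)   # simple random sample within the stratum
    return chosen

rng = random.Random(42)                  # seeded generator so the example is repeatable
frame = list(range(1, 101))              # hypothetical frame of 100 numbered persons
print(systematic_sample(frame, 10, rng))

strata = {"clinic A": list(range(60)), "clinic B": list(range(60, 100))}  # hypothetical strata
print({name: len(chosen) for name, chosen in stratified_sample(strata, 0.1, rng).items()})
```

Drawing 10% from each stratum keeps the 60:40 split of the population in the sample (6 and 4 persons).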

Page 46

HOW TO SELECT A RANDOM SAMPLE

• Random Numbers Table: http://en.wikipedia.org/wiki/Random_number_table

• Computer statistical package: SPSS or Excel
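As an alternative to a random numbers table or SPSS/Excel, Python's standard `random` module can draw a simple random sample without replacement. The population of 500 numbered subjects, the sample size of 20, and the seed are hypothetical.

```python
import random

rng = random.Random(2015)                  # arbitrary seed, recorded so the draw is repeatable
population_ids = list(range(1, 501))       # hypothetical population of 500 numbered subjects

sample_ids = rng.sample(population_ids, 20)   # 20 subjects, each equally likely, no repeats
print(sorted(sample_ids))
```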

Page 47

EFFECTIVENESS OF A RANDOM SAMPLE

• A. Reliability is usually demonstrated by

– 1. defining a fairly small population

– 2. selecting from it all conceivable samples of a particular size

– 3. computing the mean of each sample

– 4. observing the variation for the population

– 5. comparing these sample means (statistics) with the population mean (parameter), which neatly demonstrates the credibility of the sampling scheme
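The demonstration described above can be run exhaustively on a tiny population. The five values below are hypothetical; `itertools.combinations` enumerates every possible sample of size 2 drawn without replacement.

```python
import itertools
import statistics

population = [2, 4, 6, 8, 10]        # hypothetical small population
mu = statistics.mean(population)     # population mean (parameter) = 6

# Every conceivable sample of size 2, drawn without replacement
samples = list(itertools.combinations(population, 2))
sample_means = [statistics.mean(s) for s in samples]   # one statistic per sample

print(len(samples), min(sample_means), max(sample_means))
print(statistics.mean(sample_means), mu)   # average of the sample means vs. the parameter
```

The ten sample means range from 3 to 9, but their average is exactly the population mean of 6.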

Page 48

MISSING AND INCOMPLETE DATA

A. Bias may be introduced because of possible differences between respondents and nonrespondents

B. Limits the ability to accurately draw inferences about the population

C. Subjects may drop out of the study

D. Ways to deal with missing data
1. Last observation carried forward (LOCF) – take the last value observed before dropout and treat it as the final value
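A minimal sketch of last observation carried forward on one subject's series, with `None` marking missed visits. The function name `locf` and the measurements are hypothetical.

```python
from typing import List, Optional

def locf(series: List[Optional[float]]) -> List[Optional[float]]:
    """Last observation carried forward: replace each missing value (None)
    with the most recent observed value. Leading gaps stay None."""
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value          # remember the latest real observation
        filled.append(last)
    return filled

# Hypothetical visit measurements; the subject drops out after visit 3
visits = [142.0, 138.0, 135.0, None, None]
print(locf(visits))
```

LOCF is simple, but it assumes the value would have stayed unchanged after dropout, which can itself bias results.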

Page 49

Understanding and Reducing Errors

• Goals of Data Collection and Analysis
– Promoting accuracy and precision
– Reducing differential and nondifferential errors
– Reducing intraobserver and interobserver variability

• Accuracy and Usefulness
– False-positive and false-negative results
– Sensitivity and specificity
– Predictive values
– Likelihood ratios, odds ratios, and cutoff ratios
– Receiver operating characteristic (ROC) curves

• Measuring Agreement
– Overall percentage agreement
– Kappa test ratio

Page 50

Promoting Precision and Accuracy

• Accuracy: the ability of a measurement to be correct on average.

• Precision: the ability of a measurement to give the same or a very similar result when the test is repeated (reproducibility, reliability).
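One way to quantify the two ideas, assuming a known reference value: accuracy as the average departure from the true value (bias) and precision as the standard deviation of repeated readings. The readings and the reference value are hypothetical.

```python
import statistics

true_value = 120.0                              # hypothetical known reference value
readings = [126.0, 125.0, 127.0, 126.0, 126.0]  # hypothetical repeated readings of it

bias = statistics.mean(readings) - true_value   # accuracy: how far off on average
spread = statistics.stdev(readings)             # precision: how much repeats vary

print(bias, round(spread, 3))
```

These readings are precise (SD about 0.71) but inaccurate (off by 6 units on average).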

Page 55

Differential and nondifferential error

• Bias is a differential error
– A nonrandom, systematic, or consistent error in which the values tend to be inaccurate in a particular direction.

• Nondifferential errors are random errors.

Page 56

Bias

• The three most problematic forms of bias in medicine:
– 1. Selection (Sampling) Bias: biases that distort results because of the selection process

• Admission rate (Berkson's) bias
– Distortions in risk ratios occur as a result of different hospital admission rates among cases with the risk factor, cases without the risk factor, and controls with the risk factor, causing greatly different risk-factor probabilities to interfere with the outcome of interest.

• Nonresponse bias
– e.g., noncompliance of people who have scheduled interviews in their home.

• Lead time bias
– A time differential between diagnosis and treatment among sample subjects may result in erroneous attribution of higher survival rates to superior treatment rather than to early detection.

Page 57

Bias

• The three most problematic forms of bias in medicine:
– 1. Selection (Sampling) Bias
• Admission rate (Berkson's) bias
• Nonresponse bias
• Lead time bias
– 2. Information (misclassification) Bias
• Recall bias
– Differentials in memory capabilities of sample subjects
• Interview bias
– "Blinding" of interviewers to diseased and control subjects is often difficult.
• Unacceptability bias
– Patients reply with "desirable" answers

Page 58

Bias

• The three most problematic forms of bias in medicine:
– 1. Selection (Sampling) Bias
• Admission rate (Berkson's) bias • Nonresponse bias • Lead time bias
– 2. Information (misclassification) Bias
• Recall bias • Interview bias • Unacceptability bias
– 3. Confounding
• A confounding variable has a relationship with both the dependent and independent variables that masks or potentiates the effect of the variable under study.

Page 59

Types of Variation

• Discrete variables
– Nominal variables
– Dichotomous (Binary) variables

• Ordinal (Ranked) variables

• Continuous (Dimensional) variables

• Ratio variables

• Risks and Proportions as variables

Page 60

Types of Variation

• Nominal variables

Page 61

Nominal

Blood types: A, O, B, AB

Social Security Number: 123 45 6789, 312 65 8432, 555 44 7777

Page 62

Types of Variation

• Nominal variables

• Dichotomous (Binary) variables

Page 63

Dichotomous (Binary) variables

WNL (within normal limits) / Not WNL

Accept / Reject

Normal / Abnormal

Page 64

Types of Variation

• Nominal variables

• Dichotomous (Binary) variables

• Ordinal (Ranked) variables

Page 65

Ordinal (Ranked) variables

Strongly agree, agree, neutral, disagree, strongly disagree

[Figure: a ranked scale numbered 1 to 8]

The difference in value between each rank is ignored.

Page 66

Types of Variation

• Nominal variables

• Dichotomous (Binary) variables

• Discrete variables

• Ordinal (Ranked) variables

• Continuous (Dimensional) variables

Page 67

Continuous (Dimensional) variables

Height, blood pressure, weight, temperature (e.g., 32° F)

Page 68

Types of Variation

• Nominal variables

• Dichotomous (Binary) variables

• Discrete variables

• Ordinal (Ranked) variables

• Continuous (Dimensional) variables

• Ratio variables

Page 69

Ratio variables

• A continuous scale that has a true zero point

Page 70

Types of Variation

• Nominal variables
• Dichotomous (Binary) variables
• Discrete variables
• Ordinal (Ranked) variables
• Continuous (Dimensional) variables
• Ratio variables
• Risks and Proportions as variables

Page 71

Risks and Proportions as variables

• Variables created by the ratio of discrete counts in the numerator to counts in the denominator.

Page 72

Table Shell

• Title: What are the data? Who? Where are the data? When?
• Box Head: captions or column headings
• Stub: row captions
• Cell: "the intersection of a column and a row"
• Note: explanation
• Source: references

Page 73

Charts

• Bar: one or more variables
• Grouped bar: from tables with two or three variables
• Stacked bar: a total category with frequencies within
• Pie: percentages
• Histograms: continuous data
• Frequency polygons: continuous data
• Line graphs: time trends/survival curves
• Scatter diagrams: two continuous variables

Page 74

Bar Chart

Page 75

Grouped Bar

Page 76

Stacked Bar

Page 77

Pie

[Figure: pie chart with four segments labeled 1 to 4]

Page 78

Histogram

[Figure: histogram with frequencies from 0 to 5 on the vertical axis and values from 5.0 to 9.0 on the horizontal axis]

Page 79

Frequency Polygon

[Figure: frequency polygon with frequencies from 0 to 5 on the vertical axis and values from 5.0 to 9.0 on the horizontal axis]

Page 80

Line Graphs

[Figure: line graph of values from 6000 to 11000 over the years 1997 to 2002]

Page 81

Scatter Diagrams

[Figure: scatter diagram of height (60 to 72) against weight (100 to 210)]