
1

Ph.D. COURSE IN BIOSTATISTICS DAY 2

SOME RESULTS ABOUT MEANS AND VARIANCES

The sample mean and the sample variance were used to describe a typical value and the variation in the sample.

We may similarly use the population mean, the expected value, and the population variance to describe the typical value and the variation in a population.

These values are often referred to as the theoretical values, and the sample mean and the sample variance are considered as estimates of the analogous population quantities.

If X represents a random variable, e.g. birth weight or blood pressure, the mean and variance are often denoted

mean: E(X) = μ        variance: Var(X) = σ²

The notation is also used when the distribution is not normal.

2

The random variation in a series of observations is transferred to uncertainty, i.e. sampling error or sampling variation, in estimates computed from the observations. The average, or sample mean, is an important example of such an estimate.

Let X₁, X₂, …, Xₙ denote a random sample of size n from a population with mean μ and variance σ². The average X̄ is then itself a random variable.

If several samples of size n are drawn from the population, the average value will vary between samples.

Terminology: A ”random sample” implies that the observations are mutually independent replicates of the experiment ”take a unit at random from the population and measure the value on this unit”.

For the average (sample mean) we have

E(X̄) = μ        Var(X̄) = σ²/n

3

The sample mean is an unbiased estimate of the population mean. The variance of the sample mean is proportional to the variance of a single observation and inversely proportional to the sample size.

The standard deviation of the sample mean = standard error of the mean = s.e.m. = σ/√n

Interpretation: The expected value, the variance, and the standard error of the mean are the values of these quantities that one would expect to find if we generated a large sample of averages each obtained from independent random samples of size n from the same population.

The result shows that the precision of the sample mean increases with the sample size.

Moreover, if the variation in the population follows a normal distribution the sampling variation of the average also follows a normal distribution:

X̄ ~ N(μ, σ²/n)
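The results above can be illustrated by simulation. The sketch below is not part of the course material; the population mean, standard deviation, and sample size are made-up values. It draws many samples of size n and checks that the sample means have mean ≈ μ and variance ≈ σ²/n:

```python
import random
import statistics

# Hypothetical population parameters and sample size (made up for illustration)
random.seed(1)
mu, sigma, n = 120.0, 10.0, 25
reps = 20_000

# Draw many independent samples of size n and record the sample mean of each
means = []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    means.append(statistics.fmean(sample))

# E(X-bar) should be close to mu; Var(X-bar) close to sigma^2 / n
print(statistics.fmean(means))       # ≈ 120
print(statistics.variance(means))    # ≈ 100/25 = 4
```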

4

Consider a random sample X₁, X₂, …, Xₙ of size n from a population with mean μ_X and variance σ_X², and an independent random sample Y₁, Y₂, …, Y_m of size m from a population with mean μ_Y and variance σ_Y². For the difference between the sample means we have

E(X̄ − Ȳ) = E(X̄) − E(Ȳ) = μ_X − μ_Y

Var(X̄ − Ȳ) = Var(X̄) + Var(Ȳ) = σ_X²/n + σ_Y²/m

These results are a consequence of the following general results

• Linear transformations of random variables (change of scale)

• The expected value of a sum of random variables

• The variance of a sum of independent random variables

E(a + bX) = a + b·E(X)        Var(a + bX) = b²·Var(X)

E(a₀ + a₁X₁ + … + aₙXₙ) = a₀ + a₁E(X₁) + … + aₙE(Xₙ)

Var(a₀ + a₁X₁ + … + aₙXₙ) = a₁²Var(X₁) + … + aₙ²Var(Xₙ)    (independent X₁, …, Xₙ)
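These rules can be checked numerically. A small sketch with made-up coefficients and two independent normal variates (hypothetical values, not from the slides):

```python
import random
import statistics

# Two independent normal variates with known means and variances (made up)
random.seed(2)
N = 200_000
x1 = [random.gauss(5.0, 2.0) for _ in range(N)]   # Var(X1) = 4
x2 = [random.gauss(1.0, 3.0) for _ in range(N)]   # Var(X2) = 9, independent of X1

# Linear combination a0 + a1*X1 + a2*X2 with hypothetical coefficients
a0, a1, a2 = 10.0, 2.0, -1.0
y = [a0 + a1 * u + a2 * v for u, v in zip(x1, x2)]

print(statistics.fmean(y))       # ≈ 10 + 2·5 − 1·1 = 19
print(statistics.variance(y))    # ≈ 2²·4 + (−1)²·9 = 25
```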

5

For a random sample of size n from a normal distribution the result above can be reformulated as

(X̄ − μ)/(σ/√n) ~ standard normal distribution

The standard normal distribution is tabulated, so for given values of μ and σ this relation can be used to derive probability statements about the sample mean.

The sampling distribution of the variance
The sample variance s² is also a statistic derived from the observations and therefore subject to sampling variation. For a random sample from a normal distribution one may show that

E(s²) = σ²

so the sample variance is an unbiased estimate of the population variance.

6

[Figure: The distribution of the sample variance for various n (n = 5, 10, 20, 50, 100).]

For a random sample of size n from a normal distribution the sampling error of the sample variance can also be described. We have

(n − 1)s²/σ² ~ χ²-distribution with f = n − 1 degrees of freedom

The χ²-distributions (chi-square distributions) are tabulated, so for a given value of σ² this relation can be used to derive probability statements about the sample variance. A χ²-distribution is the distribution of a sum of independent, squared standard normal variates.
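The χ² result can likewise be illustrated by simulation. In the sketch below (made-up parameters, not from the slides), (n − 1)s²/σ² should have mean f and variance 2f, the mean and variance of a χ²-distribution with f = n − 1 degrees of freedom:

```python
import random
import statistics

# Hypothetical normal population and sample size (made up for illustration)
random.seed(3)
mu, sigma, n, reps = 0.0, 2.0, 10, 20_000
f = n - 1

# For each replicate, compute (n-1) * s^2 / sigma^2
scaled = []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    s2 = statistics.variance(sample)      # sample variance with n-1 divisor
    scaled.append(f * s2 / sigma**2)

# A chi-square variate with f degrees of freedom has mean f and variance 2f
print(statistics.fmean(scaled))      # ≈ f = 9
print(statistics.variance(scaled))   # ≈ 2f = 18
```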

7

INTRODUCTION TO STATISTICAL INFERENCE

Statistical inference: The use of a statistical analysis of data to draw conclusions from observations subject to random variation.

Data are considered as a sample from a population (real or hypothetical). The purpose of the statistical analysis is to make statements about certain aspects of this population.

The basic components of a statistical analysis
• Specification of a relevant statistical model (the scientific problem is ”translated” to a statistical problem)
• Estimation of the population characteristics (the model parameters)
• Validation of the underlying assumptions
• Test of hypotheses about the model parameters.

A statistical analysis is always based on a statistical model, which formalizes the assumptions made about the sampling procedure and the random and systematic variation in the population from which the sample is drawn.

8

The validity of the conclusions depends on the degree to which the statistical model gives an adequate description of the sampling procedure and the random and systematic variation.

Consequently, checking the appropriateness of the underlying assumptions (i.e. the statistical model) is an important part of a statistical analysis.

The statistical model should be seen as an approximation to the real world.

The choice of a suitable model is always a balance between

complex models, which are close approximations, but very difficult to use in practice,

and simple models, which are crude approximations, but easy to apply.

9

Example: Comparing the efficacy of two treatments

Design: Experimental units (e.g. patients) are allocated to two treatments. For each experimental unit in both treatment groups an outcome is measured. The outcome reflects the efficacy of the treatment.

Purpose: To compare the efficacy of the two treatments

Analysis: To summarize the results the average outcome is computed in each group and the two averages are compared.

Possible explanations for a discrepancy between the average outcome in the two groups

• The treatments have different efficacy. One is better than the other

• Random variation

• Bias originating from other differences between the groups. Other factors which influence the outcome may differ between the groups and lead to apparent differences between the efficacy of the two treatments (confounding).

10

A proper design of the study (randomization, blinding etc.) can eliminate or reduce the bias and therefore make this explanation unlikely.

Bias correction (control of confounding) is also possible in the statistical analysis.

The statistical analysis is performed to estimate the size of the treatment difference and evaluate if random variation is a plausible explanation for this difference.

If the study is well-designed and the statistical analysis indicates that random variation is not a plausible explanation for the difference, we may conclude that a real difference between the efficacy of the two treatments is the most likely explanation of the findings.

The statistical analysis can also identify a range of plausible values, a so-called confidence interval, for the difference in efficacy.

11

STATISTICAL ANALYSIS OF A SAMPLE FROM A NORMAL DISTRIBUTION

Example. Fish oil supplement and blood pressure in pregnant women

Purpose: To evaluate the effect of fish oil supplement on diastolic blood pressure in pregnant women.

Design: Randomised controlled clinical trial on 430 pregnant women, enrolled at week 30 and randomised to either fish oil supplement or control.

Data: Diastolic and systolic blood pressure at week 30 and 37 (source: Sjurdur Olsen)

The Stata file fishoil.dta contains the following variables

grp     treatment group, 1 for control, 2 for fish oil
group   a string variable with the name of the group allocation
difsys  increase in systolic blood pressure from week 30 to week 37
difdia  increase in diastolic blood pressure from week 30 to week 37

12

[Figure: Histograms of difdia by group (Control, Fish oil); y-axis shows density.]

We shall here consider the change in diastolic blood pressure from week 30 to week 37. Stata:

histogram difdia , by(group)

13

[Figure: Normal Q-Q plots of difdia against the inverse normal for the Control and Fish oil groups.]

Stata:

qnorm difdia if grp==1, title("Control") ///
    saving(q1,replace)
qnorm difdia if grp==2, title("Fish oil") ///
    saving(q2,replace)
graph combine q1.gph q2.gph

14

For both groups the histogram and the probability plot correspond closely to the expected behavior of normally distributed data.

Hence our statistical model is:

The observations in each group can be considered as a random sample from a normal distribution with unknown parameters as below:

Group       Mean    Variance
Control     μ₁      σ₁²
Fish oil    μ₂      σ₂²

The two sets of observations are independent.

The ultimate goal of the analysis is to compare the two treatments with respect to the expected change in blood pressure. We shall return to this analysis later.

First we want to examine the change in the diastolic blood pressure in women in the control group.

15

We now consider the control group and focus on the increase in diastolic blood pressure

Statistical model
The data are considered as a random sample of size 213 from a normal distribution with mean μ and variance σ².

The parameter μ describes the expected change and the parameter σ² describes the random variation caused by biological factors and measurement errors.

Problem: Do the data suggest that the diastolic blood pressure in pregnant women increases from week 30 to week 37?

Data
The observed values of the change in diastolic blood pressure in the 213 women in the control group.

16

Assumptions
The assumptions of the statistical model are

1. The observations are independent

2. The observations have the same mean and the same variance

3. A normal distribution describes the variation.

Checking the validity of the assumptions is usually done by various plots and diagrams. Knowledge of the measurement process can often help in identifying points which need special attention.

Re 1. Checking independence often involves going through the sampling procedure. Here the assumption would e.g. be violated if the same woman contributed more than one pregnancy.

Re 2. Do we have “independent replications of the same experiment”? Factors that are known to be associated with changes in blood pressure are not accounted for in the model. They contribute to the random variation.

17

Re 3. The plots above indicate that a normal distribution gives an adequate description of the data

Estimation
The estimation problem: Find the normal distribution that best fits the data.

Solution: Use the normal distribution with mean equal to the sample mean and variance equal to the sample variance.

sum difdia if grp==1

Variable |    Obs        Mean    Std. Dev.    Min    Max
---------+----------------------------------------------
  difdia |    213    1.901408    7.528853     -28     29

i.e.

μ̂ = x̄ = 1.90        σ̂² = s² = 56.68  (s = 7.53)

Note: A normal distribution is completely determined by the values of the mean and the variance.

Convenient notation: A “^” on top of a population parameter is used to identify the estimate of the parameter.

18

Question: Do the data suggest a systematic change in the diastolic pressure? No systematic change means that the expected change is 0, i.e. μ = 0.

Hypothesis: The data are consistent with the value of μ being 0. This hypothesis is usually written as

H: μ = 0

We have observed an average value of 1.90. Is sampling variation a possible explanation for the difference between the observed value of 1.90 and the expected value of 0?

Statistical test
To evaluate if the random variation can account for the difference we assume that the hypothesis is true and compute the probability that the average value in a random sample of size 213 differs by at least as much as the observed value.

From the model assumptions we conclude that the average can be considered as an observation from a normal distribution with mean 0 and standard deviation equal to σ/√213.

19

Consequently, the standardized value

z = (x̄ − 0)/(σ/√213)

is an observation from a standard normal distribution.

Problem: The population standard deviation σ is unknown, but in large samples we may use the sample standard deviation and still rely on the normal distribution. Small samples are considered later.

Replacing σ with the estimate s we therefore get

”z” = (x̄ − 0)/(s/√213) = (1.901 − 0)/(7.529/√213) = 1.901/0.516 = 3.69

For a normal distribution a value more than 3 standard deviations from the mean is very unlikely to occur.

Using a table of the standard normal distribution function we find that a value that deviates more than 3.69 in either direction occurs with a probability of 0.00023.

20

p-value
The probability computed above is called the p-value.

p-value = the probability of obtaining a value of the test statistic at least as extreme as the one actually observed if the hypothesis is true.

Usually extreme values in both tails of the distribution are included (two-sided test), so in the present case

p-value = P(z ≤ −3.69) + P(z ≥ 3.69) = 2 × 0.000114 = 0.00023

[Figure: Standard normal density with the observed values −3.69 and 3.69 marked in the tails.]

The calculation indicates that sampling variation is a highly implausible explanation for the observed change in blood pressure. The observed deviation from the hypothesized value is statistically significant. Usually a hypothesis is rejected if the p-value is less than 0.05.
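The z-test can be reproduced from the summary statistics with Python's standard library (a sketch alongside the Stata calculations; NormalDist gives the standard normal cumulative distribution function):

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from the slides: control group, hypothesis mu = 0
n, xbar, s = 213, 1.901408, 7.528853

# Standardized value, with sigma replaced by the sample standard deviation s
z = (xbar - 0) / (s / sqrt(n))

# Two-sided p-value from the standard normal distribution
p = 2 * (1 - NormalDist().cdf(z))

print(round(z, 2))   # 3.69
print(round(p, 5))   # 0.00023
```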

21

SMALL SAMPLES – use of the t-distribution

To compute the p-value above we replaced the unknown population standard deviation σ with the sample standard deviation s and referred the value of the test statistic to a normal distribution.

For large samples this approach is unproblematic, but for small samples the p-value becomes too small, since the sampling error of the sample standard deviation is ignored. Statistical theory shows that the correct distribution of the test statistic is a so-called t-distribution with f = n − 1 degrees of freedom.

The t-distribution has been tabulated, so we are still able to compute a p-value. Note that the t-distribution does not depend on the parameters μ and σ², so the same table applies in all situations.

As the sample size increases the t-distribution will approach a standard normal distribution. Usually the approximation is acceptable for samples larger than 60, say.

22

[Figure: A comparison of t-distributions with 4, 19, and 99 degrees of freedom (n = 5, 20, 100) and a standard normal distribution (the black curve).]

If we again compute

t = (x̄ − 0)/(s/√213) = 3.69

but this time look up the value in a table of a t-distribution with f = 212 degrees of freedom, we get p = 0.00029. Since the sample is relatively large the result is almost identical to the one above.
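The same computation referred to a t-distribution can be sketched in Python, here using SciPy (assumed available) in place of Stata's ttail:

```python
from math import sqrt
from scipy import stats   # assumes SciPy is available

# Same summary statistics as before, but the test statistic is referred
# to a t-distribution with f = n - 1 = 212 degrees of freedom
n, xbar, s = 213, 1.901408, 7.528853
t = (xbar - 0) / (s / sqrt(n))

# Two-sided p-value: twice the upper-tail probability
p = 2 * stats.t.sf(t, df=n - 1)

print(round(t, 2))   # 3.69
print(round(p, 5))   # 0.00029
```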

23

STATA: PROBABILITY CALCULATIONS

Output from statistical programs like Stata usually also includes p-values, so statistical tables are rarely needed. Moreover Stata has a lot of built-in functions that can compute almost any kind of probability. Write help probfun to see the full list.

Some examples:

display norm(3.6858) returns .99988601, the value of the cumulative probability function of a standard normal distribution at 3.6858, i.e. P(Z ≤ 3.6858), the probability that a standard normal variate is less than or equal to 3.6858.

display ttail(212,3.6858) returns .00014478, the probability that a t-statistic with 212 degrees of freedom is larger than 3.6858.

display Binomial(224,130,0.5134) returns .02608126, the probability of getting at least 130 successes from a Binomial distribution with n = 224 and p = 0.5134.

24

ONE SAMPLE t-TEST: THE GENERAL CASE

Above we derived the t-test of the hypothesis H: μ = 0.

The same approach can be used to test if any specified value is consistent with the data. If we e.g. want to test the hypothesis H: μ = 2 we compute

t = (x̄ − 2)/(s/√213) = (1.901 − 2)/(7.529/√213) = −0.099/0.516 = −0.191

display 2*ttail(212,0.1911) returns the p-value .84863014, so an expected change of 2 is compatible with the data and can not be rejected.

Note: The function ttail gives a probability in the upper tail of thedistribution. A negative t-value should therefore be replaced by the corresponding positive value when computing the p-value.
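As a sketch of the same calculation outside Stata (SciPy assumed available), note the use of |t| before taking the upper tail, mirroring the rule for ttail:

```python
from math import sqrt
from scipy import stats   # assumes SciPy is available

# Testing H: mu = 2 with the summary statistics from the slides
n, xbar, s = 213, 1.901408, 7.528853
t = (xbar - 2) / (s / sqrt(n))           # negative, about -0.191

# Two-sided p-value: use |t| so the upper tail is taken correctly
p = 2 * stats.t.sf(abs(t), df=n - 1)

print(round(p, 3))   # 0.849
```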

25

CONFIDENCE INTERVALS

In the example the observed average change in blood pressure is 1.901, and this value was used as an estimate of the expected change μ.

Values close to 1.901 are also compatible with the data; we saw e.g. that the value 2 could not be rejected.

Problem: Find the range of values for the expected change that is supported by the data.

A confidence interval is the solution to this problem.

Formally: A 95% confidence interval identifies the values of the unknown parameter which would not be significantly contradicted by a (two-sided) test at the 5% level, because the p-value associated with the test statistic for each of these values is larger than 5%

26

Frequency interpretation: If the experiment is repeated a large number of times and a 95% confidence interval is computed for each replication, then 95% of these confidence intervals will contain the true value of the unknown parameter.

How to calculate the 95% confidence interval
The limits of the confidence interval are the values of μ for which the test statistic t equals the 2.5 and the 97.5 percentile of a t-distribution with n − 1 degrees of freedom. The t-distribution is symmetric around 0, so t_0.025 = −t_0.975, and the confidence limits are therefore given by the values of μ satisfying

−t_0.975 ≤ (x̄ − μ)/(s/√n) ≤ t_0.975

i.e.

x̄ − t_0.975·s/√n ≤ μ ≤ x̄ + t_0.975·s/√n

The formula shows that the confidence interval becomes narrower as the sample size increases.

27

Example continued

In Stata the command invttail gives the upper percentiles, and display invttail(212,0.025) returns 1.971217.

The 95% confidence limits for the expected change in diastolic blood pressure therefore become

x̄ ± t_0.975·s/√n = 1.9014 ± 1.9712 × 0.5159

and the 95% confidence interval becomes 0.88 to 2.92.
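The interval can be reproduced from the summary statistics, e.g. in Python with SciPy (assumed available):

```python
from math import sqrt
from scipy import stats   # assumes SciPy is available

# 95% confidence interval: xbar +/- t(0.975) * s / sqrt(n)
n, xbar, s = 213, 1.901408, 7.528853
tcrit = stats.t.ppf(0.975, df=n - 1)     # 97.5 percentile, about 1.9712
half = tcrit * s / sqrt(n)
lower, upper = xbar - half, xbar + half

print(round(lower, 2), round(upper, 2))   # 0.88 2.92
```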

99% confidence intervals are derived from the upper 0.5 percentile in a similar way.

Also, one-sided confidence intervals can be defined and computed from one-sided statistical tests (statistical tests are called one-sided if large deviations in only one direction are considered extreme).

28

STATA: ONE SAMPLE t-TEST

A single command in Stata will give all the results derived so far.

ttest difdia=0 if grp==1

One-sample t test
---------------------------------------------------------------------
Variable |    Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+-----------------------------------------------------------
  difdia |    213    1.901408    .5158685    7.528853    .8845197    2.918297
---------------------------------------------------------------------
    t = 3.6858                 degrees of freedom = 212

Ho: mean(difdia) = 0   (the hypothesis tested)
    Ha: mean < 0            Ha: mean != 0             Ha: mean > 0
 Pr(T < t) = 0.9999    Pr(|T| > |t|) = 0.0003    Pr(T > t) = 0.0001
                       (the two-sided p-value)

To test the hypothesis H: μ = 2 use ttest difdia=2 if grp==1 instead.

29

Statistical inference about the variance

So far we have looked at statistical inference about the mean of a normal population based on a random sample.

In the same setting we can also derive a test statistic for hypotheses about the variance (or the standard deviation) and obtain confidence intervals for this parameter. The arguments are based on the result about the sampling distribution of the sample variance (see p. 6)

(n − 1)s²/σ² ~ χ²-distribution with f = n − 1 degrees of freedom

Inference problems involving a hypothesis about the variance are much less common, but may e.g. arise in studies of methods of measurement.

Example continued
Suppose we for some reason want to see if the change in diastolic blood pressure has a standard deviation of 7, or equivalently a variance of 49.

30

To test the hypothesis H: σ = 7 we could compute

(213 − 1)·s²/49 = (213 − 1)·56.68/49 = 245.24

and see if this value is extreme when referred to a χ²-distribution on 212 degrees of freedom.

Using Stata’s probability calculator, display chi2(212,245.24), we get .94165889. This is the probability of a value less than or equal to 245.24. The probability of getting a value larger than 245.24 is 1 − .94165889 = .05834111. Stata can also give this result directly from the command display chi2tail(212,245.24).

The p-value is 2 times the smallest tail probability, i.e. 0.117. A standard deviation of 7 can not be rejected.

Rule: If the test statistic, x, is smaller than the degrees of freedom, f, use display 2*chi2(f,x), else use display 2*chi2tail(f,x).
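The same χ² test can be sketched from the summary statistics in Python (SciPy assumed available); the rule of doubling the smaller tail is applied directly:

```python
from scipy import stats   # assumes SciPy is available

# Testing H: sigma = 7 (variance 49) with s^2 = 56.68 on 212 degrees of freedom
n, s2, sigma2_0 = 213, 7.528853**2, 49.0
chi2_obs = (n - 1) * s2 / sigma2_0            # about 245.24

# Two-sided p-value: twice the smaller of the two tail probabilities
upper_tail = stats.chi2.sf(chi2_obs, df=n - 1)
p = 2 * min(upper_tail, 1 - upper_tail)

print(round(chi2_obs, 2))   # 245.24
print(round(p, 3))          # 0.117
```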

31

Confidence intervals for variances and standard deviations

A 95% confidence interval for the population variance σ² is given by

f·s²/χ²_0.975 ≤ σ² ≤ f·s²/χ²_0.025

where f is the degrees of freedom and χ²_0.025 and χ²_0.975 are the 2.5 and the 97.5 percentiles of a χ²-distribution with f degrees of freedom.

A 95% confidence interval for the standard deviation therefore becomes

s·√(f/χ²_0.975) ≤ σ ≤ s·√(f/χ²_0.025)

Example – diastolic blood pressure continued
Stata’s probability calculator has a function invchi2 that computes percentiles of χ²-distributions. We find that

display invchi2(212,0.025) gives 173.5682
display invchi2(212,0.975) gives 254.2178

32

A 95% confidence interval for the standard deviation is therefore

7.5289·√(212/254.2178) ≤ σ ≤ 7.5289·√(212/173.5682),  i.e. 6.88 to 8.32
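A sketch of the same interval in Python (SciPy assumed available for the χ² percentiles):

```python
from math import sqrt
from scipy import stats   # assumes SciPy is available

# 95% confidence interval for sigma: s*sqrt(f/chi2_0.975) to s*sqrt(f/chi2_0.025)
s, f = 7.528853, 212
q_lo = stats.chi2.ppf(0.025, df=f)   # about 173.5682
q_hi = stats.chi2.ppf(0.975, df=f)   # about 254.2178
lower = s * sqrt(f / q_hi)
upper = s * sqrt(f / q_lo)

print(round(lower, 2), round(upper, 2))   # 6.88 8.32
```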

More Stata
A test of a hypothesis about the standard deviation is carried out by the command

sdtest difdia=7 if grp==1

One-sample test of variance
----------------------------------------------------------------------
Variable |    Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+------------------------------------------------------------
  difdia |    213    1.901408    .5158685    7.528853    .8845197    2.918297
----------------------------------------------------------------------
    sd = sd(difdia)                c = chi2 = 245.2435
    Ho: sd = 7   (the hypothesized value)     degrees of freedom = 212
    Ha: sd < 7              Ha: sd != 7               Ha: sd > 7
 Pr(C < c) = 0.9417    2*(C > c) = 0.1166    Pr(C > c) = 0.0583
                       (the two-sided p-value)

Note that the 95% confidence interval is the confidence interval for the population mean and not for the standard deviation.

33

STATISTICAL ANALYSIS OF TWO INDEPENDENT SAMPLES FROM NORMAL DISTRIBUTIONS

Example. Fish oil supplement and blood pressure in pregnant women

The study was a randomized trial carried out to evaluate the effect of fish oil supplement on diastolic blood pressure in pregnant women.

Pregnant women were assigned at random to one of two treatment groups. One group received fish oil supplement, the other was acontrol group.

Here we shall compare the two treatments using difdia, the change in diastolic blood pressure, as outcome, or response.

We have already seen histograms and Q-Q plots of the distribution of difdia in each of the two groups (see p. 12-13) and these plots suggest that the random variation may be adequately described by normal distributions.

34

The standard analysis of this problem is based on the following statistical model:

The observations in each group can be considered as a random sample from a normal distribution with unknown parameters as below:

Group       Mean    Variance
Control     μ₁      σ²
Fish oil    μ₂      σ²

The two sets of observations are independent.

Note that the size of the random variation is assumed to be the same in the two groups, so this assumption should also be checked.

The purpose of the analysis is to quantify the difference between the expected change in the two groups and assess if this difference is statistically different from 0

35

Note: The model assumptions imply that if this difference is not statistically different from 0 we may conclude that the distributions are not significantly different, since a normal distribution is completely determined by the parameters μ and σ².

Model assumptions

1. Independence within and between samples

2. Random samples from populations with the same variance

3. The random variation in each population can be described by anormal distribution

Re 1. Inspect the design and the data. Repeated observations on the same individual usually imply violation of the independence assumption.

Re 2. A formal test of the hypothesis of identical variances of normal distributions is described below.

Re 3. Histograms and Q-Q plots, see page 12-13

36

Estimation
Basic idea: population values are estimated by the corresponding sample values. This gives two estimates of the variance, which should be pooled to a single estimate.

Stata performs the basic calculations with

bysort grp: summarize difdia

-> grp = control
Variable |    Obs        Mean    Std. Dev.    Min    Max
---------+----------------------------------------------
  difdia |    213    1.901408    7.528853     -28     29

-> grp = fish oil
Variable |    Obs        Mean    Std. Dev.    Min    Max
---------+----------------------------------------------
  difdia |    217    2.193548    8.364904     -28     31

i.e. control group: mean = 1.90 fish oil group: mean = 2.19

37

The standard deviations are rather similar, so let us assume for a moment that it is reasonable to derive a pooled estimate. How should this be done?

Statistical theory shows that the best approach is to compute a pooled estimate of the variance as a weighted average of the sample variances and use the corresponding standard deviation as the pooled estimate. The weighted average uses weights proportional to the degrees of freedom, i.e. f = n − 1. Hence

s²_pooled = s_p² = (f₁·s₁² + f₂·s₂²)/(f₁ + f₂)

and

s_pooled = s_p = √(s_p²)

Stata does not include this estimate in the output above, but the result is produced by the commands

quietly regress difdia grp
display e(rmse)

giving the output 7.9617662, i.e. s_p = 7.962.

(Writing quietly in front suppresses output from the command; the string variable group cannot be used here.)
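The pooled estimate can also be computed directly from the two sample standard deviations; a small Python sketch using the summary statistics from the output above:

```python
from math import sqrt

# Summary statistics from the Stata output on the slides
n1, s1 = 213, 7.528853   # control
n2, s2 = 217, 8.364904   # fish oil
f1, f2 = n1 - 1, n2 - 1

# Pooled variance: weighted average of the sample variances,
# weights proportional to the degrees of freedom
s2_pooled = (f1 * s1**2 + f2 * s2**2) / (f1 + f2)
s_pooled = sqrt(s2_pooled)

print(round(s_pooled, 4))   # 7.9618
```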

38

Statistical test comparing means of two independent samples
The expected change in diastolic blood pressure is slightly higher in the fish oil group. Does this reflect a systematic effect?

To see if random variation can explain the difference we test the hypothesis

H: μ₁ = μ₂

of identical population means in the two samples.

The line of argument is similar to the one that was used in the one-sample case. Assume that the hypothesis is true. The observed difference between the two means must then be caused by sampling variation.

The plausibility of this explanation is assessed by computing a p-value, the probability of obtaining a result at least as extreme as the observed.

39

From the model assumptions we conclude that if the hypothesis is true then the difference between the sample means can be considered as an observation from a normal distribution with mean 0 and variance

Var(X̄₁ − X̄₂) = σ²/n₁ + σ²/n₂ = σ²·(1/n₁ + 1/n₂)

Consequently, the standardized value

z = (x̄₁ − x̄₂) / (σ·√(1/n₁ + 1/n₂))

is an observation from a standard normal distribution. If the standard deviation σ is replaced by the pooled estimate s_p we arrive at the test statistic

t = (x̄₁ − x̄₂) / (s_p·√(1/n₁ + 1/n₂))

40

To derive the p-value this test statistic should be referred to a t-distribution with

f₁ + f₂ = (n₁ − 1) + (n₂ − 1) = n₁ + n₂ − 2

degrees of freedom, since we may show that the sampling distribution of the pooled variance estimate is identical to the sampling distribution of a variance estimate with f₁ + f₂ degrees of freedom (see page 6).

We get

t = (x̄₁ − x̄₂) / (s_p·√(1/n₁ + 1/n₂)) = (1.9014 − 2.1935) / (7.9618·√(1/213 + 1/217)) = −0.38

and the p-value becomes 0.70. The difference is not statistically significantly different from 0.

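The two-sample t-test can be reproduced from the summary statistics; a Python sketch (SciPy assumed available):

```python
from math import sqrt
from scipy import stats   # assumes SciPy is available

# Summary statistics from the slides
n1, xbar1 = 213, 1.901408   # control
n2, xbar2 = 217, 2.193548   # fish oil
s_p = 7.9618                # pooled standard deviation (page 37)

# Test statistic and two-sided p-value on n1 + n2 - 2 degrees of freedom
se = s_p * sqrt(1 / n1 + 1 / n2)
t = (xbar1 - xbar2) / se
p = 2 * stats.t.sf(abs(t), df=n1 + n2 - 2)

print(round(t, 4))   # -0.3804
print(round(p, 3))   # 0.704
```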

41

Confidence intervals for the parameters of the model

The model has three unknown parameters: μ₁, μ₂, and σ². A 95% confidence interval for the expected value μ₁ becomes

x̄₁ − t_0.975·s_p/√n₁ ≤ μ₁ ≤ x̄₁ + t_0.975·s_p/√n₁

and similarly for μ₂. Note that the pooled standard deviation is used, and t_0.975 is therefore the 97.5 percentile of a t-distribution with f₁ + f₂ degrees of freedom. For the change in diastolic blood pressure in the control group we get

1.901 ± 1.966 × 7.962/√213,  i.e. 0.83 ≤ μ₁ ≤ 2.97

Note: some programs, e.g. Stata, use the separate sample standard deviations when computing these confidence intervals.

A 95% confidence interval for the standard deviation σ is based on the pooled estimate with 212 + 216 = 428 degrees of freedom (see page 31)

7.962·√(428/487.21) ≤ σ ≤ 7.962·√(428/372.57),  i.e. 7.46 ≤ σ ≤ 8.53

42

Confidence intervals for the difference between means

In a two-sample problem the parameter of interest is usually the difference between the expected values, μ₁ − μ₂. From the results above (page 39) we get

x̄₁ − x̄₂ − t_0.975·s_p·√(1/n₁ + 1/n₂) ≤ μ₁ − μ₂ ≤ x̄₁ − x̄₂ + t_0.975·s_p·√(1/n₁ + 1/n₂)

where the t-percentile refers to a t-distribution with f₁ + f₂ degrees of freedom.

The example:

(1.901 − 2.194) ± 1.966 × 7.962·√(1/213 + 1/217),  i.e. −1.80 ≤ μ₁ − μ₂ ≤ 1.22
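A sketch of the same interval computed from the summary statistics (SciPy assumed available for the t-percentile):

```python
from math import sqrt
from scipy import stats   # assumes SciPy is available

# Summary statistics from the slides
n1, xbar1 = 213, 1.901408
n2, xbar2 = 217, 2.193548
s_p = 7.9618                # pooled standard deviation

# 95% confidence interval for mu1 - mu2
se = s_p * sqrt(1 / n1 + 1 / n2)
tcrit = stats.t.ppf(0.975, df=n1 + n2 - 2)   # about 1.966
diff = xbar1 - xbar2
lower, upper = diff - tcrit * se, diff + tcrit * se

print(round(lower, 2), round(upper, 2))   # -1.8 1.22
```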

43

STATA: TWO SAMPLE t-TEST (equal variances)

A single command in Stata gives all the results derived so far except an estimate of the pooled variance (see page 37)

ttest difdia , by(grp)

Two-sample t test with equal variances
---------------------------------------------------------------------
   Group |    Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+-----------------------------------------------------------
 control |    213    1.901408    .5158685    7.528853    .8845197    2.918297
fish oil |    217    2.193548    .5678467    8.364904    1.074318    3.312778
---------+-----------------------------------------------------------
combined |    430    2.048837    .3835675    7.953826    1.294932    2.802743
---------+-----------------------------------------------------------
    diff |           -.2921399    .7679341              -1.801531    1.217252
---------------------------------------------------------------------
    diff = mean(control) - mean(fish oil)            t = -0.3804
    Ho: diff = 0   (the hypothesis tested)    degrees of freedom = 428

    Ha: diff < 0            Ha: diff != 0             Ha: diff > 0
 Pr(T < t) = 0.3519    Pr(|T| > |t|) = 0.7038    Pr(T > t) = 0.6481
                       (the two-sided p-value)

Note: the Std. Dev. in the combined row is the s.d. in the combined samples, not the pooled s.d.

44

Comparing the variances: The F-distribution

In the statistical model we assumed the same variance in the two populations. To assess this assumption we consider a statistical test of the hypothesis

H: σ₁² = σ₂²

An obvious test statistic is the ratio of the sample variances

F = s₁²/s₂²

A value close to 1 is expected if the hypothesis is true. Both small and large values would suggest that the variances differ.

From statistical theory it follows that the distribution of the ratio of two independent variance estimates is a so-called F-distribution if the corresponding population variances are identical (i.e. if H is true).

The F-distribution is characterized by a pair of degrees of freedom (the degrees of freedom of the two variance estimates). Like the normal, t-, and chi-square distributions, the F-distributions are extensively tabulated.

45

Comparing the variances

In practice the hypothesis of equal variances is tested by computing

    F_obs = max(s₁², s₂²) / min(s₁², s₂²)

and the p-value is then obtained as

    p = 2 · P(F ≥ F_obs)

where the pair of degrees of freedom are those of the numerator and the
denominator.

Example
For the change in diastolic blood pressure we have

    F_obs = 69.97 / 56.68 = 1.2344

Stata's command display 2*Ftail(216,212,1.2344) returns the p-value 0.125.

The difference between the two standard deviations is not statisticallysignificant.
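The same p-value can be obtained from any F-distribution routine. As a sketch (the variable names are ours, not from the course material), using scipy:

```python
from scipy import stats

# Sample variances from the Stata output: fish oil (n=217), control (n=213)
var_fish = 8.364904 ** 2   # about 69.97
var_ctrl = 7.528853 ** 2   # about 56.68

# Ratio of the largest to the smallest variance estimate
F_obs = max(var_fish, var_ctrl) / min(var_fish, var_ctrl)  # 1.2344

# Degrees of freedom: 216 (numerator, fish oil) and 212 (denominator, control)
p = 2 * stats.f.sf(F_obs, 216, 212)
print(round(F_obs, 4), round(p, 4))  # 1.2344 and a p-value of about 0.125
```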

46

STATA: COMPARISON OF TWO VARIANCES

Stata's command sdtest can also be used to compare two variances. Write

sdtest difdia , by(grp)

Variance ratio test
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
 control |     213    1.901408    .5158685    7.528853    .8845197    2.918297
fish oil |     217    2.193548    .5678467    8.364904    1.074318    3.312778
---------+--------------------------------------------------------------------
combined |     430    2.048837    .3835675    7.953826    1.294932    2.802743
------------------------------------------------------------------------------
    ratio = sd(control) / sd(fish oil)                           f =    0.8101
    Ho: ratio = 1  (the hypothesis tested)     degrees of freedom =   212, 216

    Ha: ratio < 1          Ha: ratio != 1          Ha: ratio > 1
 Pr(F < f) = 0.0622   2*Pr(F < f) = 0.1245   Pr(F > f) = 0.9378
                      (the two-sided p-value)

47

Comparing the means when variances are unequal

Problem: What if the assumption of equal variances is unreasonable?

Some solutions:

1. Try to obtain homogeneity of variances by transforming the
   observations in a suitable way, e.g. by working with log-transformed
   data.

2. Use an approximate t-test that does not rely on equal variances. The
   approximate t-test has the form

       t = (x̄₁ − x̄₂) / √(s₁²/n₁ + s₂²/n₂)

   Under the hypothesis of equal means the distribution of this test
   statistic is approximately equal to a t-distribution. To compute the
   degrees of freedom for the approximate t-distribution first compute

48

    c = (s₁²/n₁) / (s₁²/n₁ + s₂²/n₂)

The degrees of freedom are then obtained as

    f_approx = ( c²/(n₁ − 1) + (1 − c)²/(n₂ − 1) )⁻¹

3. Use a non-parametric test, e.g. a Wilcoxon-Mann-Whitney test.

We shall consider solution 1 next time and solution 3 later in the
course. The Stata command ttest computes solution 2 if the option
unequal is added.

Note: When the variances of the two normal distributions differ, the
hypothesis of equal means is no longer equivalent to the hypothesis of
equal distributions.
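As an illustration (our own sketch, not part of the course material), the approximate t statistic, the Satterthwaite degrees of freedom, and the resulting p-value can be computed directly from the summary statistics of the blood pressure example:

```python
import math
from scipy import stats

# Summary statistics from the blood pressure example
n1, m1, s1 = 213, 1.901408, 7.528853   # control
n2, m2, s2 = 217, 2.193548, 8.364904   # fish oil

v1, v2 = s1**2 / n1, s2**2 / n2
t = (m1 - m2) / math.sqrt(v1 + v2)                    # approximate t statistic

c = v1 / (v1 + v2)
df = 1 / (c**2 / (n1 - 1) + (1 - c)**2 / (n2 - 1))    # Satterthwaite df
p = 2 * stats.t.sf(abs(t), df)
print(round(t, 4), round(df, 1), round(p, 4))  # close to Stata: -0.3808, 424.8, 0.7035
```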

49

STATA: TWO SAMPLE t-TEST (unequal variances)

To compute the approximate t-test (solution 2 above) with Stata write

ttest difdia , by(grp) unequal

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
 control |     213    1.901408    .5158685    7.528853    .8845197    2.918297
fish oil |     217    2.193548    .5678467    8.364904    1.074318    3.312778
---------+--------------------------------------------------------------------
combined |     430    2.048837    .3835675    7.953826    1.294932    2.802743
---------+--------------------------------------------------------------------
    diff |           -.2921399    .7671833               -1.800088    1.215808
------------------------------------------------------------------------------
    diff = mean(control) - mean(fish oil)     t = -0.3808  (approximate t-test)
    Ho: diff = 0      Satterthwaite's degrees of freedom = 424.831
                      (degrees of freedom of the approximate t-test)

    Ha: diff < 0           Ha: diff != 0           Ha: diff > 0
 Pr(T < t) = 0.3518   Pr(|T| > |t|) = 0.7035   Pr(T > t) = 0.6482
                      (the two-sided p-value)

Note: the confidence limits on the "diff" line are approximate.

50

SOME GENERAL COMMENTS ON STATISTICAL TESTS

To test a hypothesis we compute a test statistic, which follows a known
distribution if the hypothesis is true. We can therefore compute the
probability of obtaining a value of the test statistic at least as
extreme as the one observed. This probability is called the p-value.

The p-value describes the degree of support for the hypothesis found in
the data. The result of the statistical test is often classified as
"statistically significant" or "non-significant" depending on whether or
not the p-value is smaller than a level of significance, often called α
and usually equal to 0.05.

The hypothesis being tested is often called the null hypothesis. A null
hypothesis always represents a simplification of the statistical model.

Hypothesis testing is sometimes given a decision-theoretic formulation:
the null hypothesis is either true or false, and a decision is made based
on the data.

51

When hypothesis testing is viewed as decisions, two types of error arepossible

• Type 1 error: Rejecting a true null hypothesis• Type 2 error: Accepting (i.e. not rejecting) a false null hypothesis.

The level of significance specifies the risk of a type 1 error. In the
usual setting the null hypothesis is tested against an alternative
hypothesis which includes different values of the parameter, e.g.

    H₀: μ = 0  against  H_A: μ ≠ 0

The risk of a type 2 error depends on which of the alternative values
is the true value.

The power of a statistical test is 1 minus the risk of a type 2 error.
When planning an experiment, power considerations are sometimes used
to determine the sample size. We return to this in the last lecture.

Once the data are collected confidence intervals are the appropriate way to summarize the uncertainty in the conclusions.
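Such a power consideration can be sketched numerically. The following illustration (the planning values and the normal approximation are our own assumptions, not from the course) approximates the power of a two-sample comparison:

```python
from scipy.stats import norm

def approx_power(delta, sigma, n, alpha=0.05):
    """Approximate power of a two-sample test with n subjects per group,
    common s.d. sigma, and true mean difference delta (normal approximation)."""
    se = sigma * (2.0 / n) ** 0.5      # s.e. of the difference in means
    z = norm.ppf(1 - alpha / 2)        # critical value, e.g. 1.96 for alpha=0.05
    shift = delta / se
    # Probability that the test statistic falls in the rejection region
    return norm.cdf(shift - z) + norm.cdf(-shift - z)

# Hypothetical planning values: detect a difference of 2.5 with s.d. 8
power = approx_power(delta=2.5, sigma=8.0, n=200)
print(round(power, 3))  # about 0.88
```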

52

Relation between p-values and confidence intervals

In a two sample problem it is tempting to compare the 95% confidence
intervals of the two means and conclude that the hypothesis μ₁ = μ₂
cannot be rejected if the 95% confidence intervals overlap.

This is not correct.

Overlapping 95% confidence intervals do not imply that the difference is
non-significant at the 5% level. On the other hand, if the 95% confidence
intervals do not overlap, the difference is statistically significant at
the 5% level (in fact, the p-value is then 1% or smaller).

This may at first seem surprising, but it is a simple consequence of the
fact that for independent samples the result

    Var(x̄ − ȳ) = Var(x̄) + Var(ȳ)

implies that

    se(x̄ − ȳ) < se(x̄) + se(ȳ)
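A small numerical sketch (the numbers are our own illustration, not from the course data) shows two estimates whose 95% confidence intervals overlap while the difference between them is still significant at the 5% level:

```python
from math import sqrt
from scipy.stats import norm

# Two independent estimates with equal standard errors (illustrative values)
m1, se1 = 0.0, 1.1
m2, se2 = 3.2, 1.1
z = norm.ppf(0.975)                       # about 1.96

ci1_upper = m1 + z * se1                  # about 2.16
ci2_lower = m2 - z * se2                  # about 1.04 -> the 95% CIs overlap

# The s.e. of the difference is much smaller than se1 + se2
se_diff = sqrt(se1**2 + se2**2)           # about 1.56, well below 2.2
p = 2 * norm.sf((m2 - m1) / se_diff)
print(ci2_lower < ci1_upper, round(p, 3))  # True 0.04 -> overlap, yet significant
```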