Biostatistics in Practice Peter D. Christenson Biostatistician http://gcrc.LABioMed.org/ Biostat Session 5: Associations and Confounding

Post on 12-Jan-2016


Page 1: Biostatistics in Practice

Biostatistics in Practice

Peter D. Christenson, Biostatistician

http://gcrc.LABioMed.org/Biostat

Session 5: Associations and Confounding

Page 2: Biostatistics in Practice

Session 5 Preparation #1

1. We often hear news reports of "seasonally adjusted unemployment rates". Can you think of a logical way that this adjustment could be made?

Page 3: Biostatistics in Practice

Session 5 Preparation #2

Unadjusted

Adjusted

What does “adjusted” mean?

How is it done?

From Table 3

Page 4: Biostatistics in Practice

Goal One of Session 5

Earlier: Compare means for a single measure among groups.

Use t-test, ANOVA.

Session 5: Relate two or more measures.

Use correlation or regression.

Qu et al. (2005), JCEM 90:1563-1569.

[Figure: scatter plot with fitted line; slope annotated as ΔY/ΔX.]

Page 5: Biostatistics in Practice

Goal Two of Session 5

Try to isolate the effects of different characteristics on an outcome.

Previous slide:

Gender

BMI

GH Peak

Page 6: Biostatistics in Practice

Correlation

Visualize Y (vertical) by X (horizontal) scatter plot.

Pearson correlation, r, is used to measure association between two measures X and Y.

Ranges from -1 (perfect inverse association) to 1 (perfect direct association).

Value of r does not depend on:

scales (units) of X and Y

which role X and Y assume, as in an X-Y plot

Value of r does depend on:

the ranges of X and Y

values chosen for X, if X is fixed & Y is measured

Page 7: Biostatistics in Practice

Graphs and Values of Correlations

Page 8: Biostatistics in Practice

Logic for Value of Correlation

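$$ r = \frac{\sum (X - X_{mean})(Y - Y_{mean})}{\sqrt{\sum (X - X_{mean})^2 \, \sum (Y - Y_{mean})^2}} $$

[Figure: scatter plot divided into quadrants at (Xmean, Ymean); each quadrant is marked + or - according to the sign of (X-Xmean)(Y-Ymean) for points in it.]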

Statistical software gives r.
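As a minimal sketch of what the software computes, the formula on this slide can be coded directly. The data values and the function name `pearson_r` are made up for illustration; they do not come from the slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation computed directly from the slide's formula."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Numerator: sum of cross-products of deviations from the means.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the two sums of squares.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)                         # X and Y swap roles
r_scaled = pearson_r([10 * v for v in x], y)   # X expressed in other units
print(round(r_xy, 4), round(r_yx, 4), round(r_scaled, 4))
```

The three printed values agree, which matches the earlier point that r does not depend on the units of X and Y or on which variable plays which role.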

Page 9: Biostatistics in Practice

Correlation Depends on Ranges of X & Y

Graph B contains only the graph A points in the ellipse.

Correlation is reduced in graph B.

Thus: correlations for the same quantities X and Y may be quite different in different study populations.
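The effect of a restricted range can be sketched with simulated data. This is an illustration under stated assumptions, not the data behind graphs A and B: the points are random draws, and the restriction keeps a narrow band of X values rather than an ellipse.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation from sums of squares and cross-products."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(0)  # fixed seed so the sketch is repeatable

# Simulated (not real) data: Y depends linearly on X, plus noise.
xs = [random.gauss(0, 1) for _ in range(2000)]
ys = [x + random.gauss(0, 0.75) for x in xs]

# "Graph A" uses all points; "graph B" keeps only a narrow band of X.
kept = [(x, y) for x, y in zip(xs, ys) if abs(x) < 0.5]
xs_b = [x for x, _ in kept]
ys_b = [y for _, y in kept]

r_full = pearson_r(xs, ys)
r_restricted = pearson_r(xs_b, ys_b)
print(f"r, full range:       {r_full:.2f}")
print(f"r, restricted range: {r_restricted:.2f}")
```

The underlying X-Y relationship is identical in both subsets; only the range of X differs, and that alone lowers r.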

[Figure: scatter plots A and B.]

Page 10: Biostatistics in Practice

Regression

Again: Y (vertical) by X (horizontal) scatterplot, as with correlation. See next slide.

X and Y now assume distinct roles:

Y is an outcome, response, output, dependent variable.

X is an input, predictor, independent variable.

Regression analysis is used to:

Measure X-Y association, as with correlation.

Fit a straight line through the scatter plot, for:

Prediction of Y from X.

Estimation of Δ in Y for a unit change in X (slope = “effect” of X on Y).

Page 11: Biostatistics in Practice

Regression Example

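[Figure: scatter plot with fitted regression line. e_i is the vertical distance from point i to the line; the fitted line minimizes Σe_i². Bands around the line show the range for individuals and the range for the mean.]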

Statistical software gives all this info.
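A minimal sketch of the least-squares fit itself, using the standard closed-form solution for one predictor. The data and the helper names `fit_line` and `sum_sq_residuals` are hypothetical.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope

def sum_sq_residuals(xs, ys, intercept, slope):
    """Sum of squared vertical distances e_i from the points to a line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(x, y)
best = sum_sq_residuals(x, y, b0, b1)
# Nudging the slope or the intercept away from the fit increases the sum.
worse_slope = sum_sq_residuals(x, y, b0, b1 + 0.1)
worse_intercept = sum_sq_residuals(x, y, b0 + 0.1, b1)
print(b0, b1, best, worse_slope, worse_intercept)
```

Comparing `best` with the two perturbed lines shows in a small way what "minimizes Σe_i²" means: any other line gives a larger total.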

Page 12: Biostatistics in Practice

X-Y Association

If slope=0 then X and Y are not associated.

But the slope measured from a sample will almost never be exactly 0. How different from 0 does a measured slope need to be in order to claim X and Y are associated?

[ Side note: It turns out that slope=0 is equivalent to correlation r = 0. ]
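The side note follows from writing the least-squares slope in terms of r. The short derivation below uses the standard slope formula and the same Xmean and Ymean notation as the correlation formula; the symbol b for the fitted slope is introduced here for convenience.

```latex
b = \frac{\sum (X - X_{mean})(Y - Y_{mean})}{\sum (X - X_{mean})^2}
  = r \, \sqrt{\frac{\sum (Y - Y_{mean})^2}{\sum (X - X_{mean})^2}}
```

The square-root factor is positive whenever X and Y both vary, so b = 0 exactly when r = 0, and the slope and r always share the same sign.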

Page 13: Biostatistics in Practice

X-Y Association

Test slope=0 vs. slope≠0, with the rule:

Claim association (slope≠0) if

tc = |slope/SE(slope)| > t ≈ 2.

With this rule, there is a 5% chance of claiming an X-Y association when none really exists.

Note similarity to t-test for means:

tc = |mean/SE(mean)|

Formula for SE(slope) is in statistics books.
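For reference, a sketch of the usual textbook formula: SE(slope) is the residual standard deviation divided by the square root of Σ(X-Xmean)². The data and the function name `slope_t_statistic` are hypothetical.

```python
import math

def slope_t_statistic(xs, ys):
    """Return (slope, SE(slope), |slope / SE(slope)|) for a fitted line."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Residual standard deviation, with n - 2 degrees of freedom.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    se_slope = s / math.sqrt(sxx)
    return slope, se_slope, abs(slope / se_slope)

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, se_slope, tc = slope_t_statistic(x, y)
print(f"slope={slope:.3f}  SE={se_slope:.4f}  tc={tc:.1f}")
```

The cutoff of about 2 is a rule of thumb for reasonably large samples. With only five points, as in this toy example, the exact 5% cutoff is larger (about 3.2 with 3 degrees of freedom), although tc here exceeds either value.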

Page 14: Biostatistics in Practice

Example Software Output

The regression equation is: Y = 81.6 + 2.16 X

Predictor   Coeff     StdErr    T       P
Constant    81.64     11.47     7.12    <0.0001
X           2.1557    0.1122    19.21   <0.0001

S = 21.72   R-Sq = 79.0%

Predicted Values:

X: 100
Fit: 297.21
SE(Fit): 2.17
95% CI: 292.89 - 301.52
95% PI: 253.89 - 340.52

Notes on the output:

Constant: refers to the intercept.

T for X: 19.21 = 2.16/0.112 should be between ~ -2 and 2 if “true” slope=0.

Fit: predicted y = 81.6 + 2.16(100).

95% CI: range of Ys with 95% assurance for the mean of all subjects with x=100.

95% PI: range of Ys with 95% assurance for an individual with x=100.
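The arithmetic behind the example output can be checked by hand. The sketch below uses only the numbers printed in the output above; the variable names are made up for illustration.

```python
# Values copied from the example software output.
intercept, se_intercept = 81.64, 11.47
slope, se_slope = 2.1557, 0.1122
x_new = 100
se_fit = 2.17
ci_low, ci_high = 292.89, 301.52
pi_low, pi_high = 253.89, 340.52

# Predicted Y at X = 100, using the whole equation.
fit = intercept + slope * x_new

# T column: each coefficient divided by its standard error.
t_intercept = intercept / se_intercept
t_slope = slope / se_slope

# Multiplier implied by the CI: half-width divided by SE(Fit).
ci_multiplier = (ci_high - ci_low) / 2 / se_fit

print(f"fit = {fit:.2f}")
print(f"t (constant) = {t_intercept:.2f}, t (X) = {t_slope:.2f}")
print(f"implied CI multiplier = {ci_multiplier:.2f}")
print(f"CI width = {ci_high - ci_low:.2f}, PI width = {pi_high - pi_low:.2f}")
```

The implied multiplier is close to 2, the same rule-of-thumb value used in the slope test, and the interval for an individual is much wider than the interval for the mean.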

Page 15: Biostatistics in Practice

Goal Two of Session 5

Try to isolate the effects of different characteristics on an outcome.

Ethnicity

Age

Outcome

Page 16: Biostatistics in Practice

Another Study

J Clin Endocrin Metab 2006 Nov; 91(11):4424-32.

Potential doping test for athletes.

Page 17: Biostatistics in Practice

Study Goals: Outcomes are IGF-1 and Collagen Markers

Determine the relative and combined explanatory power of age, gender, BMI, ethnicity, and sport type on the markers.

Figure 2. * Adjusted for age, gender, and BMI.

One conclusion is a lack of differences between ethnic IGF-1 means, after adjustment for age, gender, and BMI (Fig 2).

How are these adjustments made?

Page 18: Biostatistics in Practice

Adjustment: For a Single Continuous Characteristic

We simulate data for Caucasians and Africans only for simplicity, to demonstrate attenuation of a 155-140=15 μg/L ethnic difference to a 160-157=3 μg/L ethnic difference.

[Figure: simulated IGF-1 means by ethnicity, unadjusted and adjusted; plotted labels read 158, 155, 160, 140.]

Page 19: Biostatistics in Practice

Adjustment: For a Single Continuous Characteristic

Problem:

Want to compare groups on IGF-1.

Groups to be compared (ethnicities) have different mean ages, and IGF-1 tends to decrease with age.

Solution:

Make groups appear to have the same mean age.

Page 20: Biostatistics in Practice

Adjustment: For a Single Continuous Characteristic

Solution: Make groups appear to have the same mean age.

To do this,

Find regression line predicting IGF-1 from age.

Move each subject parallel to the regression line to the mean age. This is the expected IGF-1 if this subject had been at the mean age.

Adjusted means are means of these adjusted individual values.
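The steps above can be sketched on a small made-up data set. Everything here is an assumption for illustration: the numbers are not the simulated data on the slides, the group names are placeholders, and the regression slope is taken as the common within-group slope (as in analysis of covariance).

```python
def mean(values):
    return sum(values) / len(values)

# Made-up data: two groups with different mean ages; IGF-1 falls with age.
groups = {
    "younger_group": {"age": [18, 20, 22, 24], "igf1": [200, 192, 186, 182]},
    "older_group": {"age": [24, 26, 28, 30], "igf1": [179, 171, 165, 161]},
}

all_ages = [a for g in groups.values() for a in g["age"]]
overall_mean_age = mean(all_ages)

# Step 1: regression slope of IGF-1 on age, pooled over within-group
# deviations (an assumption; one common slope for both groups).
sxy = 0.0
sxx = 0.0
for g in groups.values():
    age_mean, igf_mean = mean(g["age"]), mean(g["igf1"])
    sxy += sum((a - age_mean) * (y - igf_mean)
               for a, y in zip(g["age"], g["igf1"]))
    sxx += sum((a - age_mean) ** 2 for a in g["age"])
slope = sxy / sxx

# Step 2: move each subject parallel to the line to the overall mean age.
# Step 3: adjusted means are the means of the moved values.
unadjusted = {name: mean(g["igf1"]) for name, g in groups.items()}
adjusted = {}
for name, g in groups.items():
    moved = [y - slope * (a - overall_mean_age)
             for a, y in zip(g["age"], g["igf1"])]
    adjusted[name] = mean(moved)

diff_unadjusted = unadjusted["younger_group"] - unadjusted["older_group"]
diff_adjusted = adjusted["younger_group"] - adjusted["older_group"]
print(f"slope = {slope:.2f}, mean age = {overall_mean_age:.1f}")
print(f"unadjusted difference = {diff_unadjusted:.1f}")
print(f"adjusted difference   = {diff_adjusted:.1f}")
```

In this toy data most of the raw group difference comes from the age gap between the groups, so the adjusted difference is much smaller than the unadjusted one.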

Page 21: Biostatistics in Practice

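[Figure: IGF1 Adjustment for Age - Simulated Data. IGF1 (ug/L), axis 100 to 250, plotted against Age (Years), axis 15 to 30, for Caucasian and African subjects, with the mean age (22.2) marked. Unadjusted means: 155 and 140 (Diff = 15). Adjusted means: 160 and 157 (Diff = 3).]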

Page 22: Biostatistics in Practice

Adjustment: For a Single Continuous Characteristic

We have just described a special case of multiple regression, in which an outcome is estimated by multiple predictors.

Simple Regression:

Estimated IGF-1 = intercept + slope(age)

Multiple Regression:

Estimated IGF-1 = intercept + slope(age) + diff(indicator)

Indicator = 0 if African, 1 if Caucasian.

Page 23: Biostatistics in Practice

Adjustment: For a Single Continuous Characteristic

Software:

Select Regression or Analysis of Covariance.

Usually menu such as

Output: Values of b0, b1, b2 for

IGF1 = b0 + b1(age) + b2(indicator)
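A minimal sketch of how software could obtain b0, b1, b2: solve the least-squares normal equations for the model with age and a 0/1 indicator. The data are made up, the function names are hypothetical, and real packages use more numerically careful methods than this hand-written solver.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x

def fit_multiple_regression(rows, ys):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Made-up data: indicator is 1 for one group and 0 for the other.
ages = [18, 20, 22, 24, 24, 26, 28, 30]
indicator = [1, 1, 1, 1, 0, 0, 0, 0]
igf1 = [200, 192, 186, 182, 179, 171, 165, 161]

# Each row is (1, age, indicator); the leading 1 carries the intercept b0.
rows = [[1.0, float(a), float(d)] for a, d in zip(ages, indicator)]
b0, b1, b2 = fit_multiple_regression(rows, igf1)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, b2 = {b2:.2f}")
```

Here b1 is the common age slope and b2 is the group difference after adjusting for age, that is, the difference between the two groups' parallel lines.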

Page 24: Biostatistics in Practice

Multiple Regression: Geometric View

Multiple predictors may be continuous.

Geometrically, this is fitting a slanted plane to a cloud of points:

LHCY is the Y (homocysteine) to be predicted from the two X’s: LCLC (folate) and LB12 (B12).

LHCY = b0 + b1 LCLC + b2 LB12 is the equation of the plane.

www.StatisticalPractice.com

Page 25: Biostatistics in Practice

How Are Coefficients Interpreted?

LHCY = b0 + b1LCLC + b2LB12

Outcome: LHCY

Predictors: LCLC, LB12

[Diagram: arrows from LCLC (b1 ?) and LB12 (b2 ?) to LHCY, with a correlation linking LCLC and LB12.]

LB12 may have both an independent and an indirect (via LCLC) association with LHCY.

Page 26: Biostatistics in Practice

Coefficients: Meaning of their Values

LHCY = b0 + b1LCLC + b2LB12

Outcome: LHCY. Predictors: LCLC, LB12.

LHCY increases by b2 for a 1-unit increase in LB12

… if other factors (LCLC) remain constant, or

… adjusting for other factors in the model (LCLC)

May be physiologically impossible to maintain one predictor constant while changing the other by 1 unit.

Page 27: Biostatistics in Practice

Another Example: HDL Cholesterol

Output:

            Coefficient   Std Error   t       Pr > |t|
Intercept    1.16448      0.28804     4.04    <.0001
AGE         -0.00092      0.00125    -0.74    0.4602
BMI         -0.01205      0.00295    -4.08    <.0001
BLC          0.05055      0.02215     2.28    0.0239
PRSSY       -0.00041      0.00044    -0.95    0.3436
DIAST        0.00255      0.00103     2.47    0.0147
GLUM        -0.00046      0.00018    -2.50    0.0135
SKINF        0.00147      0.00183     0.81    0.4221
LCHOL        0.31109      0.10936     2.84    0.0051

The predictors of log(HDL) are age, body mass index, blood vitamin C, systolic and diastolic blood pressures, skinfold thickness, and the log of total cholesterol. The equation is:

Log(HDL) = 1.16 - 0.00092(Age) + … + 0.311(LCHOL)

www.StatisticalPractice.com

Page 28: Biostatistics in Practice

HDL Example: Coefficients

Interpretation of coefficients on previous slide:

1. Need to use entire equation for making predictions.

2. Each coefficient measures the difference in expected LHDL between 2 subjects if the factor differs by 1 unit between the two subjects, and if all other factors are the same. E.g., expected LHDL is 0.012 lower in a subject whose BMI is 1 unit greater than another subject's, but who is the same as that subject on all other factors.

Continued …

Page 29: Biostatistics in Practice

HDL Example: Coefficients

Interpretation of coefficients two slides back:

3. P-values measure the association of a factor with Log(HDL), if other factors do not change.

This is sometimes expressed as “after accounting for other factors” or “adjusting for other factors”, and called its independent association.

SKINF probably is associated with LogHDL on its own. Its p=0.42 says that it adds no additional information for predicting LogHDL, after accounting for other factors such as BMI.
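Interpretation points 1 and 2 can be illustrated with the coefficients from the HDL output. The two subjects below are hypothetical: their predictor values are placeholders chosen only so the arithmetic runs, since the slides do not give units or typical values.

```python
# Coefficients copied from the HDL regression output.
coefficients = {
    "Intercept": 1.16448,
    "AGE": -0.00092,
    "BMI": -0.01205,
    "BLC": 0.05055,
    "PRSSY": -0.00041,
    "DIAST": 0.00255,
    "GLUM": -0.00046,
    "SKINF": 0.00147,
    "LCHOL": 0.31109,
}

def predict_log_hdl(subject):
    """Predicted Log(HDL): the entire equation is needed (point 1)."""
    total = coefficients["Intercept"]
    for name, value in subject.items():
        total += coefficients[name] * value
    return total

# Hypothetical subjects: identical except that B's BMI is 1 unit higher.
subject_a = {"AGE": 50, "BMI": 25, "BLC": 1.0, "PRSSY": 120,
             "DIAST": 80, "GLUM": 90, "SKINF": 20, "LCHOL": 2.3}
subject_b = dict(subject_a, BMI=26)

difference = predict_log_hdl(subject_b) - predict_log_hdl(subject_a)
print(f"difference in predicted Log(HDL) = {difference:.5f}")
```

Whatever placeholder values are used for the other predictors, the difference between the two predictions equals the BMI coefficient, which is what "if all other factors are the same" means in point 2.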