How can we mitigate against non-causal associations in design and analysis?

Epidemiology matters: a new introduction to methodological foundations

Chapter 10


Seven steps

1. Define the population of interest

2. Conceptualize and create measures of exposures and health indicators

3. Take a sample of the population

4. Estimate measures of association between exposures and health indicators of interest

5. Rigorously evaluate whether the association observed suggests a causal association

6. Assess the evidence for causes working together

7. Assess the extent to which the result matters, is externally valid, to other populations


1. Randomization

2. Matching

3. Stratification

4. Sources of non-comparability

5. Summary




Comparability

Exposed and unexposed should be comparable on all factors associated with the disease other than the exposure

One way to ensure this comparability is to randomize the exposure


Comparability, an example

What is wrong with non-comparability? Consider an example:

Study: 5,000 smokers and 5,000 non-smokers are followed for 10 years

After 10 years, the smokers have 3.0 times the risk of motor vehicle crash fatality compared with non-smokers

Are you comfortable reporting that smoking causes motor vehicle crash fatality?

Individuals who choose to smoke are more likely to engage in other behaviors with adverse consequences for health


Randomization

Creates comparability between groups

Removes the individual's ability to choose exposure status


Randomized Control Trial, RCT

Sample from population (purposive)

Assign individuals to be exposed or unexposed

Follow population forward to determine who develops outcome


The goal of RCT

We want our comparison groups to be “different” on just the main exposure that we are studying in relation to some outcome

AND the “same” on all the other important covariates


Why does randomization control for non-comparability? Example

Two investigators conduct two separate studies

Exploring effects of regular cardiovascular exercise on incidence of cardiovascular disease

Population is post-menopausal women

Hypothesis: exercise is protective against cardiovascular disease


Example, study 1

Purposive sample of 80 post-menopausal women with no history of cardiovascular disease

Asks women whether they engage in ≥ 30 minutes of regular cardiovascular exercise ≥ 3 times/week (regular exercise compared to non-regular exercise)

Follows groups for five years

Counts women in each group who have a cardiovascular event

Assume no losses to follow-up


Study 1

[Figure: study participants by exposure (exposed, non-exposed) and disease status (diseased, non-diseased)]


Study 1, interpretation

Those who exercise have approximately 0.5 times the risk of cardiovascular disease compared with those who do not exercise.

There are approximately 20 fewer cases of cardiovascular disease per 100 people who exercise compared with those who do not exercise.
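The two measures above can be reproduced with a short sketch. The slides do not give the cell counts, so the counts below (8 events among 40 exercisers, 16 among 40 non-exercisers) are hypothetical values chosen only because they match the reported risk ratio of about 0.5 and risk difference of about -0.2; the function name is also an illustration.

```python
def risk_ratio_and_difference(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Return (risk ratio, risk difference) for a cohort 2x2 table."""
    risk_exposed = cases_exposed / n_exposed
    risk_unexposed = cases_unexposed / n_unexposed
    return risk_exposed / risk_unexposed, risk_exposed - risk_unexposed


# Hypothetical counts (not given in the slides): 8 events among 40 exercisers,
# 16 events among 40 non-exercisers.
study1_rr, study1_rd = risk_ratio_and_difference(8, 40, 16, 40)
print(f"risk ratio = {study1_rr:.2f}, risk difference = {study1_rd:.2f}")
```

A risk difference of -0.2 corresponds to 20 fewer cases per 100 people.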


Study 1, validity

Women who choose to exercise regularly may be more likely to be non-smokers, eat a healthier diet, take multivitamins, etc.

We do not know whether the exercise had any causal effect on their cardiovascular health

In fact, the women who exercise had much lower average daily saturated fat intake than the non-exercisers


Impact of saturated fat intake

[Figure: the 80 women shown as exercisers and non-exercisers, with dotted icons marking high saturated fat intake]

9 dotted people (high fat consumers) among the 40 exercisers

Prevalence of high fat consumption among the exercisers = 9/40 = 22.5%

18 dotted people (high fat consumers) among the 40 non-exercisers

Prevalence of high fat consumption among the non-exercisers = 18/40 = 45%

There is a greater proportion of high fat consumers among the non-exercisers


Example, study 2

Purposive sample of 80 post-menopausal women with no history of cardiovascular disease

Randomly assigns women to engage in ≥ 30 minutes of regular cardiovascular exercise ≥ 3 times/week (regular exercise compared to non-regular exercise)

Follows groups for five years

Counts women in each group who have a cardiovascular event

Assume no losses to follow-up or noncompliance


Study 2


Study 2, interpretation

Risk of cardiovascular disease among those randomized to exercise is 14.3% less than the risk among those randomized to not exercise.

We expect 10 fewer cases per 100 individuals exposed compared with the unexposed.


Study 1 vs Study 2

Study 1 risk ratio = 0.5 and risk difference = -0.2

Study 2 risk ratio = 0.86 and risk difference = -0.1

Therefore, the effect is weaker in Study 2 than the effect in Study 1.

Why?


Study 2, impact of saturated fat intake

[Figure: the 80 women shown as exercisers and non-exercisers, with dotted icons marking high saturated fat intake]

12 dotted people (high fat consumers) among the 40 exercisers

Prevalence of high fat consumption among the exercisers = 12/40 = 30%

12 dotted people (high fat consumers) among the 40 non-exercisers

Prevalence of high fat consumption among the non-exercisers = 12/40 = 30%

There is the same proportion of high fat consumers in both groups


Limitations to randomization

1. Equipoise and ethics

2. Compliance and intention to treat analysis

3. Placebos and placebo effects

4. Importance of blinding


Randomization, summary

When randomization works, all factors that would differ between two groups who got to choose their exposure status are, on average, evenly distributed between the groups

This includes all known risk factors for the outcome and a myriad of factors that are unknown or difficult to measure

Because they are evenly distributed across the groups, these factors cannot bias the study estimates

Randomized trials are a powerful way to achieve comparability between exposed and unexposed groups on both known and unknown factors that cause the outcome
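The phrase "on average" can be illustrated with a small simulation. This is a sketch under stated assumptions: it takes the 80 women of Study 2, 24 of whom are high fat consumers, and repeatedly splits them at random into 40 exercisers and 40 non-exercisers. The function name, number of repetitions, and seed are arbitrary choices for illustration.

```python
import random


def simulate_prevalence_gap(n_trials=2000, n_women=80, n_high_fat=24, seed=0):
    """Randomly assign women to two equal groups many times.

    Returns the mean and the largest absolute difference in the prevalence
    of high fat consumption (exercisers minus non-exercisers).
    """
    rng = random.Random(seed)
    women = [1] * n_high_fat + [0] * (n_women - n_high_fat)  # 1 = high fat consumer
    half = n_women // 2
    gaps = []
    for _ in range(n_trials):
        rng.shuffle(women)  # random assignment: first half exercise
        exercisers, non_exercisers = women[:half], women[half:]
        gaps.append(sum(exercisers) / half - sum(non_exercisers) / (n_women - half))
    return sum(gaps) / n_trials, max(abs(g) for g in gaps)


mean_gap, largest_gap = simulate_prevalence_gap()
print(f"mean gap = {mean_gap:.3f}, largest single-trial gap = {largest_gap:.3f}")
```

Across repetitions the mean gap is close to zero, but any single small trial can still be imbalanced by chance, which is why balance holds on average rather than in every trial.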




Matching

1. Why and how to match

2. Analyzing matched pair data


Matching

Randomization is often unethical or infeasible

Matching controls non-comparability where randomization is impossible


Matching

Participants matched on potential sources of non-comparability

Matching is a common way to control for non-comparability in design stage

In a cohort study, exposed individuals are matched to ≥ 1 unexposed individuals on ≥ 1 factor(s) of interest

In a case-control study, diseased individuals are matched to a sample of disease-free individuals


Matching, example

Research question: Is low regular consumption of fish oil associated with development of depression?

Sample

25 individuals with a first diagnosis of depression recruited from local mental health treatment center

25 individuals with no history of depression from community surrounding mental health treatment center


Matching, example

Concerned about sex as a potential source of non-comparability

Women are more likely to develop depression compared with men

Women on average have more nutritious diets and are more likely to supplement diets with fish oil

Other potential sources of non-comparability to worry about (though we are not necessarily matching on them) are age, alcohol and cigarette use, and socio-economic factors


Matching, example

Each time we select a case from the treatment center, we select one or more controls of the same sex


Matching to control non-comparability

[Figure: the 50 sampled individuals shown by sex (male, female) and fish oil consumption (low, high)]

                 Male   Female   Total
Low fish oil        9       18      27
High fish oil       7       16      23
Total              16       34      50


Matching pairs, sex

[Figure: cases and controls paired by sex, shown by fish oil consumption (low, high)]


Each pair is identical with respect to the matched factors

Sample had 50 individuals

Sample now has 25 matched pairs


Matching pairs, sex


Analyzing matched pair data


Analyzing matched pair, example

Interpretation: Individuals with low fish oil consumption have 2.0 times the odds of depression compared with individuals with high fish oil consumption, controlling for sex.
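One way to arrive at an estimate like this is the standard matched-pair odds ratio, which uses only the pairs in which case and control differ on exposure. The slides' own calculation was not recoverable, so this is a sketch under assumptions: the pair counts below are hypothetical, chosen to be consistent with the 25 pairs, the 27 low and 23 high fish oil consumers in the table, and the reported value of 2.0.

```python
def matched_pair_odds_ratio(case_only_exposed, control_only_exposed):
    """Odds ratio for 1:1 matched case-control data.

    Only discordant pairs contribute: pairs where the case is exposed and
    the control is not, divided by pairs where the control is exposed and
    the case is not.
    """
    return case_only_exposed / control_only_exposed


# Hypothetical pair counts (exposure = low fish oil consumption).
pairs = {
    "both_low": 9,
    "case_low_control_high": 6,
    "case_high_control_low": 3,
    "both_high": 7,
}
n_pairs = sum(pairs.values())

matched_or = matched_pair_odds_ratio(
    pairs["case_low_control_high"], pairs["case_high_control_low"]
)
print(f"{n_pairs} pairs, matched odds ratio = {matched_or:.1f}")
```

Concordant pairs (both low or both high) carry no information about the association, which is why they drop out of the estimate.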




Control of non-comparability

Design stage: randomization, matching

Analysis stage: stratification


Stratification

1. Why and how to stratify

2. Interpreting stratified analyses


Control of non-comparability in the analysis stage

Collect data on variables that might contribute to non-comparability

Our ability to control for non-comparability in analysis stage is only as good as the quality of measures of variables contributing to non-comparability


Control of non-comparability in the analysis stage

Is a potential factor related to non-comparability associated with the exposure and the outcome?


Stratification

Stratification removes the effects of a non-comparable variable on an exposure-outcome relation by limiting the variance on that variable


Stratification, example

Examine relation between alcohol consumption and esophageal cancer among two groups

Non-smokers

Among individuals who have never smoked a cigarette in their lives, what is the relation between heavy alcohol consumption and esophageal cancer?

Smoking cannot confound the effect estimate because no individual in this subgroup has engaged in any smoking

Smokers

Among smokers (presumably around the same duration and average amount), were those who are heavy alcohol consumers more likely to develop esophageal cancer?

Smoking cannot confound the estimate because everyone is a smoker


Stratification example, non-smokers

Conditional probability of esophageal cancer among heavy alcohol consumers = 1/6, or 16.7%

Conditional probability of esophageal cancer among not heavy alcohol consumers = 1/16, or 6.3%

Risk ratio = 16.7 / 6.3 = 2.65

Risk difference = 16.7% – 6.3% = 10.4 percentage points

Interpretation: There is an increased risk of esophageal cancer among heavy alcohol consumers, even in the subpopulation of individuals who do not smoke.


Stratification example, smokers

Conditional probability of esophageal cancer among heavy alcohol consumers = 21/31, or 67.7%

Conditional probability of esophageal cancer among not heavy alcohol consumers = 7/27, or 25.9%

Risk ratio = 67.7 / 25.9 = 2.61

Risk difference = 67.7% – 25.9% = 41.8 percentage points

Interpretation: There is an increased risk of esophageal cancer among heavy alcohol consumers, even in the subpopulation of individuals who all smoke.


Stratification, example

There is an increased risk of esophageal cancer among heavy alcohol consumers, even in the subpopulation of individuals who do not smoke

There is an increased risk of esophageal cancer among heavy alcohol consumers, even in the subpopulation of individuals who all smoke

Therefore, even when we limit variance on the possible source of non-comparability (i.e., smoking) there still remains an increased risk of esophageal cancer among heavy alcohol drinkers
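The stratum-specific estimates can be recomputed directly from the fractions given in the two strata. The sketch below also pools the strata into a crude estimate, which assumes the smokers and non-smokers shown make up the whole sample; the slides do not report a crude value, so that figure is an illustration only. Variable and function names are hypothetical.

```python
# (cases, total) for each alcohol group within each smoking stratum,
# taken from the fractions reported in the example.
strata = {
    "non-smokers": {"heavy": (1, 6), "not_heavy": (1, 16)},
    "smokers": {"heavy": (21, 31), "not_heavy": (7, 27)},
}


def risk_ratio(exposed, unexposed):
    """Risk ratio from two (cases, total) pairs."""
    return (exposed[0] / exposed[1]) / (unexposed[0] / unexposed[1])


stratum_rr = {
    name: risk_ratio(groups["heavy"], groups["not_heavy"])
    for name, groups in strata.items()
}

# Crude estimate: ignore smoking and pool the counts (assumes the two
# strata together are the full sample).
pooled_heavy = tuple(sum(s["heavy"][i] for s in strata.values()) for i in (0, 1))
pooled_not_heavy = tuple(sum(s["not_heavy"][i] for s in strata.values()) for i in (0, 1))
crude_rr = risk_ratio(pooled_heavy, pooled_not_heavy)

for name, rr in stratum_rr.items():
    print(f"{name}: risk ratio = {rr:.2f}")
print(f"crude (pooled): risk ratio = {crude_rr:.2f}")
```

Working from the exact fractions gives 2.67 for non-smokers rather than the 2.65 obtained from the rounded percentages. Under the pooling assumption the crude risk ratio is larger than either stratum-specific value, which is the pattern expected when smoking is associated with both heavy drinking and esophageal cancer.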


Non-comparability through stratification

1. Careful and rigorous measurement of potential non-comparable variables is key to control for non-comparability in data analysis

2. Before stratification, always check that potential non-comparable variables are associated with exposure and outcome under study

3. If a variable is not associated with both exposure and outcome, then stratifying or otherwise controlling for that variable will not change the estimate of the effect of exposure on outcome
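Point 3 can be checked with a small numeric sketch. All counts here are hypothetical: the covariate is related to the outcome (risks are higher in stratum B) but not to the exposure (each stratum has the same number of exposed and unexposed people), and the risk ratio is 2.0 in both strata.

```python
# Hypothetical (cases, total) counts. The covariate raises risk in stratum B
# but is unrelated to exposure: 100 exposed and 100 unexposed in each stratum.
strata = {
    "A": {"exposed": (20, 100), "unexposed": (10, 100)},
    "B": {"exposed": (40, 100), "unexposed": (20, 100)},
}


def risk_ratio(exposed, unexposed):
    """Risk ratio from two (cases, total) pairs."""
    return (exposed[0] / exposed[1]) / (unexposed[0] / unexposed[1])


stratified = {name: risk_ratio(s["exposed"], s["unexposed"]) for name, s in strata.items()}

# Crude estimate from the pooled counts.
pooled_exposed = (20 + 40, 100 + 100)
pooled_unexposed = (10 + 20, 100 + 100)
crude = risk_ratio(pooled_exposed, pooled_unexposed)

print(stratified, crude)
```

Because the covariate is balanced across exposure groups, the crude risk ratio equals the stratum-specific risk ratios, so stratifying on it leaves the estimate unchanged.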


Non-comparability, another example

Example: cigarette smoking and depression

Rate of depression is higher among cigarette smokers than among non-smokers

Hypothesized that smoking can affect neurotransmitters in the brain that influence negative mood and emotion

How could sex be a potential source of non-comparability in this association?

Men are more likely than women to be smokers

Men are less likely to experience depression compared with women


Smoking and depression, example

Population of interest is adults in general population

Purposive sample of 80 individuals with no history of depression

Assess smoking status at baseline

Follow over 5 years to see how many develop depression

Assume no individuals were lost to follow-up



[Figure: the 80 sampled individuals shown by sex (male, female) and smoking status (smoker, non-smoker)]


Smoking and depression, example interpretation

Over five years, smokers had 1.04 times the risk of developing depression compared with nonsmokers, and 1.05 times the odds. There are 10 excess cases of depression among the smoking group per 1,000 persons over the course of 5 years (risk difference).

But what about sex?


Smoking and depression, sex association

Smoking and sex

73% of men are smokers

38.3% of women are smokers

Depression and sex

15% of men are depressed

53.2% of women are depressed

Men are less likely to have depression than women


Smoking and depression, stratified analysis, men

Among men, those who smoke have 1.5 times the risk of depression compared to those who do not smoke, over 5 years.

Smoking and depression, stratified analysis, women

Among women, those who smoke have 1.49 times the risk of depression compared to those who do not smoke, over 5 years.

Smoking and depression, stratified analysis, interpretation

Smoking was not associated with depression in original, crude analysis

Stratifying by sex, smoking is associated with the development of depression

Sex obscured the association between smoking and depression
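The contrast between the crude and sex-stratified results can be sketched numerically. The slides' 2x2 tables were not recoverable, so the counts below are hypothetical: they were chosen to be consistent with the reported percentages (about 73% of men and 38.3% of women smoke; about 15% of men and 53.2% of women develop depression) and with the stratum-specific risk ratios of 1.5 and 1.49.

```python
# Hypothetical (depressed, total) counts for 33 men and 47 women.
counts = {
    "men": {"smokers": (4, 24), "non_smokers": (1, 9)},
    "women": {"smokers": (12, 18), "non_smokers": (13, 29)},
}


def risk_ratio(exposed, unexposed):
    """Risk ratio from two (cases, total) pairs."""
    return (exposed[0] / exposed[1]) / (unexposed[0] / unexposed[1])


rr_by_sex = {
    sex: risk_ratio(groups["smokers"], groups["non_smokers"])
    for sex, groups in counts.items()
}

# Crude analysis: pool men and women, ignoring sex.
all_smokers = tuple(sum(g["smokers"][i] for g in counts.values()) for i in (0, 1))
all_non_smokers = tuple(sum(g["non_smokers"][i] for g in counts.values()) for i in (0, 1))
crude_rr = risk_ratio(all_smokers, all_non_smokers)

print(f"crude risk ratio = {crude_rr:.2f}")
for sex, rr in rr_by_sex.items():
    print(f"{sex}: risk ratio = {rr:.2f}")
```

With these counts the crude risk ratio is close to 1 (about 1.03, near the reported 1.04) while both stratum-specific risk ratios are about 1.5. Men smoke more but are depressed less often, so pooling the sexes masks the association seen within each sex.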




Is every variable that is associated with exposure and outcome a potential source of non-comparability?

No


Sources of non-comparability

1. Factors in the causal pathway are not non-comparable variables

2. Factors that are consequences of exposure and outcome


Factors in causal pathway

Factors that are on the causal pathway of interest between the exposure and outcome do not contribute to non-comparability

If we control for them, we will obstruct the ability to observe the true effects of the exposure on the outcome

Factors on the causal pathway of interest should not be controlled


Factors in causal pathway, example

Interested in prenatal exposure to tobacco smoke and offspring growth restriction during puberty

Hypothesize that prenatal exposure to tobacco causes low birth weight, and then this low birth weight causes growth restriction in puberty

Should not control for birth weight


What if we do control for birth weight through stratification?

Among offspring with low birth weight, we would find that exposure to tobacco smoke is unrelated to offspring growth restriction

We restricted analysis to only those with the intermediary outcome of interest: low birth weight

Among offspring with normal birth weight, we would not find an association between the exposure and outcome

We restricted analysis to only those without the intermediary outcome: low birth weight
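A small numeric sketch shows why stratifying on an intermediary hides the effect. All probabilities below are hypothetical, and the sketch assumes the simplest version of the hypothesis: tobacco exposure affects growth restriction only through low birth weight.

```python
# Hypothetical probabilities (not from the slides).
P_LOW_BIRTH_WEIGHT = {"exposed": 0.4, "unexposed": 0.1}  # exposure -> low birth weight
P_GROWTH_RESTRICTION = {"low": 0.5, "normal": 0.1}       # birth weight -> outcome


def overall_risk(exposure):
    """Risk of growth restriction, averaging over birth weight."""
    p_low = P_LOW_BIRTH_WEIGHT[exposure]
    return (p_low * P_GROWTH_RESTRICTION["low"]
            + (1 - p_low) * P_GROWTH_RESTRICTION["normal"])


# Without controlling for birth weight, exposure is associated with the outcome.
crude_rr = overall_risk("exposed") / overall_risk("unexposed")

# Within a birth weight stratum the risk depends only on birth weight, so
# exposed and unexposed offspring have the same risk.
stratum_rr = {
    stratum: P_GROWTH_RESTRICTION[stratum] / P_GROWTH_RESTRICTION[stratum]
    for stratum in ("low", "normal")
}

print(f"crude risk ratio = {crude_rr:.2f}")
print(f"stratum-specific risk ratios = {stratum_rr}")
```

Under these assumptions the crude risk ratio is about 1.86, while the risk ratio within each birth weight stratum is exactly 1.0, so controlling for the intermediary removes the very effect the study set out to measure.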




Seven steps

1. Define the population of interest

2. Conceptualize and create measures of exposures and health indicators

3. Take a sample of the population

4. Estimate measures of association between exposures and health indicators of interest

5. Rigorously evaluate whether the association observed suggests a causal association

6. Assess the evidence for causes working together

7. Assess the extent to which the result matters, is externally valid, to other populations


epidemiologymatters.org
