case-control studies i phd course spring 2009 university...

02-03-2011


Case-control studies

Rothman (2002), chapter 4

Ph.d. course in "Epidemiology", spring 2011

Lau Caspar Thygesen, ph.d.

Aims

• Know difference between cohort and case-control studies

• Know different case-control study designs

• Describe odds ratio

• Know principles for case and control selection

• Describe advantages/disadvantages of CC-studies

• Know important biases and confounding


• Starting point: Subjects with the disease under study (cases)

• Record cases’ history of exposure

• A comparison group of individuals without the disease under study (controls) is assembled

• Their history of exposure is recorded the same way

• Cohort study: Concerned with frequency of disease among exposed and non-exposed

• Case-control study: concerned with the exposure history of subjects with a specific disease and of people without the disease

But why is that necessary?

• On statistical power…

Cohort studies – underpowered?

E.g.

• Occurrence of cervical cancer in 4,440 women hospitalised with gonorrhoea

• Followed for 54,576 person-years at risk (12.2 years/woman)

– Cervical cancer

– 11 cases observed

– 8.9 cases expected

– Relative risk 1.2

– 95% CI 0.6 to 2.2
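The relative risk here is the ratio of observed to expected cases, 11 / 8.9 ≈ 1.24. The Python sketch below reproduces it. The slide does not say how its interval was computed; as an assumption, the sketch uses Byar's approximation to the exact Poisson limits for the observed count, which happens to give the same 0.6 to 2.2. The function name `sir_with_ci` is illustrative only.

```python
import math


def sir_with_ci(observed: int, expected: float, z: float = 1.96):
    """Observed/expected ratio with an approximate 95% CI.

    Assumption: the CI uses Byar's approximation to the exact Poisson
    limits for the observed count, divided by the expected count.
    """
    if observed < 1:
        raise ValueError("Byar's lower limit needs at least one observed case")
    ratio = observed / expected
    o = observed
    # Lower Poisson limit for the observed count
    lower = o * (1 - 1 / (9 * o) - z / (3 * math.sqrt(o))) ** 3
    # Upper Poisson limit is based on observed + 1
    o1 = observed + 1
    upper = o1 * (1 - 1 / (9 * o1) + z / (3 * math.sqrt(o1))) ** 3
    return ratio, lower / expected, upper / expected


sir, lo, hi = sir_with_ci(11, 8.9)
print(f"Relative risk = {sir:.1f} (95% CI {lo:.1f} to {hi:.1f})")
```

With only 11 observed cases, the interval spans both no effect and a doubling of risk, which is the sense in which this cohort is underpowered.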


Cohort studies – advantages

Can examine multiple effects of a single exposure

Can examine rare exposures

Exposure precedes outcome

Allows direct measurement of incidence (rate, risk) of outcomes

Can elucidate temporal relationships

Allows study subjects to contribute person-time to multiple exposure categories

Biological material can be collected prior to outcome

Minimizes bias in the ascertainment of exposure

Cohort studies – disadvantages

Inefficient for evaluation of rare diseases

If prospective, can be very expensive and time consuming

If retrospective, requires the availability of adequate records for both exposure and outcome

If prospective, cannot provide quick answers

If retrospective, precise classification of exposure and outcome may be difficult

Validity of the results can be seriously affected by losses to follow-up

Cohort studies

• Prospective (!)

• Starting point: a population of healthy individuals

Outcome in a cohort study: relative risk

Cohort studies measure

• Risk of disease among the exposed compared with the risk of disease among the non-exposed

• The absolute risk may be calculated for both groups!


• Case-control studies became popular with the shift from infectious disease epidemiology to chronic diseases

• Why?

– Western life-style diseases (cancer, heart diseases)

– Diseases with long latent period

– Most applicable when disease is rare

– Study many possible risk factors / causes


Case-control studies (2)

• The case-control study aims to achieve the same goals as a cohort study, but more efficiently, by using sampling

• Best understood by considering a source population: the population that gives rise to the cases included in the study

• The same cases are identified as in a cohort study and then classified with respect to exposure

• Instead of obtaining the denominators for the rates/risks, a control group is sampled from the entire source population that gives rise to the cases

• The control group is used to determine the relative size of the exposed and unexposed components of the source population

• Therefore the cardinal requirement of control selection is that the controls be sampled independently of exposure status


Principle

[Figure: source population of 288 persons giving rise to 16 cases]

Analyzed as a cohort study

• 25% of source population exposed

• Followed for one year

• Among the exposed: 8 cases during 72 person-years; incidence rate (IR) = 0.111 cases/p-yr

• Among the unexposed: 8 cases during 216 person-years; IR = 0.037 cases/p-yr

• IRR = 0.111 / 0.037 = 3
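The cohort analysis is plain arithmetic on cases and person-years. A minimal Python sketch with the slide's numbers (the helper name `incidence_rate` is hypothetical):

```python
def incidence_rate(cases: int, person_years: float) -> float:
    """Incidence rate: number of cases per person-year at risk."""
    return cases / person_years


# Numbers from the slide: 288 persons followed for one year, 25% exposed
ir_exposed = incidence_rate(8, 72)      # 8 cases in 72 person-years
ir_unexposed = incidence_rate(8, 216)   # 8 cases in 216 person-years
irr = ir_exposed / ir_unexposed         # incidence rate ratio

print(f"IR exposed   = {ir_exposed:.3f} cases/p-yr")
print(f"IR unexposed = {ir_unexposed:.3f} cases/p-yr")
print(f"IRR          = {irr:.1f}")
```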


Principle

[Figure: the same source population of 288 persons with 16 cases]

Analyzed as a case-control study

• Sample a control group independently of exposure:

– Among the 48 controls, 12 are exposed (25%)

– If sampled independently of exposure, the same proportion of controls will be exposed as the proportion of people/person-time exposed in the source population

• The same cases are included as in the cohort

• These data can be used to estimate the same result as for the cohort design
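With the slide's numbers, a small Python sketch shows why the case-control data give the cohort answer: the 48 controls sample the person-time at the same rate in both exposure groups, so that rate cancels out of the odds ratio. The variable names are illustrative.

```python
# Source population from the slide (one year of follow-up)
person_time = {"exposed": 72, "unexposed": 216}
cases = {"exposed": 8, "unexposed": 8}
controls = {"exposed": 12, "unexposed": 36}  # 48 controls, sampled independently of exposure

# Control sampling rates: controls per person-year in each exposure group
sampling_rate = {g: controls[g] / person_time[g] for g in person_time}

# Odds ratio from the case-control data alone: (a / b) / (c / d)
odds_ratio = (cases["exposed"] / controls["exposed"]) / (
    cases["unexposed"] / controls["unexposed"]
)

# Incidence rate ratio from full follow-up of the cohort, for comparison
irr = (cases["exposed"] / person_time["exposed"]) / (
    cases["unexposed"] / person_time["unexposed"]
)

print(sampling_rate)  # both equal 1/6: sampling did not depend on exposure
print(round(odds_ratio, 2), round(irr, 2))
```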

Another representation of a cohort study

[Figure: N+ exposed and N- unexposed persons followed over time from healthy (at risk) to sick; cases: a exposed, c unexposed]

’Classic’ case-control study

(Also termed a retrospective case-control study or a case-noncase study)

- The sick (cases) are identified and information on exposure is obtained (by interview)

- Controls are identified from the assumed source population and information on exposure is obtained

(Retrospective means that information on exposure is obtained after the onset of disease)

’Cohort presentation’ of a case-noncase study

[Figure: from the ’cohort’ start, N+ exposed and N- unexposed persons are followed over time from healthy (at risk) to sick; cases (a and c) and sampled controls (b and d)]


Exercise 1

• Discuss advantages and disadvantages of the case-control study design

• Give three examples of associations which would be well-suited for a case-control study

Types of case-control studies

• Classic case-control study / retrospective case-control study / case-noncase / (cumulative-incidence) case-control studies

• Density sampling case-control study

• Case-cohort study

Density case-control study design

• Density-based sampling

– The phrase comes from the term incidence density, which is sometimes used as a synonym for incidence rate

• In a cohort study:

– Incidence rate in the exposed/unexposed populations:

I1 = a / person-time1

I0 = c / person-time0

Density-based sampling

• In a case-control study with density-based sampling, the control series provides an estimate of the proportion of the total person-time for the exposed and unexposed cohorts in the source population:

b / person-time1 and d / person-time0 (b and d = number of exposed and unexposed controls)

• These ratios are called control sampling rates for the exposed and unexposed components of the source population

Density sampling (3)

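• These sampling rates will be equal if the control sampling is conducted independently of exposure

• If this is achieved, the incidence rate ratio can be estimated from the c-c data:

(a / b) / (c / d) ≈ (a / person-time1) / (c / person-time0) = I1 / I0

• Because a / b = (a / person-time1) / (b / person-time1), and likewise for c / d, the equal control sampling rates cancel

• (a / b) / (c / d) = ad / bc is called the odds ratio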

Density sampling

[Figure: from the ’cohort’ start, persons move over time from healthy (at risk) to sick (cases); controls are sampled with probability proportional to time at risk]



• Using the OR in a case-control study with density sampling

• Can obtain a valid estimate of the incidence rate ratio in a population

• Without having to obtain individual information on every person in the population!


• Density / risk set sampling:

• Choose controls from the unique set of people in the source population who are at risk of becoming a case at the precise time that each case is diagnosed

• The same person could therefore both be a control and a case in the same study!

Control sampling

• Dataset of cases of upper gastrointestinal bleeding (UGB) (n = 3652)

• Age- and sex-matched control group of 10 subjects per case (n = 36 502)

• Cases fulfilled the following criteria:

– admission with peptic ulcer or gastritis as main diagnosis within one of the County’s hospitals from 1995 to 2006

– significant bleeding defined either by melena, subnormal haemoglobin or the need for transfusions

– potential bleeding source in the stomach or duodenum identified by endoscopy or surgery

• Cases were assigned an index date as their first registered date of a UGB diagnosis

• 10 controls for each case were sampled using the risk set sampling technique

• Controls were randomly selected among those within the county who matched the case with respect to gender and exact birth year

• Cases were eligible as control subjects until their first admission with UGB
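The matched risk-set sampling rule above can be sketched in Python. This is an illustration under stated assumptions, not the study's actual code: the `Person` record, the `risk_set_sample` function and the toy cohort are hypothetical, follow-up is on a continuous time scale, and a case with fewer eligible matches than requested simply receives fewer controls.

```python
import random
from dataclasses import dataclass


@dataclass
class Person:  # hypothetical record layout
    pid: int
    sex: str
    birth_year: int
    entry: float   # start of follow-up (e.g. years since study start)
    exit: float    # end of follow-up: event, death, emigration or study end
    event: bool    # True if exit is the index date of a case


def risk_set_sample(cohort, n_controls, seed=0):
    """Risk-set (incidence density) sampling, matched on sex and exact birth year.

    For each case, controls are drawn at random from other persons under
    follow-up at the case's index date. Future cases are eligible, and the
    same person may be sampled for several cases.
    """
    rng = random.Random(seed)
    matched_sets = {}
    for case in (p for p in cohort if p.event):
        t = case.exit  # index date
        risk_set = [
            p for p in cohort
            if p.pid != case.pid
            and p.entry <= t <= p.exit       # under follow-up at the index date
            and p.sex == case.sex
            and p.birth_year == case.birth_year
        ]
        k = min(n_controls, len(risk_set))
        matched_sets[case.pid] = [p.pid for p in rng.sample(risk_set, k)]
    return matched_sets


# Hypothetical toy cohort
cohort = [
    Person(1, "F", 1940, 0.0, 3.0, True),    # case at t = 3
    Person(2, "F", 1940, 0.0, 8.0, True),    # later case at t = 8
    Person(3, "F", 1940, 0.0, 10.0, False),
    Person(4, "F", 1940, 4.0, 10.0, False),  # enters follow-up after t = 3
    Person(5, "M", 1940, 0.0, 10.0, False),  # other sex
    Person(6, "F", 1941, 0.0, 10.0, False),  # other birth year
]
matched = risk_set_sample(cohort, n_controls=10)
print(matched)
```

For the case at t = 3, person 2 (a future case) is an eligible control, while person 4 (not yet under follow-up), person 5 (other sex) and person 6 (other birth year) are not.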

Sampling from a cohort

• (Matched) incidence density sampled case-control studies are money-saving sampling plans

• The design allows estimation of the same parameters as full follow-up of the entire cohort (with less precision)

• Another sampling scheme allowing this is the case-cohort design

• This design allows analysis of several types of event using the same “controls”


Case-cohort design

• When obtaining information on covariates from a cohort is expensive or difficult

• A sub-sample of the cohort is selected at study entry and information on covariates is obtained

• Sampling is based on a fraction of the total number of people in the study population rather than on person-time

• Controls have the same chance of being selected irrespective of person-time spent

• A control may also become a case!

• Makes studies of multiple outcomes in a cohort cost-efficient

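Cohort presentation of case-cohort study

[Figure: from the ’cohort’ start, N+ exposed and N- unexposed persons are followed over time from healthy (at risk) to sick; controls are sampled from the cohort, and all persons who become sick are cases]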

Example

Paediatric and Perinatal Epidemiology 2007;21:507–517.

Example

• In Denmark, routinely collected neonatal dried blood spot (DBS) samples have been stored at -25°C since 1981 in a central registry (Biological Specimen Bank for Neonatal Screening).

• Information from Danish registries makes it possible to identify DBS samples from individuals who later developed type 1 diabetes (T1D)

• DBS samples from 2086 validated Danish T1D patients from the birth cohorts 1981–2002, and two matching controls per patient

• Case and control samples were matched by place and date of birth

• Results interpretable as ratios of T1D risks

Case-cohort design

• Select a sub-cohort from the original cohort

• Collect covariate information on these persons

• Take all persons with events of the type(s) of interest, and collect covariate information on these

• The analysis of these data allows estimation of risk-ratios
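A minimal sketch of the risk-ratio calculation, assuming a closed cohort and a subcohort drawn at random at baseline. The cohort and subcohort counts below are hypothetical, not taken from any study, and full case-cohort analyses usually use weighted survival models; this shows only the simple count-based idea.

```python
def case_cohort_risk_ratio(cases_exposed, cases_unexposed,
                           subcohort_exposed, subcohort_unexposed):
    """Risk ratio from a case-cohort study (simple count-based version).

    cases_*     : all cases in the full cohort, by exposure
    subcohort_* : members of the random baseline subcohort, by exposure,
                  counted whether or not they later became cases
    The subcohort estimates the exposure split of the whole cohort at the
    start of follow-up, so (a / s1) / (c / s0) estimates (a / N1) / (c / N0).
    """
    return (cases_exposed / subcohort_exposed) / (
        cases_unexposed / subcohort_unexposed
    )


# Hypothetical full cohort: 2,000 exposed and 8,000 unexposed at baseline,
# with 60 and 120 cases during follow-up
true_rr = (60 / 2000) / (120 / 8000)

# Hypothetical 5% random subcohort: 98 exposed and 402 unexposed members
estimated_rr = case_cohort_risk_ratio(60, 120, 98, 402)
print(round(true_rr, 2), round(estimated_rr, 2))
```

Only the 500 subcohort members and the 180 cases need covariate information, yet the estimate (about 2.05) is close to the full-cohort risk ratio of 2.0.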



Nested case-control studies

• A case-control study ’nested’ into a cohort

• Rothman states that all case-control studies are nested within a cohort – hypothetical or well-defined

Control and case at the same time?

• Recall that in a cohort study, each person who develops the disease contributes not only to the numerator of the disease rate but also to the person-time experience until the time of disease onset

• The control group in a c-c study is intended to provide estimates of the relative size of the denominators of the incidence rates for the compared groups

• Therefore, each case in a case-control study should have been eligible to be a control before the time of disease onset

Example

Radiation               Yes              No       Total
Breast cancer cases     41               15       56
Person-years            28,010 (59.6%)   19,017   47,027
Rate/10,000 p-yrs       14.6             7.9      11.9

IRR = (41/28010) / (15/19017) = 14.6 / 7.9 = 1.86

Example

Radiation               Yes              No       Total
Breast cancer cases     41               15       56
Person-years            28,010 (59.6%)   19,017   47,027
Controls                298 (59.6%)      202      500
Rate/10,000 p-yrs       14.6             7.9      11.9

IRR = (41/28010) / (15/19017) = 14.6 / 7.9 = 1.86

OR = (41 / 298) / (15 / 202) = 1.85


The 2x2 table


Outcome measure: Odds ratio

• Odds: a measure of the frequency of exposure in a group

• Odds have no unit

• Odds among cases = (number of cases exposed to the risk factor) / (number of cases not exposed)

• Odds among controls = (number of controls exposed to the risk factor) / (number of controls not exposed)

• Odds ratio = odds among cases / odds among controls

• Measure of association: if the exposure is a cause of disease, then sick persons (cases) should be exposed more often than controls!

Calculation of odds ratio

Principle

[Figure: source population of 288 persons with 16 cases]

               Cases   Controls
Exposed        8       12
Non-exposed    8       36
Total          16      48

OR = (8 / 8) / (12 / 36) = 3

Same result as the cohort study

Example: Heavy lifts and knee arthritis

• Swedish study

• Cases: all persons having knee replacement surgery 1991-93, aged 55-69 years, with onset of symptoms before the age of 50 years

• Source population: all men and women born 1921-38 living in the same county

• Controls: a random sample of 750 persons aged 55-69 years, alive on the day the case was diagnosed

• Detailed information on work exposure before the age of 50 years

Exercise 2

Exposure to heavy lifts

        Cases (%)     Controls (%)
Yes     209 (64%)     202 (35%)
No      116 (36%)     382 (65%)
Sum     325 (100%)    584 (100%)

• Calculate the odds ratio for the association between heavy lifts and knee arthritis

•Interpret the result


Exercise 2


• OR = (209 / 116) / (202 / 382) = 3.4

• Design: density sampling case-control study

Odds ratio vs relative risk

Gastroenteritis

                   Cases   Controls   Total
Lunch 23/1  Yes    18      14         32
            No     19      43         62
Total              37      57         94

•Why is relative risk not used in case-control studies?

•RR = (18/32) / (19/62) = 1.84

Odds ratio vs relative risk

Gastroenteritis

                   Cases   Controls   Total
Lunch 23/1  Yes    18      140        158
            No     19      430        449
Total              37      570        607

•Why is relative risk not used in case-control studies?

• RR = (18/158) / (19/449) = 2.69

• Because the calculation is nonsense! The row totals depend on how many controls were sampled, not on the number of people at risk
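The two tables above share the same cases and differ only in how many controls were sampled. The Python sketch below (function names are illustrative) computes both quantities for each table, with a, c as exposed/unexposed cases and b, d as exposed/unexposed controls.

```python
def naive_risk_ratio(a, b, c, d):
    """'Risk ratio' computed as if the case-control table were a cohort:
    (a / (a + b)) / (c / (c + d)). Not meaningful, because b and d depend
    on how many controls were sampled."""
    return (a / (a + b)) / (c / (c + d))


def exposure_odds_ratio(a, b, c, d):
    """Exposure odds ratio: (a / c) / (b / d) = ad / bc."""
    return (a * d) / (b * c)


# (a, b, c, d) from the two slide tables
tables = {"57 controls": (18, 14, 19, 43), "570 controls": (18, 140, 19, 430)}
for label, t in tables.items():
    print(label, round(naive_risk_ratio(*t), 2), round(exposure_odds_ratio(*t), 2))
```

It prints 1.84 and 2.91 for the first table and 2.69 and 2.91 for the second: the naive "risk ratio" moves with the number of controls, while the odds ratio does not.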

Rare disease assumption and c-c study designs

• It is often claimed that the OR only approximates the IRR or risk ratio when the disease is rare

• This is only true for the case-noncase design

– Controls are sampled from those who, at the end of follow-up, remained free of disease

– Then the OR will overestimate the risk ratio (for an exposure that increases risk), because the exposure proportion among controls would be smaller than among all persons at the start of follow-up

– If the disease were rare, the OR would be a reasonable estimate of the risk ratio

• In density case-control studies or case-cohort studies there is no need for the rare disease assumption for the OR to be a valid estimate of the IRR or the risk ratio
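The point can be checked with expected counts from a closed cohort followed to the end of the risk period. The sketch below is illustrative only: the cohort size (1,000 exposed and 1,000 unexposed) and the risks are invented, and expected counts are used so there is no sampling noise.

```python
def compare_designs(n_exposed, n_unexposed, risk_exposed, risk_unexposed):
    """Expected results for a hypothetical closed cohort.

    Returns the true risk ratio, the OR from a case-noncase design
    (controls drawn from those still disease-free at the end of follow-up)
    and the OR from a case-cohort design (controls drawn from everyone at
    the start of follow-up).
    """
    a = n_exposed * risk_exposed        # exposed cases
    c = n_unexposed * risk_unexposed    # unexposed cases
    risk_ratio = risk_exposed / risk_unexposed
    # Case-noncase: control exposure odds = exposed non-cases / unexposed non-cases
    or_case_noncase = (a / c) / ((n_exposed - a) / (n_unexposed - c))
    # Case-cohort: control exposure odds = exposed / unexposed at baseline
    or_case_cohort = (a / c) / (n_exposed / n_unexposed)
    return risk_ratio, or_case_noncase, or_case_cohort


for label, risks in {"rare": (0.02, 0.01), "common": (0.40, 0.20)}.items():
    rr, or_cn, or_cc = compare_designs(1000, 1000, *risks)
    print(label, round(rr, 2), round(or_cn, 2), round(or_cc, 2))
```

With a true risk ratio of 2 in both scenarios, the case-noncase OR is about 2.02 when the disease is rare but about 2.67 when it is common, whereas the case-cohort OR equals 2 in both.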

Issues in case-control studies

• Definition and selection of cases

• Selection of controls

• Ascertainment of disease and exposure status

Issues – case definition

• Demands precise definition

• Time, place, and person

• The definition of cases implicitly defines their source population, in which controls should be identified and selected

– Colon cancer in DK 1973-88

– Myocardial infarction among 60-70-year-old males in DK 1973-88

• Working definition may be refined during the study work-up


Prevalent or incident cases?

• Number of AIDS cases in DK 1989

– Prevalence, measure of disease burden

• Number of newly diagnosed AIDS cases in DK 1989

– Incidence, measure of risk

• In a CC study of risk of AIDS, which measure should be used?

– Incidence

• Including both incident and prevalent cases makes interpretation difficult

– Coffee may be a risk factor for gastric ulcer, but people who have a gastric ulcer drink less coffee because of stomach pains

• Important that exposure precedes outcome, therefore use incident cases

Finding cases, examples

• Hospital source

– Easy, but bias possible

• Certain localisation (restaurant outbreak)

• Population source (lung cancer, register)

– Often costly, but used in DK because of good registers

Generalisability

• Must cases reflect all persons with the disease?

• Myocardial infarction

– All cases in Copenhagen County 1989, or

– Males aged 45-74 years hospitalised 1989 at Herlev Amtssygehus?

• Validity most important, not generalisability!

• Strive to obtain complete and accurate exposure information

Choosing controls

• Crucial point – challenging!

• Must reflect the question: whether the frequency of an exposure observed among cases is different from that among comparable individuals without the disease

• Controls should be a representative sample of the population the cases come from, and must have the same ”risk” of exposure as the cases

• Sampling must be unrelated to exposure, i.e. controls must be selected at random from the study population

• A control is a case without the outcome

Hospital controls

• Pros

– Easy to find

– Participation rate high

– May minimize recall bias?

– Subjected to the same specific and unspecific factors that made the cases attend this hospital


Hospital controls

• Cons

– Sick by definition, do not represent the distribution of exposures in the background population

– Yields a biased estimate (in what direction?)

What patients may be used as controls?

• Controls for lung cancer patients, patients with:

– Bronchitis?

– Heart diseases?

– Hip fractures?

– Stomach ulcers?

– Asthma?

• The diseases of the controls may be associated with the risk factors under study (positively/negatively), which is not desirable

Population controls

• Typically used when cases come from a precisely defined and identified population

• Examples

– Central population register

– Households

– Random digit dialing

– Voters lists

• Challenges

– Larger expenses

– Hard to get hold of people (working, not at home)

– Less motivation (low participation rates)

– Recall bias

– Problems with random digit dialing in the USA?

Special groups

• Neighbours, friends, family

• Advantages

– Cooperative

– Confounder control (how?)

• Disadvantages

– More alike cases (result?)

– Dilution of estimate

More control groups?

• Ideally one per case group, but sometimes desirable with more groups

• When no ideal control group can be selected (e.g. patient groups)

• Breast cancer patients (hospital-based):

– Gynaecological cancers, non-cancer gynaecological patients, emergency operations

• NB: Cost benefit!


Number of controls per case?

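• What is gained by more controls per case?

• If cases are hard to find, more controls per case increase statistical strength

But

• With a fixed total, 1:3 (= 1 case + 3 controls) gives less statistical strength than 2 + 2 (2:2)

• With m times as many controls as cases:

• SD(ln(OR)) = √(1/a + 1/b + 1/c + 1/d), where a and c are the exposed and unexposed cases and b and d the exposed and unexposed controls; increasing m enlarges b and d, so only the control terms shrink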

• More than 4 or 5 controls waste of time and money (only relevant if controls are “cheap” compared to cases)
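To see why returns diminish, the sketch below evaluates Woolf's formula SD(ln(OR)) = √(1/a + 1/b + 1/c + 1/d) using expected cell counts. The 100 cases, the 30% exposure prevalence and the assumption of equal prevalence in cases and controls (i.e. near the null) are hypothetical choices for illustration; `sd_ln_or` is an illustrative name.

```python
import math


def sd_ln_or(n_cases, m, p_exposed):
    """Approximate SD of ln(OR) with m controls per case (Woolf's formula).

    Assumes (hypothetically) the same exposure prevalence p_exposed in
    cases and controls, and uses expected rather than observed cell counts.
    """
    n_controls = m * n_cases
    a = n_cases * p_exposed             # exposed cases
    c = n_cases * (1 - p_exposed)       # unexposed cases
    b = n_controls * p_exposed          # exposed controls
    d = n_controls * (1 - p_exposed)    # unexposed controls
    return math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)


# With unlimited controls only the case terms remain
limit = math.sqrt(1 / (100 * 0.3) + 1 / (100 * 0.7))
for m in (1, 2, 3, 4, 5, 10):
    sd = sd_ln_or(n_cases=100, m=m, p_exposed=0.3)
    # Relative efficiency compared with unlimited controls per case
    print(m, round(sd, 3), round((limit / sd) ** 2, 2))

# Fixed budget of 4 subjects: 1 case + 3 controls vs 2 cases + 2 controls
print(sd_ln_or(1, 3, 0.3) > sd_ln_or(2, 1, 0.3))  # True: 2:2 is more precise
```

The relative efficiency works out to m/(m + 1): 0.5 with one control per case, 0.8 with four and about 0.83 with five, so going beyond 4 or 5 controls buys little.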

Information on exposure

• Numerous possibilities

– Registers

– Hospital files

– (Telephone) interviews

– Etc.

• CC-study of Hodgkin lymphoma and birth weight

– Cases interviewed about birth weight in hospital

– Controls: information on birth weight from the Central Birth Register

– OK?

• No, information should ideally be obtained in the same way and from the same place/source from cases as well as controls, otherwise risk of bias

Data sources

Exposure

• Existing data: registers, medical records, bio-banks

• Questionnaires: interview, self-administered

• Ad hoc measurements: clinical parameters, biological samples

Outcome

• Registers

• Clinical examination

• Information from study subjects: interview, questionnaire

• Information from next-of-kin

• Mortality data

Register studies in DK

CPR Register

National Death Files

National Hospital Register

Birth Register

Prescription Databases

IDA Register (socioeconomic variables)

Cancer Registry

Register studies

Registers are highly valuable data sources, BUT

Difficulties in interpretation due to incomplete data on competing risk factors:

– Life-style factors, socioeconomic factors, comorbidity, medical treatment

Other potential biases:

– Misclassification, non-compliance, etc.


Issues in case-control studies

• Definition and selection of cases

• Selection of controls

• Ascertainment of disease and exposure status

• Boils down to chance, bias and confounding

Chance

• Random error approaches zero as study size increases

• Statistical issue

Bias – systematic errors

• Two main types (many subtypes defined…)

• Selection bias

– Systematic differences between study participants and non-participants

– The association between exposure and disease differs between those who participate and those who do not

• Information bias

– Information collected about or from study subjects is erroneous

Selection bias

• The common consequence of selection bias is that the association between exposure and outcome among those selected for analysis differs from the association among those eligible

Information bias

• Information collected in the study is erroneous

• Information is misclassified if the variable is measured on a categorical scale

• Both exposure and outcome (and confounders!) can be misclassified

• Two types

– Non-differential

– Differential

Non-differential misclassification

• Exposure is non-differentially misclassified if the misclassification is unrelated to the occurrence of disease

• Disease is non-differentially misclassified if the misclassification is unrelated to exposure

• Non-differential misclassification tends to dilute the effect

• This is always true for dichotomous variables


Differential misclassification

• Exposure is misclassified depending on outcome

• One very important type is recall bias in case-control studies

Recall bias

• Cases remember exposures better than controls

• Anders remembers perfectly what he had for lunch at Hotel Hans Egede on September 9 in Nuuk, Greenland. Why? He (and many others) got gastroenteritis

Recall bias 2

• Mothers who have given birth to a baby with a serious birth defect are thought to be able to recall accurately many exposures during early pregnancy

• This will not be true for control mothers

Recall bias 3

• How can this be prevented in c-c studies?

– Frame the questions to aid accurate recall

– Take an entirely different control group that will not be subject to incomplete recall (mothers of babies with other birth defects)

– Conduct a study that does not use interview information, e.g. use medical records from before the birth outcome was known

Differential misclassification

• May exaggerate or underestimate an effect

• Therefore in general much more serious than non-differential misclassification
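Both types of misclassification can be illustrated by treating the Exercise 2 counts (heavy lifts and knee arthritis) as the true exposure data and applying misclassification to expected counts. The sensitivity and specificity values below are hypothetical, and `observed_or` is an illustrative name.

```python
def observed_or(cases_exp, cases_unexp, ctrl_exp, ctrl_unexp,
                sens_cases, spec_cases, sens_ctrl, spec_ctrl):
    """Expected OR after exposure misclassification (expected counts only).

    Truly exposed persons are recorded as exposed with probability equal to
    the sensitivity; truly unexposed persons are recorded as unexposed with
    probability equal to the specificity.
    """
    def reclassify(exp, unexp, sens, spec):
        recorded_exp = sens * exp + (1 - spec) * unexp
        recorded_unexp = (exp + unexp) - recorded_exp
        return recorded_exp, recorded_unexp

    a, c = reclassify(cases_exp, cases_unexp, sens_cases, spec_cases)
    b, d = reclassify(ctrl_exp, ctrl_unexp, sens_ctrl, spec_ctrl)
    return (a * d) / (b * c)


counts = (209, 116, 202, 382)  # Exercise 2: exposed/unexposed cases, exposed/unexposed controls
true_or = observed_or(*counts, 1, 1, 1, 1)
# Non-differential: same (hypothetical) sensitivity 0.8 and specificity 0.9 in cases and controls
nondiff_or = observed_or(*counts, 0.8, 0.9, 0.8, 0.9)
# Differential: cases recall exposure better (sensitivity 0.95 vs 0.8)
diff_or = observed_or(*counts, 0.95, 0.9, 0.8, 0.9)
print(round(true_or, 2), round(nondiff_or, 2), round(diff_or, 2))
```

With these inputs the true OR of 3.41 drops to about 2.35 under non-differential misclassification and rises to about 3.52 when cases recall exposure better than controls. Other differential patterns can push the estimate the other way.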

Confounding

• Mixture of an effect of exposure on outcome with the effect of a third factor

• Presence of a factor which is predictor of outcome and associated with exposure


Confounder control - principle

• Overall principle: only compare individuals with the same level of the confounder

or in other words

• Compare groups only different with respect to the variable in focus

• Control in design and analysis

Control in study design

• What are the possibilities?

– Randomisation

• Example: clinical trials (lottery)

• Adjusts for known and unknown confounding factors

– Restriction

• Restriction of compared groups to certain levels of the confounder

• Intuitive: compare sick children with healthy children, not with healthy adults

– Matching (individual)

• Individual matching of cases and controls on e.g. sex, age, neighbourhood

• Adjusts only for the factors matched on

• Intuitively obvious and attractive design

Restriction

• Restriction of study groups to the same confounder categories

• Examples:

– Risk factors for fracture of femoral neck

• Age, sex

– Sun light and risk of malignant melanomas

• Age, sex, ethnicity

Restriction – pros and cons

• Advantages:

– Simple and cheap

– Almost complete confounder control, if the range of the confounder is narrow

• Disadvantages:

– May reduce the number of study persons, giving little statistical strength

– Residual confounding (?), if restriction is not narrow enough

– Reduces generalisability

Matching (individual)

• Choice of identical (!) control for each case, match on confounders

• Must be considered both in design and analysis

• Attractive and often used, but some disadvantages

Matching – disadvantages

• Difficult, expensive and time consuming to find controls

• Example: when matching on age (5 categories), sex (2) and race (3 categories), how many combinations?

– 5 × 2 × 3 = 30!

– Even more difficult at 2:1, 3:1 and 4:1 matching

• No control of other confounders other than those matched for

• Less statistical strength if the matching variable is only weakly associated with exposure or outcome


Identification of potential confounders – how to do it in practice?

• In the design phase, collect information on possible confounders

– All known risk factors in detail

– Sex and age as a minimum

• In analysis

– Stratified analysis

– Multivariate analysis
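Stratified analysis can be illustrated with the Mantel-Haenszel summary odds ratio, one common choice that the slide does not name. The strata below are hypothetical, constructed so that exposure has no effect within either stratum while the confounder is linked both to exposure and to being a case.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio across strata of a confounder.

    Each stratum is (a, b, c, d): exposed cases, exposed controls,
    unexposed cases, unexposed controls.
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


def crude_or(strata):
    """Odds ratio after collapsing all strata into a single 2x2 table."""
    a, b, c, d = (sum(col) for col in zip(*strata))
    return (a * d) / (b * c)


# Hypothetical strata, each with OR = 1
strata = [
    (40, 20, 10, 5),   # confounder present
    (5, 10, 20, 40),   # confounder absent
]
print(round(crude_or(strata), 2), round(mantel_haenszel_or(strata), 2))
```

The crude OR is 2.25, while the Mantel-Haenszel OR is 1.0: in this constructed example the crude association is produced entirely by the confounder.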

Conclusions – take home messages

• Well suited in case of rare diseases, long latency time and multiple risk factors

• Cheap and effective

• Association measured in OR

– Interpretation depends on control selection

• Selection of control group challenging – controls should ideally be cases who just haven’t developed the disease (yet)

• Main error sources include chance, bias and confounding
