case-control studies i phd course spring 2009 university...
02-03-2011
Case-control studies
Rothman (2002) chapter 4
Ph.d. course in "Epidemiology" spring 2011
Lau Caspar Thygesen, ph.d.
Aims
• Know difference between cohort and case-control studies
• Know different case-control study designs
• Describe odds ratio
• Know principles for case and control selection
• Describe advantages/disadvantages of CC-studies
• Know important biases and confounding
Case-control studies
• Starting point: Subjects with the disease under study (cases)
• Record cases’ history of exposure
• A comparison group of individuals without the disease under study (controls) is assembled
• Their history of exposure is recorded the same way
• Cohort study: Concerned with frequency of disease among exposed and non-exposed
• C-c study: Concerned with the exposure in subjects with a specific disease and people without the disease
But why is that necessary?
• On statistical power…
Cohort studies – underpowered?
E.g.
• Occurrence of cervical cancer in 4 440 women
• Hospitalised with gonorrhoea and followed for 54,576 person-years at risk (12.2 years/woman)
– Cervical cancer
– 11 cases observed
– 8.9 cases expected
– Relative risk 1.2
– 95% CI 0.6 to 2.2
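The figures in this example can be checked with a few lines of Python. This is a minimal sketch: the slide does not say how its confidence interval was obtained, so the use of Byar's approximation to the exact Poisson limits is an assumption, and the variable names are illustrative only.

```python
import math

observed = 11      # cervical cancer cases observed in the cohort
expected = 8.9     # cases expected from reference rates
z = 1.96           # normal quantile for a 95% interval

relative_risk = observed / expected

# Byar's approximation to the exact Poisson limits for the observed count
# (assumption: the slide does not state which method was used).
lower_count = observed * (1 - 1 / (9 * observed) - z / (3 * math.sqrt(observed))) ** 3
upper_count = (observed + 1) * (
    1 - 1 / (9 * (observed + 1)) + z / (3 * math.sqrt(observed + 1))
) ** 3

ci_low, ci_high = lower_count / expected, upper_count / expected
print(f"RR = {relative_risk:.1f}, 95% CI {ci_low:.1f} to {ci_high:.1f}")
# prints: RR = 1.2, 95% CI 0.6 to 2.2
```

The wide interval, despite more than 54,000 person-years of follow-up, illustrates why a cohort design can be underpowered for a rare outcome.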
Cohort studies – advantages
• Can examine multiple effects of a single exposure
• Can examine rare exposures
• Exposure precedes outcomes
• Allows direct measurement of incidence (rate, risk) of outcomes
• Can elucidate temporal relationship
• Allows study subjects to contribute person-time to multiple exposure categories
• Biological material can be collected prior to outcome
• Minimizes bias in the ascertainment of exposure
Cohort studies – disadvantages
• Inefficient for evaluation of rare diseases
• If prospective, can be very expensive and time consuming
• If retrospective, requires the availability of adequate records for both exposure and outcome
• If prospective, cannot provide quick answers
• If retrospective, precise classification of exposure and outcome may be difficult
• Validity of the results can be seriously affected by losses to follow-up
Cohort studies
• Prospective (!)
• Starting point: a population of healthy persons
Outcome in cohort study: Relative risk
Cohort studies measure
• Risk of disease among the exposed compared with the risk of disease among the non-exposed
• The absolute risk may be calculated for both groups!
Case-control studies
• Became popular with the change from infectious disease epidemiology to chronic diseases
• Why?
– Western life-style diseases (cancer, heart diseases)
– Diseases with long latent period
– Most applicable when disease is rare
– Study many possible risk factors / causes
Case-control studies (2)
• The case-control study aims at achieving the same goals as a cohort study but more efficiently using sampling
• Best understood by considering a source population – the population that gives rise to the cases included in the study
Case-control studies (3)
• The same cases are identified as in a cohort study and then classified with respect to exposure
• Instead of obtaining the denominators for the rates/risks, a control group is sampled from the entire source population that gives rise to the cases
• The control group is used to determine the relative size of the exposed and unexposed components of the source population
Case-control studies (4)
• Therefore the cardinal requirement of control selection is that the controls be sampled independently of exposure status
Case-control studies
Principle
(Figure: source population of 288 persons giving rise to 16 cases)
Analyzed as a cohort study
• 25% of source population exposed
• Followed for one year
• Among exposed: 8 cases during 72 person-years, incidence rate (IR) = 0.111 cases/p-yr
• Among unexposed: 8 cases during 216 person-years, IR = 0.037 cases/p-yr
• IRR = 0.111 / 0.037 = 3
Principle
(Figure: source population of 288 persons giving rise to 16 cases)
Analyzed as a case-control study
• Sample a control group independently of exposure:
– Among 48 in control group 12 exposed
– If sampled independently the same proportion of controls will be exposed as the people/person-time exposed in the source population
• The same cases are included as in the cohort
• These data can be used to estimate the same result as for the cohort design
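The two analyses of the 288-person source population can be put side by side in a short Python sketch. It uses only the numbers given on the slides; the variable names are illustrative, and the control counts are the expected values under sampling independent of exposure (no sampling error).

```python
# Toy source population from the slides: 288 persons followed for one year.
cases_exposed, cases_unexposed = 8, 8
pyears_exposed, pyears_unexposed = 72, 216

# Cohort analysis: incidence rates and their ratio.
rate_exposed = cases_exposed / pyears_exposed        # 0.111 cases/p-yr
rate_unexposed = cases_unexposed / pyears_unexposed  # 0.037 cases/p-yr
irr = rate_exposed / rate_unexposed

# Case-control analysis: 48 controls sampled independently of exposure,
# so 25% of them (12) are expected to be exposed.
controls_exposed, controls_unexposed = 12, 36
odds_ratio = (cases_exposed / controls_exposed) / (cases_unexposed / controls_unexposed)

print(f"IRR = {irr:.2f}, OR = {odds_ratio:.2f}")
# prints: IRR = 3.00, OR = 3.00
```

The controls replace the person-time denominators: 12 to 36 is the same 1:3 split as 72 to 216 person-years, which is why the two estimates agree.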
Another representation of a cohort study
(Figure: N+ exposed and N- unexposed healthy persons (at risk) followed over time; those who become sick are the cases, a exposed and c unexposed)
’Classic’ case-control study
(Also termed a retrospective case-control study or a case-noncase study)
- Sick persons are identified and information on exposure is obtained (by interview)
- Controls are identified from the assumed source population and information on exposure is obtained
(Retrospective means that information on exposure is obtained after onset of disease)
’Cohort presentation’ of a case-noncase study
(Figure: from ’cohort’ start, N+ exposed and N- unexposed healthy persons (at risk) are followed over time; the sick are the cases (a and c), and controls (b and d) are sampled among those still healthy)
Exercise 1
• Discuss advantages and disadvantages of the case-control study design
• Give three examples of associations which would be well-suited for a case-control study
Types of case-control studies
• Classic case-control study / retrospective case-control study / case-noncase / (cumulative-incidence) case-control studies
• Density sampling case-control study
• Case-cohort study
Density case-control study design
• Density-based sampling
– The phrase comes from the term incidence density, which is sometimes used as a synonym for incidence rate
• In a cohort study:
– Incidence rates in the exposed and unexposed populations:
$I_1 = a / PT_1$
$I_0 = c / PT_0$
where $PT_1$ and $PT_0$ are the person-time at risk among the exposed and unexposed
Density-based sampling
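• In a case-control study with density-based sampling, the control series provides an estimate of the proportion of the total person-time for exposed and unexposed cohorts in the source population:
$b / PT_1$ and $d / PT_0$
where $b$ and $d$ are the numbers of exposed and unexposed controls, and $PT_1$ and $PT_0$ the exposed and unexposed person-time
• These ratios are called control sampling rates for the exposed and unexposed components of the source population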
Density sampling (3)
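• These sampling rates will be equal if the control sampling is conducted independently of exposure
• If this is achieved, the incidence rate ratio can be estimated from the c-c data:
$\frac{I_1}{I_0} = \frac{a / PT_1}{c / PT_0} = \frac{a / b}{c / d} = \frac{ad}{bc}$
• Because equal sampling rates, $b / PT_1 = d / PT_0$, imply $b / d = PT_1 / PT_0$ (a, c: exposed and unexposed cases; b, d: exposed and unexposed controls)
• ad / bc is called the odds ratio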
Density sampling
(Figure: from ’cohort’ start, healthy persons (at risk) are followed over time; the sick are the cases, and controls are sampled with probability proportional to their time at risk)
Density sampling (4)
• Using the OR in a case-control study using density sampling
• Can obtain a valid estimate of the incidence rate ratio in a population
• Without having to obtain individual information on every person in the population!
Density sampling (5)
• Density / risk set sampling:
• Choose controls from the unique set of people in the source population who are at risk of becoming a case at the precise time that each case is diagnosed
• The same person could therefore both be a control and a case in the same study!
Control sampling
• Dataset of cases of bleeding (n = 3652)
• Age- and sex-matched control group of 10 subjects per case (n = 36 502)
• Cases fulfilled the following criteria
– admission with peptic ulcer or gastritis as main diagnosis within one of the County’s hospitals from 1995 to 2006
– significant bleeding defined either by melena, subnormal haemoglobin or the need for transfusions
– potential bleeding source in the stomach or duodenum identified by endoscopy or surgery
• Cases were assigned an index date as their first registered date of a UGB diagnosis
• 10 controls for each case sampled by risk set sampling technique
• Controls were randomly selected among those within the county who matched the case with respect to gender and exact birth year
• Cases were eligible as control subjects until their first admission with UGB
Sampling from a cohort
• (Matched) incidence density sampled case-control studies are money-saving sampling plans
• The design allows estimation of the same parameters as does a complete follow-up of the entire cohort (with less precision)
• Another sampling scheme allowing this is the case-cohort design
• This design allows analysis of several types of event using the same “controls”
Case-cohort design
• When obtaining information on covariates from a cohort is expensive or difficult
• A sub-sample from the cohort is selected at study entrance and information on covariates is obtained
• Controls are sampled as a fraction of the total number of people in the study population rather than of person-time
• Controls have the same chance of being selected irrespective of person-time spent
• A control may also become a case!
• Makes studies of multiple outcomes in a cohort cost-efficient
Cohort presentation of case-cohort study
(Figure: at ’cohort’ start, controls are sampled from the N+ exposed and N- unexposed healthy persons (at risk); the cohort is followed over time and the sick are the cases)
Example
Paediatric and Perinatal Epidemiology 2007;21:507–517.
Example
• In Denmark, routinely collected neonatal dried blood spot (DBS) samples have been stored at -25°C since 1981 in a central registry (Biological Specimen Bank for Neonatal Screening).
• Information from Danish registries makes it possible to identify DBS samples from individuals who later developed type 1 diabetes (T1D)
• DBS samples from 2086 validated Danish T1D patients from the birth cohorts 1981–2002, and two matching controls per patient
• Case and control samples were matched by place and date of birth
• Results interpretable as ratios of T1D risks
Case-cohort design
• Select a sub-cohort from the original cohort
• Collect covariate information on these persons
• Take all persons with events of the type(s) of interest, and collect covariate information on these
• The analysis of these data allows estimation of risk-ratios
Types of case-control studies
• Classic case-control study / retrospective case-control study / case-noncase / (cumulative-incidence) case-control studies
• Density sampling case-control study
• Case-cohort study
Nested case-control studies
• A case-control study ’nested’ into a cohort
• Rothman states that all case-control studies are nested within a cohort – hypothetical or well-defined
Control and case at the same time?
• Recall that in a cohort study, each person who develops the disease contributes not only to the numerator of the disease rate but also to the person-time experience until the time of disease onset
• The control group in a c-c study is intended to provide estimates of the relative size of the denominators of the incidence rates for the compared groups
• Therefore, each case in a case-control study should have been eligible to be a control before the time of disease onset
Example
                      Radiation
                      Yes              No        Total
Breast cancer cases   41               15        56
Person-years          28,010 (59.6%)   19,017    47,027
Rate/10,000 p-yrs     14.6             7.9       11.9
IRR = (41/28010) / (15/19017) = 14.6 / 7.9 = 1.86
Example
                      Radiation
                      Yes              No        Total
Breast cancer cases   41               15        56
Person-years          28,010 (59.6%)   19,017    47,027
Controls              298 (59.6%)      202       500
Rate/10,000 p-yrs     14.6             7.9       11.9
IRR = (41/28010) / (15/19017) = 14.6 / 7.9 = 1.86
OR = (41 / 298) / (15 / 202) = 1.85
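The link between person-time and the control series in this example can be sketched in Python. The 500 controls are allocated in proportion to person-time, which is the expected result of sampling independently of exposure; ignoring sampling error is an assumption of the sketch, and the dictionary keys are illustrative names.

```python
# Radiation and breast cancer example from the slides.
cases = {"exposed": 41, "unexposed": 15}
person_years = {"exposed": 28_010, "unexposed": 19_017}
n_controls = 500

# Full-cohort incidence rate ratio.
irr = (cases["exposed"] / person_years["exposed"]) / (
    cases["unexposed"] / person_years["unexposed"]
)

# Controls sampled independently of exposure are expected to split
# in proportion to person-time (assumption: no sampling error).
total_py = sum(person_years.values())
controls = {k: round(n_controls * py / total_py) for k, py in person_years.items()}

# Odds ratio using the controls in place of the person-time denominators.
odds_ratio = (cases["exposed"] / controls["exposed"]) / (
    cases["unexposed"] / controls["unexposed"]
)
print(controls, round(irr, 2), round(odds_ratio, 2))
# prints: {'exposed': 298, 'unexposed': 202} 1.86 1.85
```

The small difference between 1.86 and 1.85 comes only from rounding the control counts to whole persons.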
Aims
• Know difference between cohort and case-control studies
• Know different case-control study designs
• Describe odds ratio
• Know principles for case and control selection
• Describe advantages/disadvantages of CC-studies
• Know important biases and confounding
The 2x2 table
Outcome measure: Odds ratio
• Odds: measure of frequency of exposure in a group
• Odds have no unit
• Odds among cases = (number of cases exposed to risk factor) / (number of cases not exposed)
• Odds among controls = (number of controls exposed to risk factor) / (number of controls not exposed)
• Odds ratio: odds for cases / odds for controls
• Measure of association: if the exposure is a cause of disease, then sick persons (cases) should be exposed more often than controls!
Calculation of odds ratio
(Figure: source population of 288 persons giving rise to 16 cases)
              Cases   Controls
Exposed         8        12
Non-exposed     8        36
Total          16        48
OR = (8 / 8) / (12 / 36) = 3
Same result as the cohort study
Example: Heavy lifts and knee arthritis
• Swedish study
• Cases: All persons aged 55-69 years having knee replacement surgery 1991-93, with symptom onset before the age of 50 years
• Source population: All men and women born 1921-38 living in the same county
• Controls: a random sample of 750 persons aged 55-69 years, alive on the day the case was diagnosed
• Detailed information on work exposure before the age of 50 years
Exercise 2
Exposure to heavy lifts
        Cases (%)     Controls (%)
Yes     209 (64%)     202 (35%)
No      116 (36%)     382 (65%)
Sum     325 (100%)    584 (100%)
• Calculate the odds ratio for the association between heavy lifts and knee arthritis
• Interpret the result
Exercise 2
• OR = (209 / 116) / (202 / 382) = 3.4
• Density sampling case-control study
Odds ratio vs relative risk
                      Gastroenteritis
                      Cases   Controls   Total
Lunch 23/1   Yes        18       14        32
             No         19       43        62
             Total      37       57        94
• Why is relative risk not used in case-control studies?
• RR = (18/32) / (19/62) = 1.84
Odds ratio vs relative risk
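                      Gastroenteritis
                      Cases   Controls   Total
Lunch 23/1   Yes        18      140       158
             No         19      430       449
             Total      37      570       607
• Why is relative risk not used in case-control studies?
• RR = (18/158) / (19/449) = 2.69
• Because the calculation is nonsense! The row totals, and hence the "risks", depend on how many controls the investigator chose to sample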
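The point of the two lunch tables can be verified with a short Python sketch. The function names are illustrative; the counts are those of the slides, with the second table using ten times as many controls.

```python
def naive_rr(a, b, c, d):
    """'Relative risk' computed across table rows, as if the rows were cohorts."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Cross-product ratio ad / bc."""
    return (a * d) / (b * c)


# a, c: exposed and unexposed cases; b, d: exposed and unexposed controls
small = (18, 14, 19, 43)
large = (18, 140, 19, 430)  # same cases, ten times as many controls

for label, table in (("1x controls", small), ("10x controls", large)):
    print(label, round(naive_rr(*table), 2), round(odds_ratio(*table), 2))
# prints: 1x controls 1.84 2.91
#         10x controls 2.69 2.91
```

The "relative risk" moves with the arbitrary size of the control group, whereas the odds ratio is unchanged because the sampling fraction of controls cancels in the cross-product.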
Rare disease assumption and c-c study designs
• It is often claimed that the OR only approximates the IRR or risk ratio when the disease is rare
• This is only true for the case-noncase design
– Controls are sampled from those who, at the end of follow-up, remained free of disease
– Then the OR will overestimate the risk ratio (when the exposure increases risk), because the exposure proportion among controls will be smaller than among all persons at the start of follow-up
– If the disease is rare, the OR will be a reasonable estimate of the risk ratio
• In density case-control studies or case-cohort studies there is no need for the rare disease assumption for the OR to be a valid estimate of the IRR or the risk ratio
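How much the rare disease assumption matters in the case-noncase design can be illustrated numerically. The sketch below uses hypothetical risks chosen for illustration (they do not come from the slides), with a true risk ratio of 2 in both scenarios.

```python
def risk_ratio_and_or(risk_exposed, risk_unexposed):
    """Risk ratio and the OR expected in a case-noncase design.

    Controls are the non-cases at the end of follow-up, so the OR
    compares odds of disease rather than risks.
    """
    rr = risk_exposed / risk_unexposed
    odds_exposed = risk_exposed / (1 - risk_exposed)
    odds_unexposed = risk_unexposed / (1 - risk_unexposed)
    return rr, odds_exposed / odds_unexposed


# Hypothetical risks: a rare and a common disease, both with RR = 2.
rr_rare, or_rare = risk_ratio_and_or(0.002, 0.001)
rr_common, or_common = risk_ratio_and_or(0.40, 0.20)

print(f"rare:   RR = {rr_rare:.2f}, OR = {or_rare:.3f}")
print(f"common: RR = {rr_common:.2f}, OR = {or_common:.3f}")
# prints: rare:   RR = 2.00, OR = 2.002
#         common: RR = 2.00, OR = 2.667
```

With the rare disease the OR is practically equal to the risk ratio; with the common disease it overstates it.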
Issues in case-control studies
• Definition and selection of cases
• Selection of controls
• Ascertainment of disease and exposure status
Issues – case definition
• Demands precise definition
• Time, place, and person
• The definition of cases implicitly defines their source population, in which controls should be identified and selected
– Colon cancer in DK 1973-88
– Myocardial infarction among 60-70-year-old males in DK 1973-88
• Working definition may be refined during study work up
Prevalent or incident cases?
• Number of AIDS cases in DK 1989
– Prevalence, measure of disease burden
• Number of newly diagnosed AIDS cases in DK 1989
– Incidence, measure of risk
• In a CC study of risk of AIDS, what measure to use?
– Incidence
• Including both incident and prevalent cases makes interpretation difficult
– Coffee may be a risk factor for gastric ulcer, but if you have a gastric ulcer, you drink less coffee because of stomach pains
• Important that exposure precedes outcome, therefore use incident cases
Finding cases, examples
• Hospital source
– Easy, but bias possible
• Specific location (restaurant outbreak)
• Population source (lung cancer, register)
– Often costly, but used in DK because of good registers
Generalisability
• Must cases reflect all persons with the disease?
• Myocardial infarction
– All cases in Copenhagen County 1989, or
– Males 45-74 years hospitalised 1989 at Herlev Amtssygehus?
• Validity most important, not generalisability!
• Strive to obtain complete and accurate exposure information
Choosing controls
• Crucial point – challenging!
• Must reflect the question: whether the frequency of an exposure observed among cases differs from that among comparable individuals without the disease
• A representative sample of the population that the sick persons come from; must have the same ”risk” of exposure as cases
• Sampling must be unrelated to exposure; i.e. controls must be selected at random from the study population
• A control is a case without the outcome
Hospital controls
• Pros
– Easy to find
– Participation rate high
– May minimize recall bias?
– Subjected to the same specific and unspecific factors that made the cases attend this hospital
Hospital controls
• Cons
– Sick by definition, do not represent the distribution of exposures in the background population
– Yields a biased estimate (in what direction?)
What patients may be used as controls?
• Controls for lung cancer patients: patients with
– Bronchitis?
– Heart diseases?
– Hip fractures?
– Stomach ulcers?
– Asthma?
• The diseases of the controls may be associated with the risk factors under study (positively/negatively), which is not desirable
Population controls
• Typically, when cases come from a precisely defined and identified population
• Examples
– Central population register
– Households
– Random digit dialing
– Voters lists
• Challenges
– Larger expenses
– Hard to get hold of people (working, not at home)
– Less motivation (low participation rates)
– Recall bias
– Problems with random digit dialing in the USA?
Special groups
• Neighbours, friends, family
• Advantages
– Cooperative
– Confounder control (how?)
• Disadvantages
– More similar to cases (result?)
– Dilution of estimate
More control groups?
• Ideally one control group per case group, but more groups are sometimes desirable
• When no ideal control group can be selected (e.g. patient groups)
• Breast cancer patients (hospital-based):
– Gynaecological cancers, non-cancer gynaecological patients, emergency operations
• NB: Cost benefit!
Number of controls per case?
• What is gained by more controls per case?
• If cases are hard to find, increased statistical strength
But
• 1:3 (= 1 case + 3 controls) gives less statistical strength than 2 + 2 (2:2) (m = number of controls per case):
• SD(ln(OR))=
• More than 4 or 5 controls per case is a waste of time and money (extra controls are only relevant if controls are “cheap” compared to cases)
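The diminishing return from extra controls can be illustrated in Python. The expression for SD(ln(OR)) did not survive in this transcript, so the sketch substitutes the standard large-sample (Woolf) approximation sqrt(1/a + 1/b + 1/c + 1/d); that choice, the 30% exposure prevalence, the absence of a true effect and the sample sizes are all assumptions made for illustration.

```python
import math


def se_log_or(n_cases, n_controls, p_exposed=0.3):
    """Approximate SE of ln(OR) from expected cell counts (Woolf formula).

    Assumes the same exposure prevalence in cases and controls,
    i.e. no true association (hypothetical scenario).
    """
    a, c = n_cases * p_exposed, n_cases * (1 - p_exposed)
    b, d = n_controls * p_exposed, n_controls * (1 - p_exposed)
    return math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)


# Same total of 400 subjects: 1:3 versus 2:2 allocation.
se_1_to_3 = se_log_or(100, 300)
se_2_to_2 = se_log_or(200, 200)
print(f"1:3 -> SE = {se_1_to_3:.3f}, 2:2 -> SE = {se_2_to_2:.3f}")

# Fixed number of cases, m controls per case.
n_cases = 100
for m in (1, 2, 3, 4, 5, 10):
    se = se_log_or(n_cases, m * n_cases)
    # Efficiency relative to an unlimited number of controls is m / (m + 1).
    print(f"m = {m:2d}: SE = {se:.3f}, relative efficiency = {m / (m + 1):.2f}")
```

Under these assumptions the variance is proportional to 1 + 1/m for a fixed number of cases, so going from 1 to 4 controls per case helps markedly while going from 5 to 10 gains little.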
Information on exposure
• Numerous possibilities
– Registers
– Hospital files
– (Telephone) interviews
– Etc.
• CC-study of Hodgkin lymphoma and birth weight
– Cases interviewed about birth weight in hospital
– Controls: information on birth weight from the Central Birth Register
– OK?
• No, information should ideally be obtained in the same way and from the same place/source from cases as well as controls, otherwise risk of bias
Data sources
Exposure
• Existing data
– registers
– medical records
– bio-banks
• Questionnaires
– interview
– self-administered
• Ad hoc measurements
– clinical parameters
– biological samples
Outcome
• Registers
• Clinical examination
• Information from study subjects
– interview
– questionnaire
• Information from next-of-kin
• Mortality data
Register studies in DK
CPR Register
National Death Files
National Hospital Register
Birth Register
Prescription Databases
IDA Register (socioeconomic variables)
Cancer Registry
Register studies
Registers are highly valuable data sources, BUT
Difficulties in interpretation due to incomplete data on competing risk factors
Life-style factors, socioeconomic factors, comorbidity, medical treatment
Other potential biases
Misclassification, non-compliance, etc.
Issues in case-control studies
• Definition and selection of cases
• Selection of controls
• Ascertainment of disease and exposure status
• Boils down to chance, bias and confounding
Chance
• Random error approaches zero as study size increases
• Statistical issue
Bias – systematic errors
• Two main types (many subtypes defined…)
• Selection bias
– Systematic differences between study participants and non-participants
– Association between exposure and disease differs for those who participate and those who do not participate
• Information bias
– Information collected about or from study subjects is erroneous
Selection bias
• The common consequence of selection bias is that the association between exposure and outcome among those selected for analysis differs from the association among those eligible
Information bias
• Information collected in the study is erroneous
• The error is called misclassification if the variable is measured on a categorical scale
• Both exposure and outcome (and confounders!) can be misclassified
• Two types
– Non-differential
– Differential
Non-differential misclassification
• Exposure misclassification is non-differential if it is unrelated to the occurrence of disease
• Disease misclassification is non-differential if it is unrelated to exposure
• Non-differential misclassification tends to dilute the effect
• This is always true for dichotomous variables
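The dilution can be made concrete with the toy 2x2 table used earlier in the lecture (cases 8 exposed and 8 unexposed, controls 12 and 36, true OR = 3). The sensitivity and specificity below are hypothetical values chosen for illustration, and the sketch works with expected counts rather than a random simulation.

```python
def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected observed counts after imperfect exposure measurement."""
    obs_exposed = exposed * sensitivity + unexposed * (1 - specificity)
    obs_unexposed = exposed * (1 - sensitivity) + unexposed * specificity
    return obs_exposed, obs_unexposed


def odds_ratio(a, b, c, d):
    """Cross-product ratio ad / bc."""
    return (a * d) / (b * c)


# True table: a, c = exposed and unexposed cases; b, d = controls.
true_or = odds_ratio(8, 12, 8, 36)

# Non-differential: the same (hypothetical) error rates for cases and controls.
a, c = misclassify(8, 8, sensitivity=0.8, specificity=0.9)
b, d = misclassify(12, 36, sensitivity=0.8, specificity=0.9)
observed_or = odds_ratio(a, b, c, d)

print(f"true OR = {true_or:.2f}, observed OR = {observed_or:.2f}")
# prints: true OR = 3.00, observed OR = 2.16
```

The observed estimate lies between the true value and 1, i.e. the effect is diluted towards the null.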
Differential misclassification
• Exposure is misclassified depending on outcome
• One very important type is recall bias in case-control studies
Recall bias
• Cases remember exposures better than controls
• Anders remembers perfectly what he had for lunch at Hotel Hans Egede in Nuuk, Greenland, on September 9. Why? He (and many others) got gastroenteritis
Recall bias 2
• Mothers who have given birth to a baby with a serious birth defect are thought to be able to recall accurately many exposures during early pregnancy
• This will not be true for control mothers
Recall bias 3
• How can this be prevented in c-c studies?
– Frame the questions to aid accurate recall
– Take an entirely different control group that will not be subject to incomplete recall (mothers of babies with other birth defects)
– Conduct a study that does not use interview information – e.g. use medical records from before the birth outcome was known
Differential misclassification
• May exaggerate or underestimate an effect
• Therefore in general much more serious than non-differential misclassification
Confounding
• Mixture of an effect of exposure on outcome with the effect of a third factor
• Presence of a factor which is a predictor of outcome and associated with exposure
Confounder control - principle
• Overall principle: only compare individuals with same level of confounding
or in other words
• Compare groups that differ only with respect to the variable in focus
• Control in design and analysis
Control in study design
• What are the possibilities?
– Randomisation
   • Example: clinical trials (lottery)
   • Adjusts for known and unknown confounding factors
– Restriction
   • Restriction of compared groups to certain levels of confounding
   • Intuitive: compare sick children with healthy children, not with healthy adults
– Matching (individual)
   • Individual matching of case and control on e.g. sex, age, neighbourhood
   • Adjusts for unknown confounding factors
   • Intuitively obvious and attractive design
Restriction
• Restriction of study groups to the same category or categories of the confounder
• Examples:
– Risk factors for fracture of femoral neck
• Age, sex
– Sun light and risk of malignant melanomas
• Age, sex, ethnicity
Restriction – pros and cons
• Advantages:
– Simple and cheap
– Almost complete confounder control, if the range of the confounders is limited
• Disadvantages:
– May reduce the number of study persons, giving less statistical strength
– Residual confounding (?), if restriction is not narrow enough
– Reduces generalisability
Matching (individual)
• Choice of identical (!) control for each case, match on confounders
• Must be considered both in design and analysis
• Attractive and often used, but some disadvantages
Matching – disadvantages
• Difficult, expensive and time consuming to find controls
• Example: When matching on age (5 categories), sex and race (3 categories), how many combinations?
– 30!
– Even more difficult at 2:1, 3:1 and 4:1 matching
• No control of confounders other than those matched for
• Less statistical strength if the matching variable is only weakly associated with exposure or outcome
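The count of 30 matching combinations can be reproduced by enumerating the strata. The category labels below are placeholders invented for illustration; only the numbers of categories (5 age groups, 2 sexes, 3 race groups) come from the slide.

```python
from itertools import product

# Placeholder labels (hypothetical); only the category counts matter here.
age_groups = ["age 1", "age 2", "age 3", "age 4", "age 5"]
sexes = ["female", "male"]
races = ["race 1", "race 2", "race 3"]

# Every case falls in one stratum, and a matched control must be found
# in exactly the same stratum.
strata = list(product(age_groups, sexes, races))
print(len(strata))  # 5 * 2 * 3 = 30
```

Each additional matching variable multiplies the number of strata, which is why suitable controls become harder to find, especially when several controls per case are required.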
Identification of potential confounders – how to do it in practice?
• In the design phase: collect information on possible confounders
– All known risk factors in detail
– Sex and age as a minimum
• In analysis
– Stratified analysis
– Multivariate analysis
Conclusions – take home messages
• Well suited in case of rare diseases, long latency time and multiple risk factors
• Cheap and effective
• Association measured in OR
– Interpretation depends on control selection
• Selection of control group challenging – controls should ideally be cases who just haven’t developed the disease (yet)
• Main error sources include chance, bias and confounding