Epidemiology: basic concepts and study designs · 2016-07-20
TRANSCRIPT
Epidemiology: Basic concepts
and study designs
C. Magnani, CPO Piemonte and University of Eastern
Piedmont
Epidemiology
The study of the distribution of disease
and its determinants in man
(MacMahon and Pugh, 1970)
As regards the distribution of disease:
-> Descriptive epidemiology studies:
i.e. studies focused on measuring the
frequency of disease
(Disease Registries / Surveys)
Descriptive epidemiology studies:
simple two step design:
- identification of cases
- new diagnosis: incident cases
- present: prevalent cases
- measure frequency in the population
Prevalence and Incidence
Prevalence is the number of cases of a specific condition of interest present at a specific point in time (a proportion).
Incidence is the number of new cases of a specific condition of interest occurring during a time interval (a rate).
Prevalence
[Figure: timelines of subjects who are healthy, symptomatic or dead over time; a cross-sectional survey at a single time point identifies the prevalent cases.]
Prevalence
• Prevalence is a proportion
\[ P = \frac{\text{Prevalent cases}}{\text{Persons at risk at prevalence time}} \]
P= 3 / 7
Prevalence and Incidence
Incidence is the number of new cases of a specific condition of interest occurring during a time interval (a rate).
Incident cases
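[Figure: timelines of subjects who are healthy, symptomatic or dead between the start and the end of the study; incident cases are those who become symptomatic within the study interval.]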
Incidence: two measures
“Cumulative Incidence” or “Incidence proportion”
\[ \text{Cum\_Inc} = \frac{\text{Incident cases}}{\text{Persons entering the study interval}} \]
“Incidence density” or “Incidence rate”
\[ \text{Inc\_rate} = \frac{\text{Incident cases}}{\text{Person-time at risk}} \]
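The three frequency measures defined so far can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, the prevalence call uses the P = 3 / 7 example from the slides, and the incidence numbers (4 new cases, 20 persons, 50 person-years) are hypothetical.

```python
def prevalence(prevalent_cases, at_risk):
    """Proportion of the persons at risk who have the condition at one time point."""
    return prevalent_cases / at_risk


def cumulative_incidence(incident_cases, persons_entering):
    """Proportion of the persons entering the study interval who become new cases."""
    return incident_cases / persons_entering


def incidence_rate(incident_cases, person_time_at_risk):
    """New cases per unit of person-time at risk (a rate, not a proportion)."""
    return incident_cases / person_time_at_risk


# Prevalence example from the slides: 3 prevalent cases among 7 persons.
p = prevalence(3, 7)

# Hypothetical numbers, for illustration only:
# 4 new cases among 20 persons who contributed 50 person-years at risk.
cum_inc = cumulative_incidence(4, 20)
inc_rate = incidence_rate(4, 50.0)

print(f"Prevalence: {p:.3f}")
print(f"Cumulative incidence: {cum_inc:.2f}")
print(f"Incidence rate: {inc_rate:.2f} per person-year")
```

Note that the two incidence measures share the numerator and differ only in the denominator: persons for the proportion, person-time for the rate.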
As regards the study of the distribution of disease
and its determinants
-> Analytical epidemiology studies:
Studies of the relationship between causes and
effects
i.e. of the determinants of a disease
[Diagram: two comparisons of Exposed / Unexposed against Event / No event. In experimental disciplines (e.g. toxicology) the comparison comes from experiments; in observational disciplines (epidemiology) it comes from observation.]
If exposure and disease are
associated:
1. In populations with higher exposure
frequency, disease frequency is higher.
2. Subjects who experience exposure have a
higher probability of developing disease.
3. Subjects who developed disease have
higher probability of having been exposed.
Types of Study Designs
Observational (epidemiological) studies
Descriptive studies
Analytical studies
•Cohort studies
(prospective)
(retrospective)
•Case-control studies
•Case-cohort studies
•Ecological
•Nested case-control
(Density Sampling)
Hypothesis generation
The hypotheses tested with experimental or observational analytical studies are generated through:
- Case reports
- Case series
- Laboratory investigations
- Descriptive studies
- Ecological studies
- Analytical studies
The research question
Research question -> Dichotomous set of hypotheses (null vs. alternative hypothesis) -> Epidemiological study testing the null hypothesis
- Not rejected
- Rejected -> Alternative hypothesis provisionally accepted
Ecological studies
1. In populations with higher exposure
frequency, disease frequency is higher.
Main aspects
The association between the exposure and the outcome is investigated at an aggregate level
ex: country, hospital, town, …
Usually makes use of routinely collected data
ex: cancer registries, censuses, surveillance programmes…
An ecological study: testicular cancer and maternal smoking
[Figure: scatter plot of the incidence of TC in the offspring (per 100,000; y-axis, 0 to 400) against the prevalence of maternal smoking (x-axis, 0 to 70) for Sweden, Norway, Denmark and Finland; r = 0.93]
Source: Pettersson A, et al. Int J Cancer 2004;109:941-44
Incidence of pleural mesothelioma in Casale, 1980-89,
occupationally exposed cases excluded.
              Casale M.     Surrounding   Other
Men           20            4             2
  Rate        8.2           3.4           0.6
  (95% CI)    (4.3-12.2)    (0.0-8.0)     (0.0-1.6)
Women         16            0             2
  Rate        5.1           --            0.7
  (95% CI)    (2.4-7.8)                   (0.0-1.9)
Magnani et al., 1995
Ecological study analysis
Comparison of disease occurrence
- Ratio of (incidence) rates
- Difference of (incidence) rates
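As a small worked sketch of these two comparisons, the snippet below uses the rates for men from the Casale table (8.2 in Casale M., 0.6 in the other areas). The function names are hypothetical, and the unit of the rates is not stated in the slides, so the difference is in whatever unit the table uses.

```python
def rate_ratio(rate_a, rate_b):
    """Relative comparison: how many times higher rate_a is than rate_b."""
    return rate_a / rate_b


def rate_difference(rate_a, rate_b):
    """Absolute comparison: excess rate in population a over population b."""
    return rate_a - rate_b


# Rates for men taken from the Casale table in the slides.
casale_men = 8.2
other_men = 0.6

ratio = rate_ratio(casale_men, other_men)
difference = rate_difference(casale_men, other_men)

print(f"Rate ratio: {ratio:.1f}")
print(f"Rate difference: {difference:.1f}")
```

The ratio expresses the strength of the association, while the difference expresses the absolute excess of disease; with very small counts in the comparison area, as here, both are imprecise.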
Advantages and drawbacks
- Cheap and quick study for generating hypotheses
- Weak design for testing causality
- Powerful when comparing populations with large differences and limited internal variability (G. Rose)
- Almost no control of possible extraneous factors that may be the real cause of the observed difference (confounders)
- No information on the joint distribution of exposure and disease at individual level (ecological fallacy, i.e. inference about exposure and outcome relationships at an individual level cannot be drawn from aggregated information)
Types of Study Designs
Observational (epidemiological) studies
Descriptive studies
Analytical studies
•Cross sectional
•Cohort studies
(prospective)
(retrospective)
•Case-control studies
•Case-cohort studies
•Ecological
•Nested case-control
(Density Sampling)
Key parameters
Unit of observation
individuals (individual level) vs. groups of individuals (aggregate level)
Timing of observation of the exposure and/or of the outcome:
cross-sectional vs. longitudinal
Source population
study period and population
If exposure and disease are associated, then:
1. Subjects who experience exposure have a
higher probability of developing disease
2. Subjects who developed disease have
higher probability of having been exposed
Analytical Studies
Information is collected at individual level
Information on the exposure and/or the outcome refers to the same point in time
Outcome is disease prevalence
Cross-sectional studies
An example of a prevalence study:
the ISAAC study
Estimate of the prevalence of asthma, allergic rhinoconjunctivitis, and atopic eczema among 13-14-year-old children in 155 centres worldwide
Method of population sampling: school based
Source of information: 3-page written questionnaire, video questionnaire (not compulsory)
Response proportion: >80% in most of the centres
Source: Lancet 1998;351:1225-32
[Figure: world map of the 12-month period prevalence of asthma symptoms in 13-14-year-old children; legend categories: <5%, 5 to <10%, 10 to <20%, ≥20%]
Source. Neil Pearce. http://publichealth.massey.ac.nz/Purchasepublications.htm
Are centre-specific
estimates comparable?
Cross-sectional studies investigate the association between exposures and disease prevalence
For testing causality, the present exposure is used as a proxy for the past exposure
It is not possible to disentangle causes of the disease from causes of the duration of the disease
Drawbacks
The disease may influence the probability of exposure
The estimated prevalence may depend on the definition of the disease
A correct sampling may be problematic (no studies on volunteers)
Non-response may easily introduce bias
Cohort and Case-Control Studies investigate associations between
EXPOSURES and OUTCOMES
with consideration of time sequence
Cohort Studies: participants in the study are sampled on the basis of their exposure status and followed up to detect one or more specific outcomes
Case-control Studies: participants in the study are sampled on the basis of their outcome status and investigated retrospectively to detect one or more specific exposures
Cohort studies
[Figure: Cohort study (I): exposed and non-exposed groups followed over time. Cohort study (II): the same groups followed over time, with subjects classified as Outcome + or Outcome -.]
A cohort is “a designated group of persons who are followed or traced over a period of time” [Last JM. A Dictionary of Epidemiology]
Members of the cohort are classified on the basis of their exposure status.
The cohort is followed over time (follow-up period) to measure the occurrence of one or more outcomes
Time is a key parameter in cohort studies
Main aspects
What is the exposure?
Examples of Exposures:
year of birth, age, employment status, employment site, smoking habits, genotype, etc.
“Exposure” is the independent variable
within the causal pathway
“Exposure” is usually characterized by
intensity, duration and dose
Some considerations
The general principles are straightforward:
Criteria defining a cohort must be clearly set out
(Who, Where, When)
Exposure status is assessed (dose, duration,
intensity)
Cohort is followed over time to identify and
measure the outcomes
Swiss Federal Railway personnel actively employed or retired in 1972-2002.
Selected occupations.
Set up in 1994; updated in 2003.
464,129 person-years from 20,141 persons.
Demographic characteristics
Period of employment
Occupation (at end of working period)
Outcome: cause-specific mortality,
assessed through death certificate (record linkage)
• Exposure assessment by occupational
group and year based on measurement
and modelling
• 16.7 Hz AC current
From Populations to Samples
Target Population
Study base
Sample
Study participants
The Epidemiological Study is undertaken focusing on the study participants
Results of the study are generalised to the target population
Closed and open cohorts
Closed cohort
Once the cohort is defined no one can be added
Open cohort
New members can join the cohort over time
[Figure: number of cohort members over time for an open and a closed cohort]
Definition of the cohort: the study base
The study base is chosen according to the research question. Some examples:
- Sample of the general population: it allows investigation of several exposures at the same time.
- Well-defined group of individuals, such as factory workers: easy to identify and follow up.
- Individuals with a high probability of being exposed and/or a high level of exposure: investigation of rare exposures.
Exposure assessment
Information on the exposure(s) should be
obtained from one or more sources:
- Existing records (e.g. factory or medical records)
- Environmental measurements
- Biomarkers
- Interview of the study subjects (interviews can be repeated if the exposure changes over time)
Internal and external comparison group
The incidence of the outcome(s) of interest among the exposed subjects should be compared with the incidence observed in an appropriate comparison group of unexposed subjects.
The unexposed subjects should be as similar as possible to the exposed individuals for all factors associated with the outcome(s) except the exposure of interest
Internal comparison: the unexposed subjects are selected from the cohort
External comparison: the unexposed subjects are selected from individuals not included in the cohort
Internal and external comparison group
Internal comparison
Incidence among exposed vs. unexposed:
relative risk, incidence rate ratio, risk difference
External comparison
The number of sex-, age-, and period-specific
observed events is compared with the number of
corresponding events expected applying the rates
of the whole population:
Standardized incidence ratio (SIR), standardized
mortality ratio (SMR)
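The external comparison can be sketched as follows: expected events are obtained by applying the reference population's stratum-specific rates to the cohort's person-years, and the SIR (or SMR) is observed over expected. All numbers, stratum labels and function names below are hypothetical and serve only to show the arithmetic; a real analysis would stratify by sex, age and calendar period together.

```python
def expected_events(person_years, reference_rates):
    """Sum over strata of cohort person-years times the reference population rate."""
    return sum(py * reference_rates[stratum] for stratum, py in person_years.items())


def standardized_ratio(observed, expected):
    """SIR (incidence) or SMR (mortality): observed events over expected events."""
    return observed / expected


# Hypothetical cohort person-years by age stratum.
person_years = {"40-49": 2000.0, "50-59": 1500.0, "60-69": 1000.0}

# Hypothetical reference rates (events per person-year) for the same strata.
reference_rates = {"40-49": 0.001, "50-59": 0.002, "60-69": 0.005}

observed = 15  # hypothetical number of events observed in the cohort
expected = expected_events(person_years, reference_rates)
sir = standardized_ratio(observed, expected)

print(f"Expected events: {expected:.1f}")
print(f"SIR: {sir:.2f}")
```

A ratio above 1 means more events were observed in the cohort than would be expected if it experienced the rates of the reference population.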
Prospective and historical cohorts
[Figure: timelines of a prospective and a historical cohort, each marked with its end of follow-up (f-u)]
Measurement of the outcome(s)
In cohort studies it is usually possible to
measure multiple outcomes
Methods:
- Existing surveillance systems: cancer registries, mortality statistics.
- Ad-hoc surveillance systems: e.g. through questionnaires
A hypothetical cohort
1,000 exposed subjects
1,000 unexposed subjects
followed for 5 years
                Exposed    Unexposed
Events          150        60
Person-years    4625       4850
Rate            0.032      0.012
Rate ratio: (150/4625) / (60/4850) = 2.62
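The arithmetic of the hypothetical cohort can be checked with a short sketch. The variable names are illustrative; the counts are those in the table above. Computed from the unrounded rates, the ratio is about 2.62 (dividing the rounded rates 0.032 / 0.012 gives a slightly larger value).

```python
# Events and person-years from the hypothetical cohort in the slides.
events = {"exposed": 150, "unexposed": 60}
person_years = {"exposed": 4625, "unexposed": 4850}

# Incidence rate in each group: events divided by person-time at risk.
rates = {group: events[group] / person_years[group] for group in events}

# Incidence rate ratio, computed from the unrounded rates.
rate_ratio = rates["exposed"] / rates["unexposed"]

for group, rate in rates.items():
    print(f"{group}: {rate:.3f} events per person-year")
print(f"Rate ratio: {rate_ratio:.2f}")
```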
Advantages
The exposure is measured before the outcome: the quality and validity of the assessment is unlikely to be associated with the disease status
Cohort studies are an efficient approach to investigate rare exposures and multiple outcomes
The occurrence of the disease can be measured (both among exposed and unexposed subjects).
Drawbacks
Cohort studies are inefficient in the investigation of rare diseases
They can be very expensive and time-consuming (with the exception of some historical cohorts)
Changes in the exposure over time are difficult to assess
Information on possible confounders is usually not available in historical cohorts
The knowledge of the subjects’ exposure status may influence the ascertainment of the outcome
Case-control studies
If exposure and disease are
associated:
1. Subjects who developed disease have
higher probability of having been exposed
In case-control studies subjects with a disease (cases) and subjects without that disease (controls) are selected from the source population
The prevalence of exposure is measured and compared between cases and controls
The best way to understand case-control studies is to think of a cohort study…
Main aspects
From a hypothetical cohort...
[Figure: a cohort of exposed and not exposed subjects followed over time; cases arise during follow-up and a control sample is drawn from the non-cases.]
…to a case-control study
The observed events (or a random sample of them) are the cases of the study. They are classified as to whether they are exposed or unexposed
Controls are randomly sampled among non-cases in the source population. They are classified as to whether they are exposed or unexposed
                Exposed    Unexposed
Events          150        60
Person-years    4620       4850
Rate            0.032      0.012
IRR: (150/4620) / (60/4850) = 2.6

                Exposed    Unexposed
Cases           150        60
Controls        462        485
OR: (150/462) / (60/485) = 2.6
Unexposed and exposed subjects have the same sampling fraction (10%)
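The point of this example, that the odds ratio reproduces the incidence rate ratio when controls are sampled with the same fraction from the exposed and unexposed person-time, can be sketched as follows. The counts come from the tables above (150 and 60 cases, 4620 and 4850 person-years, 462 and 485 controls); the variable names are illustrative only.

```python
# Cases and person-years from the hypothetical cohort in the slides.
cases = {"exposed": 150, "unexposed": 60}
person_years = {"exposed": 4620, "unexposed": 4850}

# Controls are drawn with the same sampling fraction in both exposure groups.
sampling_fraction = 0.10
controls = {g: round(person_years[g] * sampling_fraction) for g in person_years}

# Incidence rate ratio from the full cohort.
irr = (cases["exposed"] / person_years["exposed"]) / (
    cases["unexposed"] / person_years["unexposed"]
)

# Exposure odds ratio from the case-control sample.
odds_ratio = (cases["exposed"] / controls["exposed"]) / (
    cases["unexposed"] / controls["unexposed"]
)

print(f"Controls: {controls}")
print(f"IRR: {irr:.2f}")
print(f"OR:  {odds_ratio:.2f}")
```

The sampling fraction cancels out of the odds ratio, which is why the case-control study estimates the rate ratio without needing the full person-time denominators.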
               Cases    Controls
Exposed        150      462
Non Exposed    60       485
Total          210      947
Source population
The key issue is the definition of the source population for the cases. There are two options:
- The source population is an identifiable population, such as people living in a specific geographical area in a specific period.
- The source population is determined by the case selection. If cases are identified in a single hospital the source population is all people who would have attended the hospital had they had the disease.
Cases:
• ALL age 0-14, inc. 1989-1994
• diagnosed in the network of the Children’s Cancer Group
• resident in a list of 9 states
• as treatment for ALL is highly centralized, hospital-based recruitment is effectively population based
Controls:
• random digit dialling (population sampling)
• individually matched to cases by area, age, race
Selection of controls
Cases and Controls should have the same ‘a priori’ probability of being exposed to the investigated factor
They belong to the same source population
Depending on the source population:
Population based cases -> Population controls
Hospital based cases -> Hospital controls
[Diagram: cases and controls both drawn from the source population]
Population controls
Population controls are sampled from the identifiable population (sampling frame) on the basis of:
- Available registries, such as demographic registries, electoral lists, GP’s registers…
- Random digit dialling
- Neighbourhood controls
- “best friend”, siblings, …
Are these representative
samples of the target
population in respect to
exposure?
Selection bias in the NCI study (Hatch et al. 2000)
CONTROLS                Full participation    Interview refused
Single house            83%                   70%
Income < $20,000        12%                   29%
Mother < higher edu.    38%                   55%
Rented house            18%                   35%
Single mother           10%                   22%
Inner city              22%                   30%
VHCC                    6.3%                  8.4%
They are likely to represent the source population for cases (with exceptions)
Characteristics of controls may be extrapolated to the population
Response proportion may be low
The validity of the interviews and the completeness of the information may be lower than for cases: people are less motivated, and the setting for the interview is different
Population controls
advantages & limitations
Hospital controls
Hospital controls are sampled from patients admitted at the same hospital(s) as the cases
The assumption is that controls represent the population living in the catchment area of the hospital for the disease under study
The response proportion is generally higher, the setting for the interview is the same as for cases, the validity of the recall and completeness of the interviews are expected to be similar for cases and controls
Biological samples can be easily obtained
Hospital controls limitations
Risk of selection bias:
Hospitalized individuals may have a different exposure distribution than the source population -> diagnoses related to the exposure should be excluded; controls should be selected from a variety of diagnoses.
The catchment area for the same hospital may vary among different diseases
The exposure under study may be related to the likelihood of being hospitalized
M. Linet’s study.
Exposure assessment:
• Individual
• Measurement based (ELF-M)
• Wire coding
Advantages of case-control
studies
Case-control studies are an efficient
approach to investigate rare diseases
and multiple exposures
Usually they are less expensive and
less time-consuming than cohort
studies
Matching
• Matching is an option for controlling confounders.
• Options:
– No matching
– Frequency matching
– Individual matching
• Close individual matching may be burdensome
Drawbacks of case control study
Selection bias may be introduced by the selection
of controls
Information is collected after diagnosis
it may be difficult to recall past exposures,
the recall may differ between cases and controls,
there is possibility of reverse causality
Case-control studies do not provide the estimate
of the incidence of the disease among those
exposed and those unexposed (Apart from
population-based studies where the sampling
fraction is known)
Other design options
• Case control studies within cohort studies
– Nested case control
– Case – cohort
• Case-only studies:
– Case-cross-over
– Case-specular
Nested case-control studies
A case-control study may be conducted within a cohort: the source population is explicit
Cases are the events occurring within the cohort during the follow-up period
Controls are selected within the cohort among non-cases
Main aspects
• At any time a case occurs, controls are
sampled from the corresponding risk set.
• Risk set at time t: those in the cohort at
time t and at risk of becoming cases.
• Matching is possible but avoid matching
on variables associated with exposure (e.g.
year of hiring)
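Risk-set sampling as described above can be sketched in Python. This is a minimal illustration under stated assumptions: the six-subject cohort, the field names (`entry`, `exit`, `event`) and the function name are all hypothetical, ties between event times are ignored, and no matching variables are applied.

```python
import random


def sample_risk_set_controls(cohort, n_controls, rng):
    """For each case, sample controls from those at risk at the case's event time."""
    matched_sets = []
    cases = sorted((s for s in cohort if s["event"]), key=lambda s: s["exit"])
    for case in cases:
        t = case["exit"]
        # Risk set at time t: already in the cohort and still under follow-up
        # after t. Subjects who become cases later are still eligible here.
        risk_set = [
            s for s in cohort
            if s["id"] != case["id"] and s["entry"] <= t < s["exit"]
        ]
        chosen = rng.sample(risk_set, min(n_controls, len(risk_set)))
        matched_sets.append(
            {"case": case["id"], "time": t, "controls": [c["id"] for c in chosen]}
        )
    return matched_sets


# Hypothetical cohort: entry and exit times in years; event=True marks a case
# diagnosed at the exit time, event=False marks end of follow-up without disease.
cohort = [
    {"id": 1, "entry": 0, "exit": 3, "event": True},
    {"id": 2, "entry": 0, "exit": 10, "event": False},
    {"id": 3, "entry": 1, "exit": 7, "event": True},
    {"id": 4, "entry": 0, "exit": 2, "event": False},
    {"id": 5, "entry": 4, "exit": 10, "event": False},
    {"id": 6, "entry": 0, "exit": 10, "event": False},
]

matched_sets = sample_risk_set_controls(cohort, n_controls=2, rng=random.Random(0))
for matched in matched_sets:
    print(matched)
```

In this toy cohort, subject 3 is eligible as a control for the case at year 3 and later becomes a case at year 7, which is a normal feature of sampling from risk sets.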
Advantages
Nested case-control studies have the same advantages as cohort studies with regard to the validity of the exposure information
They are also more efficient:
- extra information is collected only for a subset of the cohort
- expensive analyses (e.g. analyses on blood samples) may be carried out
Example of a nested case-control study
1972, Uganda: cohort of 42 000 children:
serum sample, frozen and stored
1979: 16 cases of Burkitt lymphoma
(end of follow up)
5 healthy controls for each case
(matched by sex, age, geog. area, etc…)
EBV serology testing only on 16 + (16*5) sera!!!
Source: De-The G. The epidemiology of Burkitt's lymphoma: evidence for a causal association with Epstein-Barr virus. Epidemiol Rev. 1979;1:32-54
Limitations of nested case-control
studies
The main limitation is that multiple outcomes
cannot be investigated
Whenever an event occurs, controls are
selected from non-cases
The group of controls is therefore associated
with a specific outcome