Epidemiology: basic concepts and study designs · 2016-07-20

Epidemiology: Basic concepts and study designs

C. Magnani, CPO Piemonte and University of Eastern Piedmont

Epidemiology

The study of the distribution of disease

and its determinants in man

(MacMahon and Pugh, 1970)

As regards the distribution of disease:

-> Descriptive epidemiology studies:

i.e. studies focused on measuring the

frequency of disease

(Disease Registries / Surveys)

Descriptive epidemiology studies:

simple two step design:

- identification of cases

- new diagnosis: incident cases

- present: prevalent cases

- measure frequency in the population

Prevalence and Incidence

Prevalence: the number of cases of a specific condition of interest present at a specific point in time (a proportion).

Incidence: the number of new cases of a specific condition of interest occurring during a time interval (a rate).

Prevalence

[Figure: timelines of individual subjects (healthy, symptomatic, death) crossed by a cross-sectional survey at a single time point; subjects who are symptomatic at that time point are the prevalent cases.]

Prevalence

• Prevalence is a proportion

$$P = \frac{\text{prevalent cases}}{\text{persons at risk at the prevalence time}}$$

In the example shown: P = 3 / 7

Prevalence and Incidence

Incidence: the number of new cases of a specific condition of interest occurring during a time interval (a rate).

Incident cases

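[Figure: timelines of individual subjects (healthy, symptomatic, death) between the start and the end of the study; subjects who become symptomatic within the study interval are the incident cases.]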

Incidence: two measures

“Cumulative Incidence” or “Incidence proportion”

$$\text{Cum\_Inc} = \frac{\text{incident cases}}{\text{persons entering the study interval}}$$

“Incidence density” or “Incidence rate”

$$\text{Inc\_rate} = \frac{\text{incident cases}}{\text{person-time at risk}}$$
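The three frequency measures defined so far can be sketched in a few lines of Python. This is only an illustration: the function names are invented here, the prevalence figures (3 cases among 7 persons) come from the slide example, and the incidence figures (2 new cases among 10 persons contributing 45 person-years) are hypothetical.

```python
def prevalence(prevalent_cases, persons_at_risk):
    """Proportion of persons who have the condition at one point in time."""
    return prevalent_cases / persons_at_risk


def cumulative_incidence(incident_cases, persons_entering):
    """Proportion of persons entering the interval who become new cases."""
    return incident_cases / persons_entering


def incidence_rate(incident_cases, person_time_at_risk):
    """New cases per unit of person-time at risk."""
    return incident_cases / person_time_at_risk


# Slide example: 3 prevalent cases among 7 persons.
p = prevalence(3, 7)

# Hypothetical numbers (not from the slides): 2 new cases among
# 10 persons who together contribute 45 person-years at risk.
cum_inc = cumulative_incidence(2, 10)
inc_rate = incidence_rate(2, 45.0)

print(f"Prevalence:           {p:.3f}")
print(f"Cumulative incidence: {cum_inc:.3f}")
print(f"Incidence rate:       {inc_rate:.4f} per person-year")
```

Note that the two incidence measures share a numerator and differ only in the denominator: persons for the proportion, person-time for the rate.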

As regards the study of the distribution of disease

and its determinants

-> Analytical epidemiology studies:

Studies of the relationship between causes and

effects

i.e. of the determinants of a disease

Experimental disciplines (e.g. toxicology): experiments comparing exposed and unexposed subjects for event / no event

Observational disciplines (epidemiology): observation of exposed and unexposed subjects for event / no event

If exposure and disease are

associated:

1. In populations with higher exposure

frequency, disease frequency is higher.

2. Subjects who experience exposure have a

higher probability of developing disease.

3. Subjects who developed disease have a

higher probability of having been exposed.

Types of Study Designs

Observational (epidemiological) studies

Descriptive studies

Analytical studies

• Ecological

• Cohort studies (prospective, retrospective)

• Case-control studies

• Case-cohort studies

• Nested case-control (Density Sampling)

Hypothesis generation

The hypotheses tested with experimental or observational analytical studies are generated through:

Case reports

Case series

Laboratory investigations

Descriptive studies

Ecological studies

Analytical studies

The research question

Research question -> dichotomous set of hypotheses (null vs. alternative hypothesis) -> epidemiological study testing the null hypothesis:

- Not rejected

- Rejected: alternative hypothesis provisionally accepted


Ecological studies

1. In populations with higher exposure

frequency, disease frequency is higher.

Main aspects

The association between the exposure and the outcome is investigated at an aggregate level

e.g. country, hospital, town, …

Usually makes use of routinely collected data

e.g. cancer registries, censuses, surveillance programmes…

An ecological study: testicular cancer and maternal smoking

[Figure: scatter plot of the incidence of testicular cancer (TC) in the offspring (per 100,000) against the prevalence of maternal smoking, for Sweden, Norway, Denmark and Finland; r = 0.93.]

Source: Pettersson A, et al. Int J Cancer 2004;109:941-44

Incidence of pleural mesothelioma in Casale, 1980-89,

occupationally exposed cases excluded.

                     Casale M.     Surrounding   Other
Men     N            20            4             2
        Rate         8.2           3.4           0.6
        (95% CI)     (4.3-12.2)    (0.0-8.0)     (0.0-1.6)
Women   N            16            0             2
        Rate         5.1           --            0.7
        (95% CI)     (2.4-7.8)                   (0.0-1.9)

Magnani et al., 1995

Ecological study analysis

Comparison of disease occurrence

- Ratio of (incidence) rates

- Difference of (incidence) rates
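As a minimal sketch of the two comparisons, the snippet below takes the male mesothelioma rates for Casale M. (8.2) and for the "Other" area (0.6) from the table by Magnani et al. and computes their ratio and difference. The variable names are illustrative, and the rates stay in the units of the table, which the slide does not state.

```python
# Rates for men, read from the table by Magnani et al., 1995
# (units as in the original table).
rate_casale_men = 8.2
rate_other_men = 0.6

# Relative comparison: how many times higher the rate is in Casale M.
rate_ratio = rate_casale_men / rate_other_men

# Absolute comparison: excess rate in Casale M.
rate_difference = rate_casale_men - rate_other_men

print(f"Rate ratio:      {rate_ratio:.1f}")
print(f"Rate difference: {rate_difference:.1f}")
```

The ratio is unit-free, whereas the difference keeps the units of the original rates.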

Advantages and drawbacks

Cheap and quick study for generating hypotheses

Weak design for testing causality

Powerful when comparing populations with large differences and limited internal variability (G. Rose)

Almost no control of possible extraneous factors that may be the real cause of the observed difference (confounders)

No information on the joint distribution of exposure and disease at individual level (ecological fallacy, i.e. inference about exposure and outcome relationships at an individual level cannot be drawn from aggregated information)


Key parameters

Unit of observation: individuals (individual level) vs. groups of individuals (aggregate level)

Timing of observation of the exposure and/or of the outcome: cross-sectional vs. longitudinal

Source population: study period and population

If exposure and disease are associated, then:

1. Subjects who experience exposure have a higher probability of developing disease

2. Subjects who developed disease have a higher probability of having been exposed

Analytical Studies

Cross-sectional studies

Information is collected at the individual level

Information on the exposure and/or the outcome refers to the same point in time

The outcome is disease prevalence

An example of a prevalence study: the ISAAC study

Estimate of the prevalence of asthma, allergic rhinoconjunctivitis, and atopic eczema among 13-14 year old children in 155 centres worldwide

Method of population sampling: school based

Source of information: 3-page written questionnaire, video questionnaire (not compulsory)

Response proportion: >80% in most of the centres

Source: Lancet 1998;351:1225-32

12-month period prevalence of asthma symptoms in 13-14 year old children

[Figure: prevalence by centre, shown in four categories: <5%, 5 to <10%, 10 to <20%, ≥20%.]

Source. Neil Pearce. http://publichealth.massey.ac.nz/Purchasepublications.htm

Are centre-specific estimates comparable?

Cross-sectional studies investigate the association between exposures and disease prevalence

For testing causality, the present exposure is used as a proxy for the past exposure

It is not possible to disentangle causes of the disease from causes of the duration of the disease

Drawbacks

The disease may influence the probability of exposure

The estimated prevalence may depend on the definition of the disease

Correct sampling may be problematic (no studies on volunteers)

Non-response may easily introduce bias


Cohort and Case-Control Studies investigate associations between EXPOSURES and OUTCOMES with consideration of time sequence

Cohort studies: study participants are sampled on the basis of their exposure status and followed up to detect one or more specific outcomes

Case-control studies: study participants are sampled on the basis of their outcome status and investigated retrospectively to detect one or more specific exposures

Cohort studies

[Figure: Cohort study (I): exposed and non-exposed groups followed over time. Cohort study (II): the same groups followed over time, with subjects classified as outcome + or outcome -.]

Main aspects

A cohort is “a designated group of persons who are followed or traced over a period of time” [Last JM. A Dictionary of Epidemiology]

Members of the cohort are classified on the basis of their exposure status.

The cohort is followed over time (follow-up period) to measure the occurrence of one or more outcomes

Time is a key parameter in cohort studies

What is the exposure?

Examples of Exposures:

year of birth, age, employment status, employment site, smoking habits, genotype, etc...

“Exposure” is the independent variable within the causal pathway

“Exposure” is usually characterized by intensity, duration and dose

Some considerations

The general principles are straightforward:

Criteria defining a cohort must be clearly set out

(Who, Where, When)

Exposure status is assessed (dose, duration,

intensity)

Cohort is followed over time to identify and

measure the outcomes

Swiss Federal Railway personnel actively employed or retired in 1972-2002.

Selected occupations.

Set up in 1994; updated in 2003.

464,129 person-years from 20,141 persons.

Demographic characteristics

Period of employment

Occupation (at end of working period)

Outcome: cause specific mortality, assessed through death certificate (record linkage)

• Exposure assessment by occupational

group and year based on measurement

and modelling

• 16.7 Hz AC current

From Populations to Samples

Target population -> Study base -> Sample -> Study participants

The Epidemiological Study is undertaken focusing on the study participants

Results of the study are generalised to the target population

Closed and open cohorts

Closed cohorts: once the cohort is defined, no one can be added

Open cohorts: new members can join the cohort over time

[Figure: number of cohort members over time for an open and a closed cohort.]

Definition of the cohort: the study base

The study base is chosen according to the research question. Some examples:

Sample of the general population: it allows investigation of several exposures at the same time.

Well-defined group of individuals, such as factory workers: easy to identify and follow up.

Individuals with a high probability of being exposed and/or a high level of exposure: investigation of rare exposures.

Exposure assessment

Information on the exposure(s) should be

obtained from one or more sources:

Existing records (e.g. factory or medical records)

Environmental measurements

Biomarkers

Interview of the study subjects (interviews can be repeated if the exposure changes over time)

Internal and external comparison group

The incidence of the outcome(s) of interest among the exposed subjects should be compared with the incidence observed in an appropriate comparison group of unexposed subjects.

The unexposed subjects should be as similar as possible to the exposed individuals for all factors associated with the outcome(s) except the exposure of interest

Internal comparison: the unexposed subjects are selected from the cohort

External comparison: the unexposed subjects are selected from individuals not included in the cohort

Internal and external comparison group

Internal comparison

Incidence among exposed vs. unexposed: relative risk, incidence rate ratio, risk difference

External comparison

The number of sex-, age-, and period-specific observed events is compared with the number of corresponding events expected applying the rates of the whole population: Standardized incidence ratio (SIR), standardized mortality ratio (SMR)
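The external comparison can be illustrated with a short Python sketch. All numbers below are hypothetical (they are not from the Swiss railway cohort or any other study in this deck), and the function name is made up for illustration: expected events are obtained by applying stratum-specific reference rates to the cohort's person-years, and the SMR (or SIR) is the ratio of observed to expected events.

```python
def expected_events(strata):
    """Sum over strata of cohort person-years times the reference rate."""
    return sum(person_years * reference_rate
               for person_years, reference_rate in strata)


# Hypothetical strata (e.g. age bands): cohort person-years and the
# rate per person-year observed in the whole (reference) population.
strata = [
    (10_000, 0.001),
    (8_000, 0.003),
    (5_000, 0.008),
]

observed = 90                        # hypothetical deaths observed in the cohort
expected = expected_events(strata)   # about 10 + 24 + 40 = 74
smr = observed / expected

print(f"Expected events: {expected:.1f}")
print(f"SMR:             {smr:.2f}")
```

In a real analysis the strata would be defined jointly by sex, age and calendar period, as stated above.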

Prospective and historical cohorts

[Figure: timelines of a prospective and a historical cohort, each marked with the end of follow-up (f-u).]

Measurement of the outcome(s)

In cohort studies it is usually possible to

measure multiple outcomes

Methods:

Existing surveillance systems: cancer registries, mortality statistics.

Ad-hoc surveillance systems: e.g. through questionnaires

A hypothetical cohort

1,000 exposed subjects

1,000 unexposed subjects

followed for 5 years

                Exposed   Unexposed
Events          150       60
Person-years    4625      4850
Rate            0.032     0.012

Rate ratio: (150/4625) / (60/4850) = 2.62 (2.67 if computed from the rounded rates)
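The figures of the hypothetical cohort can be reproduced with a small Python sketch. One assumption is made here that the slide does not state: each subject with an event contributes, on average, half of the 5-year follow-up, and there are no other losses. Under that assumption the person-years come out as 4625 and 4850, matching the table; the helper name is illustrative.

```python
def person_years(n_subjects, n_events, follow_up_years):
    """Person-years at risk, assuming (not stated on the slide) that
    events occur on average halfway through follow-up and that there
    are no other losses to follow-up."""
    return n_subjects * follow_up_years - n_events * follow_up_years / 2


py_exposed = person_years(1000, 150, 5)     # 4625.0
py_unexposed = person_years(1000, 60, 5)    # 4850.0

rate_exposed = 150 / py_exposed
rate_unexposed = 60 / py_unexposed
rate_ratio = rate_exposed / rate_unexposed

print(f"Rate, exposed:   {rate_exposed:.3f}")
print(f"Rate, unexposed: {rate_unexposed:.3f}")
print(f"Rate ratio:      {rate_ratio:.2f}")
```

Computing the ratio from the unrounded rates gives about 2.62; dividing the rounded rates 0.032 and 0.012 gives 2.67, which shows how early rounding can shift the estimate.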

Advantages

The exposure is measured before the outcome: the quality and validity of the assessment are unlikely to be associated with the disease status

Cohort studies are an efficient approach to investigate rare exposures and multiple outcomes

The occurrence of the disease can be measured (both among exposed and unexposed subjects).

Drawbacks

Cohort studies are inefficient in the investigation of rare diseases

They can be very expensive and time-consuming (with the exception of some historical cohorts)

Changes in the exposure over time are difficult to assess

Information on possible confounders is usually not available in historical cohorts

The knowledge of the subjects’ exposure status may influence the ascertainment of the outcome

Case-control studies

If exposure and disease are associated:

3. Subjects who developed disease have a higher probability of having been exposed

In case-control studies subjects with a disease (cases) and subjects without that disease (controls) are selected from the source population

The prevalence of exposure is measured and compared between cases and controls

The best way to understand case-control studies is to think of a cohort study…

Main aspects

From a hypothetical cohort... to a case-control study

[Figure: the hypothetical cohort over time, with exposed and not exposed subjects; the cases arise in the cohort and a control sample is drawn from the non-cases.]

The observed events (or a random sample of them) are the cases of the study. They are classified as to whether they are exposed or unexposed

Controls are randomly sampled among non-cases in the source population. They are classified as to whether they are exposed or unexposed

                Exposed   Unexposed
Events          150       60
Person-years    4620      4850
Rate            0.032     0.012

IRR: (150/4620) / (60/4850) = 2.6

                Exposed   Unexposed
Cases           150       60
Controls        462       485

Unexposed and exposed subjects have the same sampling fraction (10%)

OR: (150/462) / (60/485) = 2.6

                Cases     Controls
Exposed         150       462
Non Exposed     60        485
Total           210       947
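A short Python check of the numbers above: when controls are drawn with the same sampling fraction from the exposed and unexposed person-time, the odds ratio from the 2x2 table equals the incidence rate ratio of the underlying cohort. The variable names are illustrative; the counts are those of the hypothetical example.

```python
# 2x2 table of the hypothetical case-control study.
cases_exposed, cases_unexposed = 150, 60
controls_exposed, controls_unexposed = 462, 485

# Odds ratio: exposure odds among cases over exposure odds among controls,
# written here as the equivalent ratio of case/control ratios.
odds_ratio = (cases_exposed / controls_exposed) / (cases_unexposed / controls_unexposed)

# Incidence rate ratio in the underlying hypothetical cohort.
irr = (150 / 4620) / (60 / 4850)

print(f"OR:  {odds_ratio:.2f}")
print(f"IRR: {irr:.2f}")
```

The 10% sampling fraction cancels out of the ratio, which is why the case-control study recovers the cohort's rate ratio without following the whole cohort.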

Source population

The key issue is the definition of the source population for the cases. There are two options:

The source population is an identifiable population, such as people living in a specific geographical area in a specific period.

The source population is determined by the case selection. If cases are identified in a single hospital the source population is all people who would have attended the hospital had they had the disease.

Cases:

• ALL age 0-14, inc. 1989-1994

• diagnosed in the network of the Children’s Cancer Group

• resident in a list of 9 states

• as treatment for ALL is highly centralized, hospital based recruitment approximates population based recruitment

Controls:

• random digit dialling (approximating population sampling)

• individually matched to cases by area, age, race

Selection of controls

Cases and controls should have the same ‘a priori’ probability of being exposed to the investigated factor

They belong to the same source population

Depending on the source population:

Population based cases -> Population controls

Hospital based cases -> Hospital controls

[Figure: cases and controls drawn from the same source population.]

Population controls

Population controls are sampled from the identifiable population (sampling frame) on the basis of:

Available registries, such as demographic registries, electoral lists, GPs’ registers…

Random digit dialling

Neighbourhood controls

“best friend”, siblings,…


Are these representative samples of the target population with respect to exposure?

Selection bias in the NCI study (Hatch et al. 2000)

CONTROLS                 Full participation   Interview refused
Single house             83%                  70%
Income < $20,000         12%                  29%
Mother < higher edu.     38%                  55%
Rented house             18%                  35%
Single mother            10%                  22%
Inner city               22%                  30%
VHCC                     6.3%                 8.4%

Population controls: advantages & limitations

They are likely to represent the source population for cases (with exceptions)

Characteristics of controls may be extrapolated to the population

The response proportion may be low

The validity of the interviews and the completeness of the information may be lower than for cases: people are less motivated and the setting for the interview is different

Hospital controls

Hospital controls are sampled from patients admitted to the same hospital(s) as the cases

The assumption is that controls represent the population living in the catchment area of the hospital for the disease under study

The response proportion is generally higher, the setting for the interview is the same as for cases, and the validity of the recall and completeness of the interviews are expected to be similar for cases and controls

Biological samples can be easily obtained

Hospital controls limitations

Risk of selection bias:

Hospitalized individuals may have a different exposure distribution than the source population: diagnoses related to the exposure should be excluded; controls should be selected from a variety of diagnoses.

The catchment area for the same hospital may vary among different diseases

The exposure under study may be related to the likelihood of being hospitalized

M.Linet’s study.

Exposure assessment:

• Individual

• Measurement based (ELF-M)

• Wire coding

Advantages of case-control

studies

Case-control studies are an efficient

approach to investigate rare diseases

and multiple exposures

Usually they are less expensive and

less time-consuming than cohort

studies

Matching

• Matching is an option for controlling confounders.

• Options:

– No matching

– Frequency matching

– Individual matching

• Close individual matching may be burdensome

Drawbacks of case control study

Selection bias may be introduced by the selection

of controls

Information is collected after diagnosis:

- it may be difficult to recall past exposures,

- the recall may differ between cases and controls,

- there is a possibility of reverse causality

Case-control studies do not provide the estimate

of the incidence of the disease among those

exposed and those unexposed (Apart from

population-based studies where the sampling

fraction is known)
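The exception in parentheses can be sketched with the hypothetical example used earlier in this deck, where controls are a 10% sample of the source person-time. Dividing the number of controls by the known sampling fraction gives back the person-years, and hence the incidence rates. The function name is illustrative, and the sketch assumes that the sampling fraction is known exactly and is the same for exposed and unexposed.

```python
# Counts from the hypothetical case-control example in this deck.
cases = {"exposed": 150, "unexposed": 60}
controls = {"exposed": 462, "unexposed": 485}
sampling_fraction = 0.10   # known fraction of the source person-time sampled


def estimated_rates(cases, controls, sampling_fraction):
    """Recover incidence rates by scaling controls up to the source person-time."""
    rates = {}
    for group in cases:
        person_time = controls[group] / sampling_fraction
        rates[group] = cases[group] / person_time
    return rates


rates = estimated_rates(cases, controls, sampling_fraction)
for group, rate in rates.items():
    print(f"Estimated rate, {group}: {rate:.3f}")
```

Without a known sampling fraction only the ratio of the two rates is identifiable, not the rates themselves.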

Other design options

• Case control studies within cohort studies

– Nested case control

– Case – cohort

• Case-only studies:

– Case-cross-over

– Case-specular

Nested case-control studies

A case-control study may be conducted within a cohort: the source population is explicit

Cases are the events occurring within the cohort during the follow-up period

Controls are selected within the cohort among non-cases

Main aspects

• At any time a case occurs, controls are

sampled from the corresponding risk set.

• Risk set at time t: those in the cohort at time t and at risk of becoming cases.

• Matching is possible but avoid matching on variables associated with exposure (e.g. year of hiring)
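Risk-set (density) sampling can be sketched as follows. The cohort below is entirely hypothetical, as are the function names; the sketch only shows the mechanics: for each case, the risk set is formed from the subjects under follow-up at the case's event time, and controls are drawn at random from it. A subject who becomes a case later is still eligible as a control for an earlier case.

```python
import random

# Hypothetical cohort: id -> (entry time, exit time, exit is an event?)
cohort = {
    "a": (0, 4, True),
    "b": (0, 9, False),
    "c": (1, 6, True),
    "d": (2, 10, False),
    "e": (0, 3, False),
    "f": (5, 10, False),
}


def risk_set(cohort, t, case_id):
    """Subjects under follow-up and still at risk at time t, excluding the index case."""
    return [
        sid for sid, (entry, exit_time, _) in cohort.items()
        if sid != case_id and entry <= t < exit_time
    ]


def sample_controls(cohort, n_controls, seed=0):
    """For each case, draw up to n_controls controls from its risk set."""
    rng = random.Random(seed)   # fixed seed only to make the sketch reproducible
    matched = {}
    cases = sorted(
        (exit_time, sid) for sid, (_, exit_time, is_case) in cohort.items() if is_case
    )
    for event_time, sid in cases:
        candidates = risk_set(cohort, event_time, sid)
        k = min(n_controls, len(candidates))
        matched[sid] = sorted(rng.sample(candidates, k))
    return matched


matched_sets = sample_controls(cohort, n_controls=2)
for case_id, control_ids in matched_sets.items():
    print(f"case {case_id}: controls {control_ids}")
```

Here the risk set for case "a" (event at time 4) is b, c and d, and the risk set for case "c" (event at time 6) is b, d and f; which two are drawn depends on the random seed.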

Advantages

Nested case-control studies have the same advantages as cohort studies with regard to the validity of the exposure information

They are also more efficient:

extra information is collected only for a subset of the cohort,

expensive analyses (e.g. analyses on blood samples) may be carried out.

Example of a nested case-control study

1972, Uganda: cohort of 42 000 children: serum sample, frozen and stored

1979: 16 cases of Burkitt lymphoma (end of follow up)

5 healthy controls for each case (matched by sex, age, geog. area, etc…)

EBV serology testing only on 16 + (16*5) = 96 sera!

De-The G. The epidemiology of Burkitt's lymphoma: evidence for a causal association with Epstein-Barr virus. Epidemiol Rev. 1979;1:32-54

Limitations of nested case-control

studies

The main limitation is that multiple outcomes

cannot be investigated

Whenever an event occurs, controls are

selected from non-cases

The group of controls is therefore associated

with a specific outcome
