Summary of Molecular Cancer Epidemiology
EPI243: Molecular Cancer Epidemiology
Zuo-Feng Zhang, MD, PhD
Molecular Epidemiology
• The goal of molecular epidemiology is to supplement and integrate existing epidemiologic methods, not to replace them
• Molecular epidemiology can be used to enhance the capacity of epidemiology to understand disease in terms of the interaction between environment and heredity.
• Studies that use biological markers of exposure, disease, and susceptibility
• Studies that apply current and future generations of biomarkers in epidemiologic research
Tasks for the Molecular Epidemiologist
The major tasks are:
• to reduce misclassification of exposure,
• to assess the effect of exposure on the target tissue,
• to measure susceptibility/inherited predisposition to cancer,
• to establish the link between environmental exposures and gene mutations,
• to assess gene-environment interaction, and
• to set up prevention/intervention strategies.
High-Throughput Techniques
• Microarray technology
– DNA chips: cDNA array format; in situ synthesized oligonucleotide format (Affymetrix)
– Proteomics
– Tissue arrays
• These are powerful high-throughput tools for studying gene expression, but they do not provide answers by themselves
• Individual targets/patterns that are identified need to be validated
• In epidemiological studies, these methods can be used to identify specific exposure-induced molecular changes, to assess individual risk, etc.
An example of our 9,000-gene mouse arrays, using differential expression analysis with Cy3 and Cy5 fluorescent dyes.
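The mouse-array example compares two samples labeled with Cy3 and Cy5. A common way to summarize each spot is the log ratio M = log2(Cy5/Cy3) together with the average log intensity A. The Python sketch below assumes intensities are already background-corrected and skips normalization (e.g., loess); the 2-fold cutoff and the spot values are hypothetical. Spots it flags are only candidates that would still need validation.

```python
import math

def ma_values(cy5, cy3):
    """Return (M, A) for one spot from background-corrected Cy5 and Cy3 intensities.

    M = log2(Cy5 / Cy3)        : log2 fold change between the two labeled samples
    A = 0.5 * log2(Cy5 * Cy3)  : average log2 intensity of the spot
    """
    if cy5 <= 0 or cy3 <= 0:
        raise ValueError("intensities must be positive after background correction")
    m = math.log2(cy5 / cy3)
    a = 0.5 * math.log2(cy5 * cy3)
    return m, a

def flag_differential(spots, min_abs_m=1.0):
    """Return IDs of spots whose |M| reaches the cutoff (1.0 = 2-fold change).

    spots: iterable of (spot_id, cy5, cy3); the cutoff is illustrative only.
    """
    flagged = []
    for spot_id, cy5, cy3 in spots:
        m, _ = ma_values(cy5, cy3)
        if abs(m) >= min_abs_m:
            flagged.append(spot_id)
    return flagged

# Hypothetical spot intensities: (gene ID, Cy5, Cy3)
spots = [("geneA", 4000.0, 1000.0), ("geneB", 1200.0, 1100.0), ("geneC", 250.0, 1000.0)]
print(flag_differential(spots))  # ['geneA', 'geneC']
```

geneA (4-fold up) and geneC (4-fold down) pass the cutoff; geneB (about 1.1-fold) does not.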
Proteomics
• Examines protein-level expression in a high-throughput manner
• Used to identify protein markers/patterns associated with disease/function
• Different formats:
– SELDI-TOF (surface-enhanced laser desorption/ionization time-of-flight): the protein-chip arrays, the mass analyzer, and the data-analysis software
– 2D PAGE coupled with MALDI-TOF (matrix-assisted laser desorption/ionization time-of-flight)
– Antibody-based formats
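Both SELDI-TOF and MALDI-TOF separate ions by flight time. In an idealized linear analyzer, an ion of charge ze accelerated through voltage V gains kinetic energy zeV = mv²/2 and drifts a distance L in time t = L/v, so m/z = 2eVt²/L². The sketch below applies only that textbook relation; the 20 kV and 1.0 m instrument values are hypothetical, and real instruments are calibrated empirically against standards rather than computed this way.

```python
E_CHARGE = 1.602176634e-19   # elementary charge in coulombs (exact SI value)
DALTON = 1.66053906660e-27   # unified atomic mass unit in kg

def mass_to_charge(flight_time_s, accel_voltage_v, drift_length_m):
    """Ideal linear TOF: z*e*V = m*v**2/2 and v = L/t, so m/z = 2*e*V*t**2 / L**2.

    Returns m/z in daltons per elementary charge.
    """
    m_per_z_kg = 2 * E_CHARGE * accel_voltage_v * flight_time_s ** 2 / drift_length_m ** 2
    return m_per_z_kg / DALTON

def flight_time(mz_da, accel_voltage_v, drift_length_m):
    """Inverse relation: flight time in seconds for an ion of the given m/z."""
    m_per_z_kg = mz_da * DALTON
    return drift_length_m * (m_per_z_kg / (2 * E_CHARGE * accel_voltage_v)) ** 0.5

# Hypothetical instrument: 20 kV acceleration, 1.0 m drift tube
t = flight_time(10_000, 20_000, 1.0)
print(f"{t * 1e6:.1f} microseconds")  # about 50.9 for a singly charged 10 kDa protein
```

Because m/z grows with t², doubling the flight time corresponds to a fourfold larger m/z.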
Fig 1. Two-dimensional gel electrophoresis of protein expression with and without GTE treatment (panel A, GTE 20 µg/ml; panel B, GTE 40 µg/ml; 24 hr and 48 hr). Axes: isoelectric point (pI, 3.5-9.5) versus molecular weight (MW, kDa; markers from 20 to 217 kDa). Numbered protein spots 1-20 are labeled.
Tissue Array
• Provides a new high-throughput tool for studying gene dosage and protein expression patterns in a large number of individual tissues, enabling rapid and comprehensive molecular profiling of cancer and other diseases without exhausting limited tissue resources.
• A typical application of tissue arrays is searching for oncogene amplifications in large tumor tissue panels. Large-scale studies of tumors spanning different stages and grades of disease are necessary to validate putative markers more efficiently and, ultimately, to correlate genotypes with phenotypes.
• Also applicable to any medical research discipline that uses paraffin-embedded tissues, including structural, developmental, and metabolic studies.
Bladder Array (image panels: H&E; Gelsolin)
DNA Methylation
DNA methylation plays an important role in normal cellular processes, including X chromosome inactivation, imprinting control, and transcriptional regulation of genes.
It occurs predominantly at cytosine residues in CpG dinucleotides (dense clusters of which form CpG islands), producing 5-methylcytosine.
CpG islands are frequently located in or around transcription start sites.
Source: Royal Society of Chemistry
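A widely used operational definition of a CpG island (Gardiner-Garden and Frommer, 1987) requires a window of at least 200 bp with GC content above 50% and an observed/expected CpG ratio above 0.6. These thresholds are not from the lecture, and the sketch below scores a single window only, whereas genome annotation tools slide windows along the sequence. The example sequences are hypothetical.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence.

    observed/expected = (CpG count * length) / (C count * G count)
    """
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so str.count finds every CpG
    gc_fraction = (c + g) / n if n else 0.0
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp

def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply Gardiner-Garden & Frommer-style thresholds to one window."""
    gc, oe = cpg_stats(seq)
    return len(seq) >= min_len and gc > min_gc and oe > min_obs_exp

print(looks_like_cpg_island("CCGG" * 60))  # True: 240 bp, GC 1.0, obs/exp 1.0
print(looks_like_cpg_island("CAGG" * 60))  # False: GC-rich but no CpG dinucleotides
```

The second example shows why GC content alone is not enough: the CpG ratio is what distinguishes islands from merely GC-rich DNA.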
DNA Methylation (Cont’d)
Aberrant DNA methylation is one of the most common features of human neoplasia.
Two major potential mechanisms by which aberrant DNA methylation contributes to carcinogenesis:
Silencing of tumor suppressor genes (e.g., the p16 gene)
Point mutation: C to T transitions, because 5-methylcytosine can spontaneously deaminate to thymine (e.g., the p53 gene)
Promoter-Region Methylation
Promoter-region CpG island methylation:
• Is rare in normal cells
• Occurs in virtually every type of human neoplasm
• Is associated with inappropriate transcriptional silencing
• Is an early event in tumor progression
In tumor suppressor genes
Most tumor suppressor genes are unmethylated in normal cells but methylated in tumor cells. Methylation is often correlated with decreased gene expression and can be found in premalignant lesions.
DNA Methyltransferases
DNMTs catalyze the transfer of a methyl group (CH3) from S-adenosylmethionine (SAM) to the carbon-5 position of cytosine, producing 5-methylcytosine.
Several DNA methyltransferases have been discovered, including DNMT1, DNMT3a, and DNMT3b.
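Figure: Multistage carcinogenesis and biomarkers. Timeline from birth through exposure to carcinogens and additional molecular events, to precancerous intraepithelial lesions (PIN, CIN, PaIN, etc.; Normal, CIN 1, CIN 2, CIN 3; Normal, LGSIL, HGSIL) and cancer. Marker types along the timeline: genetic susceptibility markers, markers of exposure, markers of effect, surrogate end point markers, and tumor markers. Also labeled: chemoprevention.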
Case-Control Studies
• Disease end point as the major interest
• Clinical (hospital)-based or population-based case-control studies
• Inclusion of both questionnaire data and biological specimens
• Biological markers can be measured and compared between cases and controls, while other variables can be treated as confounding factors or effect modifiers
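The basic comparison of a biomarker between cases and controls is the odds ratio. A minimal sketch using Woolf's (log-based) 95% confidence interval is below. Applied to the crude HBsAg counts from the liver cancer table in this summary (cases 132 positive/72 negative, controls 102 positive/312 negative), it gives 5.61 (3.90-8.07), the crude OR reported there. Adjusted ORs require a regression model and are not reproduced by this function.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude odds ratio with Woolf's confidence interval.

    a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: use a continuity correction or an exact method")
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

# HBsAg+ vs HBsAg- among liver cancer cases and controls
or_, lo, hi = odds_ratio_ci(132, 72, 102, 312)
print(f"{or_:.2f} ({lo:.2f}-{hi:.2f})")  # 5.61 (3.90-8.07)
```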
Prospective Cohort Studies
• Exposure is measured before the outcome
• The source population is defined
• The participation rate is high if specimens are available for all subjects and follow-up is complete
Nested Case-Control Study
• The biomarker can be measured in specimens matched on storage duration
• The case-control set can be analyzed in the same laboratory batch, reducing the potential for bias introduced by sample degradation and laboratory drift
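One way to put this recommendation into practice is to keep each matched set together within a single assay batch while randomizing the order of sets and of samples within a set, so that plate position does not reveal case status. The sketch below is hypothetical: the batch size, seed, and sample IDs are illustrative, and the laboratory would normally also be blinded to case status.

```python
import random

def assign_batches(matched_sets, batch_size, seed=2024):
    """Place each matched case-control set entirely within one laboratory batch.

    matched_sets: list of lists of sample IDs (one case plus its matched controls).
    batch_size:   maximum number of samples per assay batch (hypothetical value).
    Set order and within-set order are shuffled so case status is not tied to
    plate position.
    """
    rng = random.Random(seed)
    sets = [list(s) for s in matched_sets]
    rng.shuffle(sets)
    batches, current = [], []
    for s in sets:
        if len(s) > batch_size:
            raise ValueError("a matched set is larger than the batch size")
        if len(current) + len(s) > batch_size:
            batches.append(current)   # start a new batch rather than split a set
            current = []
        rng.shuffle(s)
        current.extend(s)
    if current:
        batches.append(current)
    return batches

sets = [["case1", "ctl1a", "ctl1b"], ["case2", "ctl2a", "ctl2b"], ["case3", "ctl3a", "ctl3b"]]
print(assign_batches(sets, batch_size=6))
```

Fixing the seed makes the layout reproducible and auditable.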
Case-Case Study Design
• Case-only, Case-series, etc.
• Studies that include cases only, without controls
• Can be employed to evaluate etiological heterogeneity when studying tumor markers and exposures
• May be used to assess statistical gene-environment or gene-gene interactions
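In a case-only analysis, the odds ratio relating genotype (G) and exposure (E) among cases estimates the multiplicative G×E interaction, provided G and E are independent in the source population and the disease is rare. A minimal sketch with hypothetical counts:

```python
import math

def case_only_interaction(g1e1, g1e0, g0e1, g0e0, z=1.96):
    """Case-only estimate of multiplicative G x E interaction with a Woolf CI.

    All arguments are counts among CASES only:
      g1e1 = carrier & exposed,     g1e0 = carrier & unexposed,
      g0e1 = non-carrier & exposed, g0e0 = non-carrier & unexposed.
    Valid only if genotype and exposure are independent in the source population.
    """
    or_co = (g1e1 * g0e0) / (g1e0 * g0e1)
    se = math.sqrt(1 / g1e1 + 1 / g1e0 + 1 / g0e1 + 1 / g0e0)
    lo = math.exp(math.log(or_co) - z * se)
    hi = math.exp(math.log(or_co) + z * se)
    return or_co, lo, hi

# Hypothetical case counts
print(case_only_interaction(60, 40, 50, 50))  # OR_case-only = 1.5 with its CI
```

If G and E are correlated in the population (e.g., a genotype that influences smoking), the case-only estimate is biased, which is why the independence assumption is stated in the docstring.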
Intervention Studies
• In studies of smoking cessation interventions, we can measure exposure markers (serum cotinine, protein or DNA adducts) or intermediate markers for disease (p53 mutation, dysplasia, and cell proliferation)
• Biomarkers can also measure compliance with the intervention, for example by assaying serum β-carotene in a randomized trial of β-carotene.
Susceptibility markers (e.g., GSTM1 genotype) can also be used to determine whether randomization was successful (i.e., whether the intervention and control arms are comparable)
Family Studies
• Does familial aggregation exist for a specific disease or characteristic?
• Is the aggregation due to genetic factors or environmental factors, or both?
• If a genetic component exists, how many genes are involved and what is their mode of inheritance?
• What is the physical location of these genes and what is their function?
Sample Size and Power
• False positive (alpha level, or Type I error). The alpha levels traditionally used and accepted are 0.01 and 0.05. The smaller the alpha level, the larger the required sample size.
Power or Sample Size Estimate for Case-Control Studies
• Alpha-level (false positive): 0.05
• Beta level (false negative; 1 - beta = power): 0.20
• Delta: the proportions exposed among controls and among cases, or the expected odds ratio
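A standard normal-approximation formula for an unmatched case-control study with equal numbers of cases and controls turns these alpha, beta, and delta inputs into a required sample size; the exposure proportion in cases is derived from the control proportion p0 and the expected odds ratio. The sketch assumes a two-sided test and no continuity correction, and p0 = 0.30 with OR = 2.0 in the example are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(p0, odds_ratio, alpha=0.05, power=0.80):
    """Number of cases (= number of controls) for an unmatched case-control study.

    p0:         proportion exposed among controls
    odds_ratio: smallest OR the study should detect
    """
    p1 = odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))  # proportion exposed among cases
    p_bar = (p0 + p1) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p0 * (1 - p0))) ** 2
    return math.ceil(numerator / (p1 - p0) ** 2)

print(n_per_group(0.30, 2.0))              # 141 cases and 141 controls
print(n_per_group(0.30, 2.0, alpha=0.01))  # smaller alpha, larger n
```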
Interaction Assessment
                      Factor B
                      Absent    Present
Factor A   Absent     RR00      RR01
           Present    RR10      RR11
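With RR00 = 1 as the reference, the table supports two common summary measures: the multiplicative interaction ratio RR11/(RR10 × RR01) and the additive relative excess risk due to interaction, RERI = RR11 - RR10 - RR01 + 1. RERI is a standard additive-scale measure that the slide does not name; it is included here as an assumption about how the table might be summarized, and the RR values in the example are hypothetical.

```python
def interaction_measures(rr10, rr01, rr11):
    """Interaction measures from the 2x2 table, with RR00 = 1 as the reference.

    Multiplicative: RR11 / (RR10 * RR01); 1.0 means no multiplicative interaction.
    Additive (RERI): RR11 - RR10 - RR01 + 1; 0.0 means no additive interaction.
    """
    multiplicative = rr11 / (rr10 * rr01)
    reri = rr11 - rr10 - rr01 + 1
    return multiplicative, reri

print(interaction_measures(2.0, 3.0, 6.0))  # (1.0, 2.0)
```

The example has exactly multiplicative joint effects (ratio 1.0) that still exceed additivity (RERI 2.0), so whether "interaction" is present depends on the scale chosen.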
Sample Size Consideration for Interaction Assessment
• Evaluation of interaction requires a substantially larger study. For example, in a case-control study it involves comparing the sizes of the odds ratios (relating exposure and disease) across strata of the effect modifier, rather than merely testing whether the overall odds ratio differs from the null value of 1.0.
Introduction
• Sample collection procedures, such as handling, labeling, processing, aliquoting, storage, and transportation, may affect the results of the study
• If case samples are handled differently from control samples, differential misclassification may occur
Information Linked to Each Sample
• Time and date of collection
• Recent diet and supplement use
• Reproductive information (menstrual cycle)
• Recent smoking
• Current medication use
• Recent medical illness
• Storage conditions
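These items can be captured as a structured record attached to every specimen, so missing information is caught at collection time. The field names below are hypothetical, not a standard schema; None means "not recorded", while an empty medication list means "none reported".

```python
from dataclasses import dataclass, fields
from datetime import datetime
from typing import List, Optional

@dataclass
class SpecimenRecord:
    """Hypothetical record of the information linked to one biospecimen."""
    specimen_id: str
    subject_id: str
    collected_at: Optional[datetime] = None            # time and date of collection
    recent_diet_supplements: Optional[str] = None      # recent diet and supplement use
    menstrual_cycle_day: Optional[int] = None          # reproductive information
    smoked_recently: Optional[bool] = None             # recent smoking
    current_medications: Optional[List[str]] = None    # [] means "none reported"
    recent_illness: Optional[str] = None               # recent medical illness
    storage_conditions: Optional[str] = None           # e.g. freezer ID and temperature

def missing_items(record, not_applicable=("menstrual_cycle_day",)):
    """Names of fields still unrecorded, skipping ones that may not apply."""
    return [f.name for f in fields(record)
            if f.name not in not_applicable and getattr(record, f.name) is None]

rec = SpecimenRecord("S001", "P001", collected_at=datetime(2024, 1, 1, 9, 30),
                     smoked_recently=False, current_medications=[])
print(missing_items(rec))
```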
Quality Assurance
Systematic application of optimal procedures to ensure valid, reproducible, and accurate results
-70 °C freezers
Types of Biospecimens: Blood
The use of skilled technicians and precise procedures when performing phlebotomy is important, because painful, prolonged, or repeated attempts at venipuncture can cause patient discomfort or injury and result in a sample of less than optimal quality or quantity.
• Plasma
• Serum
• Lymphocytes
• Erythrocytes
• Platelets
Urine Collection
Urine is an ultrafiltrate of plasma. It can be used to evaluate and monitor metabolic disease processes, exposure to xenobiotic agents, mutagenicity, exfoliated cells, DNA adducts, etc.
Tissue Collections
• Confirming clinical diagnosis by histological analysis
• Examining tumor characteristics at chromosome and molecular level
Laboratory Techniques with Tissue
RT-PCR
Adipose Tissue
• Adipose tissue sampling may be quite feasible for subjects and involves low risk. The tissue offers a relatively stable deposit of triglycerides and fat-soluble substances such as fat-soluble vitamins (e.g., vitamins A and D). It is the largest reservoir of carotenoids and reflects long-term dietary intake of essential fatty acids.
Bronchoalveolar Lavage (BAL)
• BAL is used to assess and quantify asbestos exposure
• Induced sputum samples and BAL fluid (BALF) can also provide sufficient DNA for PCR assays.
Exhaled Air
• To evaluate exposure to various substances, particularly solvents such as benzene and styrene
• As a source of exposure and susceptibility markers (e.g., the caffeine breath test for cytochrome P450 1A2 activity)
• Urea breath test (for urease-positive organisms such as H. pylori)
Hair
• Easily available biological tissue whose morphology may reflect disease conditions within the body
• Provides a permanent record of trace elements associated with normal and abnormal metabolism
• A source for assessing occupational and environmental exposure to toxic metals
Nail Clippings
• Toenail or fingernail clippings are obtained easily and comfortably.
• They do not require special processing, storage, or shipping conditions and are thus suitable for large epidemiological studies.
Buccal cells
• Noninvasive
• Good for PCR-analysis
• Can measure both germline and somatic mutations
Saliva
• It is an efficient, painless and relatively inexpensive source of biological materials for certain assays
• It provides a useful tool for measuring endogenous and xenobiotic compounds
Breast Milk
• Measuring hormones, exposures to chemicals and biological contaminants (e.g., aflatoxin), and selenium levels
• Cells of interest
Feces
• Certain cells of interest
• Infectious markers
• Oncogenes
Semen
• Evaluate the effects of exposures on endocrine and reproductive factors.
• Sexual abstinence for at least 2 days but not more than 7 days before collection
• Samples should reach the lab within one hour
Storage
• Freezers may fail, so the facility needs 24-hour monitoring through a computerized alarm system that alerts personnel and activates backup equipment.
• Monitoring fire, power loss, leakage, etc.
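The core logic of such an alarm is to raise an alert when the freezer stays warmer than a limit for several consecutive readings. A minimal sketch is below; the -60 °C limit for a -70 °C freezer, the three-reading rule, and the temperature log are hypothetical values, not recommendations from the lecture.

```python
def find_alarms(readings, threshold_c=-60.0, consecutive=3):
    """Return timestamps at which an over-temperature alarm would fire.

    readings:    list of (timestamp, temperature in deg C) in time order.
    threshold_c: alarm limit for a -70 deg C freezer (hypothetical value).
    consecutive: warm readings in a row required, to ignore brief door openings.
    """
    alarms, run = [], 0
    for timestamp, temp_c in readings:
        if temp_c > threshold_c:
            run += 1
            if run == consecutive:   # fire once per warm excursion
                alarms.append(timestamp)
        else:
            run = 0                  # back below the limit: reset the counter
    return alarms

log = list(enumerate([-72, -71, -58, -57, -56, -55, -70, -59]))
print(find_alarms(log))  # [4]
```

The single warm reading at the end does not trigger an alarm because the run was reset by the -70 reading before it.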
Shipping
• Sample shipping requirements depend on the time, distance, climate, season, method of transport, applicable regulations, type of specimen, and markers to be assayed.
• Polyurethane boxes containing dry ice are used to ship and transport samples that require low temperatures. For samples that require very low temperatures, liquid nitrogen containers can be used.
• The quantity of dry ice should be carefully calculated based on the estimated duration of the trip.
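The dry ice calculation is simple arithmetic once the loss rate of the particular shipping box is known. In the sketch below, the sublimation rate (kg lost per 24 hours) and the 1.5× safety factor are hypothetical inputs that would come from the container supplier or a pilot shipment; shipping regulations for dry ice still apply.

```python
def dry_ice_kg(trip_hours, sublimation_kg_per_24h, safety_factor=1.5):
    """Estimate kilograms of dry ice to pack for one shipment.

    sublimation_kg_per_24h: loss rate for the specific shipping box; take it
        from the container supplier or a pilot shipment (hypothetical input).
    safety_factor: extra allowance for delays in transit.
    """
    if trip_hours <= 0 or sublimation_kg_per_24h <= 0 or safety_factor < 1:
        raise ValueError("invalid shipping parameters")
    return sublimation_kg_per_24h * (trip_hours / 24.0) * safety_factor

print(dry_ice_kg(48, 4.5))  # 13.5
```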
Safety
• Protect specimen from contamination
• Worker safety (e.g., HIV, HBV)
Biomarkers in Epidemiology: Biomarkers of Biological Agents
• HPV DNA by PCR-based assays
HPV infection is often transient, especially in young women, so repeated sampling is required to assess persistent HPV infection.
• HBV infection by serological assays
There are serological markers that distinguish between past and persistent infection. Detection of HBV DNA in serum further refines the assessment of exposure.
Background: Metabolism of aflatoxin B1
Figure: Dietary intake of AFB1; activation by CYP3A4 (CYP1A2) and other CYPs to AFB1-exo-8,9-epoxide and AFB1-endo-8,9-epoxide; other metabolites AFM1 and AFQ1; DNA adducts; conjugation with glutathione by GST-μ (GST-θ) to a glutathione-AFB1 conjugate; hydrolysis (H2O, mEH) to AFB1-8,9-dihydrodiol [phenolate resonance form] and protein adducts; excretion.
Main Effects of HBsAg, AFB1 Levels, and IFNA17 on Liver Cancer Development
Variable | Level | Cases N (%) | Controls N (%) | Crude OR (95% CI) | Age- & Sex-Adjusted OR (95% CI) | Fully Adjusted** OR (95% CI)
HBsAg | - | 72 (35.3) | 312 (75.4) | 1 | 1 | 1
HBsAg | + | 132 (64.7) | 102 (24.6) | 5.61 (3.90-8.07) | 5.21 (3.60-7.53) | 5.68 (3.80-8.51)
AFB1 (fmol/mg) | Mean (SD) | 508.1 (328.7) | 426.2 (250.4) | | |
AFB1 | <247 | 33 (18.1) | 94 (24.9) | 1 | 1 | 1
AFB1 | 247.1-388.8 | 46 (25.3) | 94 (24.9) | 1.39 (0.82-2.37) | 1.38 (0.81-2.37) | 1.15 (0.61-2.14)
AFB1 | 388.9-545 | 42 (23.1) | 95 (25.2) | 1.26 (0.74-2.16) | 1.27 (0.74-2.20) | 1.19 (0.64-2.21)
AFB1 | >545.1 | 61 (33.5) | 94 (24.9) | 1.85 (1.11-3.08) | 1.75 (1.04-2.94) | 1.63 (0.90-2.96)
AFB1 | | | | p(trend)=0.031 | p(trend)=0.055 | p(trend)=0.109
IFNA17 | II | 33 (17.4) | 94 (24.5) | 1 | 1 | 1
IFNA17 | RI | 104 (54.7) | 193 (50.4) | 1.54 (0.97-2.44) | 1.49 (0.93-2.38) | 1.67 (0.95-2.93)
IFNA17 | RR | 53 (27.9) | 96 (25.1) | 1.57 (0.94-2.64) | 1.58 (0.93-2.68) | 1.99 (1.06-3.73)
IFNA17 | p(HW)=0.878 | | | p(trend)=0.104 | p(trend)=0.102 | p(trend)=0.037
IFNA17 | RI&RR | 157 (82.6) | 289 (75.5) | 1.55 (1.00-2.41) | 1.52 (0.97-2.38) | 1.77 (1.04-3.03)
**Model includes age, sex, BMI, education, alcohol consumption, tobacco smoking, HBsAg, imputed AFB1 levels, and anti-HCV.
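The p(HW) value tests whether genotype frequencies follow Hardy-Weinberg equilibrium. A one-degree-of-freedom chi-square test on the control genotype counts for IFNA17 (II = 94, RI = 193, RR = 96) gives p ≈ 0.878, matching the reported value, which suggests (but does not confirm) that this is how it was computed. A minimal sketch:

```python
import math

def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg equilibrium for one biallelic locus.

    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    n = n_aa + n_ab + n_bb
    p_allele = (2 * n_aa + n_ab) / (2 * n)   # frequency of the first allele
    q_allele = 1 - p_allele
    expected = (n * p_allele ** 2, 2 * n * p_allele * q_allele, n * q_allele ** 2)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value

chi2, p_value = hwe_chi2(94, 193, 96)   # IFNA17 II, RI, RR among controls
print(round(p_value, 3))  # 0.878
```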
Interaction between HBV and AFB1 and IFNA17
AFB1 (fmol/mg) | HBsAg | Cases N (%) | Controls N (%) | Crude OR (95% CI) | Age- & Sex-Adjusted OR (95% CI) | Fully Adjusted** OR (95% CI)
<247 | - | 12 (6.6) | 69 (18.4) | 1 | 1 | 1
247.1-388.8 | - | 19 (10.4) | 67 (17.8) | 1.63 (0.74-3.62) | 1.64 (0.73-3.65) | 1.72 (0.73-4.08)
388.9-545 | - | 15 (8.2) | 71 (18.9) | 1.22 (0.53-2.78) | 1.22 (0.53-2.80) | 1.34 (0.55-3.27)
>545.1 | - | 17 (9.3) | 77 (20.5) | 1.27 (0.57-2.85) | 1.26 (0.56-2.82) | 1.15 (0.48-2.74)
<247 | + | 21 (11.5) | 25 (6.6) | 4.83 (2.08-11.23) | 4.61 (1.97-10.80) | 6.43 (2.56-16.16)
247.1-388.8 | + | 27 (14.8) | 27 (7.2) | 5.75 (2.55-12.96) | 5.30 (2.34-12.02) | 4.68 (1.92-11.38)
388.9-545 | + | 27 (14.8) | 24 (6.4) | 6.47 (2.84-14.74) | 6.20 (2.70-14.21) | 6.65 (2.72-16.25)
>545.1 | + | 44 (24.2) | 16 (4.3) | 15.82 (6.84-36.57) | 13.75 (5.90-32.06) | 16.72 (6.60-42.38)
1ORint (95% CI) | | | | 0.73 (0.24-2.24) | 0.70 (0.23-2.18) | 0.42 (0.12-1.45)
2ORint (95% CI) | | | | 1.10 (0.35-3.49) | 1.10 (0.35-3.52) | 0.77 (0.22-2.70)
3ORint (95% CI) | | | | 2.58 (0.82-8.12) | 2.38 (0.75-7.55) | 2.27 (0.65-7.92)
IFNA17 | HBsAg | Cases N (%) | Controls N (%) | Crude OR (95% CI) | Age- & Sex-Adjusted OR (95% CI) | Fully Adjusted** OR (95% CI)
II | - | 13 (6.8) | 66 (17.3) | 1 | 1 | 1
RI&RR | - | 50 (26.3) | 220 (57.6) | 1.15 (0.59-2.25) | 1.14 (0.58-2.23) | 1.34 (0.64-2.82)
II | + | 20 (10.5) | 27 (7.1) | 3.76 (1.64-8.62) | 3.49 (1.51-8.04) | 3.99 (1.54-10.32)
RI&RR | + | 107 (56.3) | 69 (18.1) | 7.87 (4.04-15.34) | 7.17 (3.66-14.06) | 9.18 (4.34-19.43)
ORint (95% CI) | | | | 1.81 (0.71-4.62) | 1.81 (0.71-4.63) | 1.71 (0.60-4.92)
**Model includes age, sex, BMI, education, alcohol consumption, tobacco smoking, imputed AFB1 levels, and anti-HCV. 1ORint: AFB1 (247.1-388.8 fmol/mg) and HBsAg; 2ORint: AFB1 (388.9-545 fmol/mg) and HBsAg; 3ORint: AFB1 (>545.1 fmol/mg) and HBsAg.
Interaction between HBsAg and IFNA17 Stratified by AFB1
AFB1 (fmol/mg) | HBsAg | IFNA17 | Cases N | Controls N | Crude OR (95% CI) | Age- & Sex-Adjusted OR (95% CI) | Fully Adjusted** OR (95% CI)
<388.9 | - | II | 8 | 26 | 1 | 1 | 1
<388.9 | - | RI&RR | 20 | 99 | 0.66 (0.26-1.66) | 0.63 (0.24-1.62) | 0.70 (0.24-…)
<388.9 | + | II | 9 | 13 | 2.25 (0.70-7.19) | 2.04 (0.62-6.74) | 2.07 (0.52-8.18)
<388.9 | + | RI&RR | 37 | 37 | 3.25 (1.30-8.11) | 2.81 (1.10-7.19) | 3.45 (1.21-9.83)
<388.9 | ORint (95% CI) | | | | 2.20 (0.58-8.38) | 2.20 (0.56-8.70) | 2.39 (0.50-11.45)
>388.9 | - | II | 5 | 34 | 1 | 1 | 1
>388.9 | - | RI&RR | 25 | 104 | 1.63 (0.58-4.60) | 1.62 (0.58-4.59) | 2.09 (0.64-6.86)
>388.9 | + | II | 11 | 9 | 8.31 (2.29-30.10) | 8.07 (2.21-29.42) | 9.22 (2.08-40.86)
>388.9 | + | RI&RR | 57 | 27 | 14.35 (5.05-40.77) | 13.88 (4.80-40.09) | 21.80 (6.36-74.75)
>388.9 | ORint (95% CI) | | | | 1.06 (0.25-4.44) | 1.06 (0.25-4.45) | 1.13 (0.22-5.81)
**Model includes age, sex, BMI, education, alcohol consumption, tobacco smoking, and HCV.
Biomarker of Dietary Intake
• Whether it is a good indicator of intake
• Whether it is a long- or short-term marker
• Whether there is a need for multiple measurements
• Whether it is acceptable to the researcher and the subject
• Whether it is compatible with study design
Main component of green tea catechins: (-)-epigallocatechin gallate ((-)-EGCG)
PhIP DNA Adducts
Susceptibility Markers
• Susceptibility markers are a group of biological markers that may indicate an individual's susceptibility to cancer.
• These markers may be genetically inherited/determined or acquired.
• They are independent of environmental exposures.
Biomarker of Genetic Susceptibility
• High risk genes
• Low risk genes
Genetic Susceptibility to Cancer
High-risk genes (e.g., BRCA germline mutations):
• Mutations with strong influence on risk
• Rare in the population (<1%)
• Result in familial clustering
• Can be studied in families
Low-risk genes:
• Variations with weak functional effect
• Low to high frequency in the population (1-50%)
• Limited familial clustering
• Can be studied in populations
McCarthy MI, Nature Reviews Genetics, 2008
Figure: Cell cycle (G1, S, G2, M) with regulators P53, Cyclin D1, and P16. Labels: DNA damage repaired; if DNA damage is not repaired; defective DNA repair gene; if cell cycle control is lost.
Background: Theoretical model of gene-gene/gene-environment interaction pathway
Figure: Environmental carcinogen/procarcinogen exposures (tobacco consumption, occupational exposures, environmental exposure; PAHs, xenobiotics, Arene, Alkine, etc.), active and detoxified carcinogens, DNA damage, normal cell, carcinogenesis, and programmed cell death. Genes shown: CYP1A1, GSTP1, mEH, NQO1, XRCC1, GSTM1. Polymorphisms shown: Ile105Val, Ala114Val, Tyr113His, His139Arg, Pro187Ser, MspI, Ile462Val, Arg194Trp, Arg399Gln, Arg280His, null, Ala146Thr, Arg72Pro, G870A.
Figure: DNA damage response genes BRCA1, BRCA2, ATM, and CHEK2 (RAD53), with homologous recombination, non-homologous recombination, and damage recognition cell cycle delay response (DRCCD) pathways.
Baseline Characteristics of Each Study
Study, group | Total | Age range | Age, mean | Males N (%) | Females N (%) | < High school N (%) | > High school N (%) | Never smoked N (%) | Ever smoked N (%)
LA Study, lung cancer cases | 611 | 32-59 | 52.2 | 303 (49.6) | 308 (50.4) | 265 (43.4) | 346 (56.6) | 110 (18.0) | 501 (82.0)
LA Study, UADT cancer cases | 601 | 20-59 | 50.3 | 391 (74.2) | 136 (25.8) | 240 (45.5) | 287 (54.5) | 164 (31.1) | 363 (68.9)
LA Study, controls | 1040 | 17-65 | 49.9 | 623 (59.9) | 417 (40.1) | 300 (28.9) | 739 (71.1) | 491 (47.3) | 548 (52.7)
Taixing City Study, stomach cancer cases | 206 | 30-82 | 61.5 | 138 (67.0) | 68 (33.0) | 204 (99.5) | 1 (0.5) | 92 (45.8) | 109 (54.2)
Taixing City Study, esophageal cancer cases | 218 | 30-84 | 60.6 | 141 (64.7) | 77 (35.3) | 215 (100.0) | 0 (0.0) | 94 (43.1) | 117 (53.7)
Taixing City Study, liver cancer cases | 204 | 22-83 | 53.8 | 159 (77.9) | 45 (22.1) | 204 (100.0) | 0 (0.0) | 85 (44.3) | 107 (55.7)
Taixing City Study, controls | 415 | 21-84 | 57.7 | 287 (69.2) | 128 (30.8) | 405 (97.6) | 10 (2.4) | 217 (52.4) | 197 (47.9)
MSKCC Study, bladder cancer cases | 233 | 32-84 | 64.8 | 206 (83.4) | 41 (16.6) | 95 (40.8) | 138 (59.2) | 42 (17.3) | 201 (82.7)
MSKCC Study, controls | 204 | 17-80 | 42.0 | 156 (77.2) | 46 (22.8) | 34 (16.7) | 170 (83.3) | 92 (46.0) | 108 (54.0)
Associations between 8q24 SNPs and smoking-related cancers
Figure: LA Study (lung; UADT squamous cell: oropharynx, larynx, nasopharynx), Taixing Study (esophagus, stomach, liver), MSKCC Study (bladder).
Association between 8q24 and 7 smoking-related cancer sites, stratified by smoking status
TP53 Mutations in Bladder Cancer
Base-pair change | Reported (n=200) | Current study
Transitions | |
G:C>A:T | 41.0% | 37.5%
  (at CpG) | 14.0% | 12.5%
A:T>G:C | 10.0% | 15.0%
Transversions | |
G:C>T:A | 13.0% | 12.5%
G:C>C:G | 19.0% | 10.0%
A:T>T:A | 3.0% | 0.0%
A:T>C:G | 2.0% | 2.5%
Deletions/insertions | 12.0% | 10.0%
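A mutation spectrum like this groups each point mutation by base-pair change, counting a change on either strand once (C>T on one strand is G>A on the other, both written G:C>A:T). The sketch below classifies single-base substitutions that way and tallies percentages; the mutation list is hypothetical, and it does not detect the CpG subset (which needs flanking sequence) or deletions/insertions.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def mutation_class(ref, alt):
    """Collapse a single-base substitution to its base-pair class.

    The pair is written from the G or A strand, so C>T and G>A both become
    "G:C>A:T". Returns (class, "transition" or "transversion").
    """
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or ref not in COMPLEMENT or alt not in COMPLEMENT:
        raise ValueError("expected two different bases from A, C, G, T")
    if ref in "CT":                       # re-express on the complementary strand
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    kind = "transition" if {ref, alt} == {"A", "G"} else "transversion"
    return f"{ref}:{COMPLEMENT[ref]}>{alt}:{COMPLEMENT[alt]}", kind

def spectrum(substitutions):
    """Percentage of each base-pair class among (ref, alt) substitutions."""
    counts = Counter(mutation_class(ref, alt)[0] for ref, alt in substitutions)
    total = sum(counts.values())
    return {cls: 100.0 * n / total for cls, n in counts.items()}

# Hypothetical TP53 point mutations as (reference base, mutant base)
muts = [("C", "T"), ("G", "A"), ("G", "T"), ("A", "G"), ("C", "G")]
print(spectrum(muts))
```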
Smoking and TP53 Mutations in Bladder Cancer
Smoking TP53+ TP53- OR 95%CI
No 8 24 1.00
Yes 58 83 6.27 1.29-30.2
Adjusted for age, gender, and education
Cigarettes/day and TP53 Mutations in Bladder Cancer
Cig/day TP53+ TP53- OR 95%CI
No 8 24 1.00
1-20 8 21 2.07 0.22-19.9
21-40 36 47 5.50 1.08-28.2
>40 17 18 10.4 1.90-56.8
Trend P=0.003
Adjusted for age, gender, and education
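The trend P above comes from an adjusted analysis. For comparison, a crude Cochran-Armitage test for trend in the proportion of TP53-mutated tumors across the four cigarettes/day categories, using equally spaced scores 0-3 (an assumption), can be sketched as follows. Because it ignores age, gender, and education, it gives a different p-value (about 0.017) from the adjusted trend P=0.003.

```python
import math

def cochran_armitage(positives, totals, scores=None):
    """Cochran-Armitage test for a linear trend in proportions across ordered groups.

    positives: count with the outcome (here TP53+) in each exposure category.
    totals:    total subjects in each category.
    scores:    category scores; defaults to 0, 1, 2, ... (equally spaced).
    Returns (z statistic, two-sided p-value).
    """
    scores = list(range(len(totals))) if scores is None else list(scores)
    n = sum(totals)
    p_bar = sum(positives) / n
    t = sum(x * (r - m * p_bar) for x, r, m in zip(scores, positives, totals))
    sx = sum(m * x for x, m in zip(scores, totals))
    sxx = sum(m * x * x for x, m in zip(scores, totals))
    var_t = p_bar * (1 - p_bar) * (sxx - sx * sx / n)
    z = t / math.sqrt(var_t)
    return z, math.erfc(abs(z) / math.sqrt(2))

# Cigarettes/day: none, 1-20, 21-40, >40 (TP53+ and TP53- counts from the table)
tp53_pos = [8, 8, 36, 17]
totals = [8 + 24, 8 + 21, 36 + 47, 17 + 18]
z, p = cochran_armitage(tp53_pos, totals)
print(f"z = {z:.2f}, p = {p:.3f}")
```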
Years of Smoking and TP53 Mutations in Bladder Cancer
Years of smoking TP53+ TP53- OR 95%CI
No 8 24 1.00
1-20 5 10 5.64 0.82-38.7
21-40 42 58 6.45 1.24-33.4
>40 14 18 6.20 1.17-32.8
Trend P=0.041
Adjusted for age, gender, and education
Association Studies of Genetic Factors
• 1st generation
– Very small studies (<100 cases)
– Usually not an epidemiologic study design; 1-2 SNPs
• 2nd generation
– Small studies (100-500 cases)
– More epidemiologic focus; a few SNPs
• 3rd generation
– Large molecular epidemiology studies (>500 cases)
– Proper epidemiologic design; pathways
• 4th generation
– Consortium-based pooled analyses (>2000 cases)
– GxE analyses
• 5th generation
– Post-GWAS studies
Boffetta, 2007
Issues in genetic association studies
• Many genes
– ~25,000 genes, many can be candidates
• Many SNPs
– ~12,000,000 SNPs, ability to predict functional SNPs is limited
• Methods to select SNPs:
– Only functional SNPs in a candidate gene
– Systematic screen of SNPs in a candidate gene
– Systematic screen of SNPs in an entire pathway
– Genomewide screen
– Systematic screen for all coding changes
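Testing millions of SNPs multiplies false positives: at a nominal alpha of 0.05, about 600,000 of ~12,000,000 SNPs would appear significant even if none were truly associated. The conventional genome-wide significance threshold of 5 × 10^-8 corresponds to a Bonferroni correction for roughly one million independent tests, reflecting linkage disequilibrium among SNPs. A minimal sketch of the arithmetic:

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate <= alpha."""
    return alpha / n_tests

def expected_false_positives(alpha, n_tests):
    """Expected number of 'significant' results if every tested SNP is truly null."""
    return alpha * n_tests

print(expected_false_positives(0.05, 12_000_000))  # about 600,000
print(bonferroni_threshold(0.05, 1_000_000))       # about 5e-8
```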
Potential of GWAS
Kingsmore, 2008
Post-GWAS Epidemiology
• Functional SNP analysis
• Pathway-based analysis
• Deep sequencing and fine mapping
• Gene-environment interaction