day2 145pm crawford
![Page 1: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/1.jpg)
Association Analysis
University of Louisville
Center for Genetics and Molecular Medicine
January 11, 2008
Dana Crawford, PhD
Vanderbilt University
Center for Human Genetics Research
![Page 2: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/2.jpg)
Association Analysis Outline
• Study Design
• SNPs versus Haplotypes
• Analysis Methods
• Candidate Gene
• Whole Genome Analysis
• Replication and Function
![Page 3: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/3.jpg)
Study Design
Does your trait or phenotype have a genetic component?
• Segregation analysis
• Recurrence risks
• Heritability
• Other sources of evidence for a genetic component
![Page 4: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/4.jpg)
Classic Segregation Analysis
• Determines if a major gene is involved
• Compares data to Mendelian models, such as
– Autosomal dominant
– Autosomal recessive
– X-linked
• Results can be used as parameters for linkage analysis (e.g. parametric LOD)
• Subject to ascertainment bias
Note: More complex methods needed for complex traits
![Page 5: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/5.jpg)
Recurrence Risks
The chance that a disease present in the family will recur in that family
“Lightning striking twice”
If recurrence risk is greater in the family compared with unrelated individuals, the disease has a “genetic” component
Suggests familial aggregation
![Page 6: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/6.jpg)
Recurrence Risks
Measured using the risk ratio (λ)
Sibling risk ratio = λs
λs = sibling recurrence risk / population prevalence
Cystic fibrosis λs = (0.25/0.0004) = 625
Huntington disease λs = (0.50/0.0001) = 5000
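The risk ratio above is a single division, which the following minimal Python sketch spells out. The function name `sibling_risk_ratio` is an illustrative choice, not something from the slides; the input values are the Huntington disease numbers given above.

```python
def sibling_risk_ratio(sibling_recurrence_risk, population_prevalence):
    """Return lambda_s = sibling recurrence risk / population prevalence."""
    if population_prevalence <= 0:
        raise ValueError("population prevalence must be positive")
    return sibling_recurrence_risk / population_prevalence


# Huntington disease values from the slide: recurrence risk 0.50, prevalence 0.0001
huntington_lambda_s = sibling_risk_ratio(0.50, 0.0001)
print(f"Huntington disease lambda_s ~ {huntington_lambda_s:.0f}")
```

A value far above 1 indicates strong familial aggregation; a value near 1 means siblings are at roughly the population risk.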
![Page 7: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/7.jpg)
Recurrence Risks: Complex traits
λ here is for first degree relative
Merikangas and Risch (2003) Science 302:599-601.
![Page 8: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/8.jpg)
Heritability
Think “twin studies”
The proportion of phenotypic variation in a population attributable to genetic variation
Quantitative traits
Heritability measured as $h^2$
(Can also be family studies)
![Page 9: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/9.jpg)
Heritability and Quantitative Traits
Determined by genes and environment
Example: Height
[Figure: panels for Boys and Girls, each showing Mexican Americans, Blacks, and Whites; NHANES 1971-1974 versus NHANES 1999-2002]
Freedman et al (2006) Obesity 14:301-308
![Page 10: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/10.jpg)
Heritability and Quantitative Traits
Trait variation = genetic + environment
$\sigma_T^2 = \sigma_G^2 + \sigma_E^2$
Genetic variation = additive + dominant
$\sigma_G^2 = \sigma_a^2 + \sigma_d^2$
Environmental variation = familial/household + random/individual
$\sigma_E^2 = \sigma_f^2 + \sigma_e^2$
Broad Sense heritability
$h_B^2 = \sigma_G^2 / \sigma_T^2$
Narrow Sense heritability
$h_N^2 = \sigma_a^2 / \sigma_T^2$
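As a worked illustration of the variance decomposition on this slide, the sketch below computes broad and narrow sense heritability from four variance components. The function name and the numeric components are hypothetical and chosen only to make the arithmetic easy to follow; they are not estimates for any real trait.

```python
def heritabilities(var_additive, var_dominance, var_familial_env, var_random_env):
    """Return (broad-sense, narrow-sense) heritability from variance components."""
    var_genetic = var_additive + var_dominance            # genetic = additive + dominant
    var_environment = var_familial_env + var_random_env   # environment = familial + random
    var_total = var_genetic + var_environment             # trait = genetic + environment
    broad = var_genetic / var_total
    narrow = var_additive / var_total
    return broad, narrow


# Hypothetical variance components (arbitrary units)
broad, narrow = heritabilities(4.0, 1.0, 2.0, 3.0)
print(f"broad-sense h2 = {broad:.2f}, narrow-sense h2 = {narrow:.2f}")
```

Narrow sense heritability can never exceed broad sense heritability, because the additive variance is only one part of the genetic variance.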
![Page 11: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/11.jpg)
Heritability and Twins Studies
$h^2 = 2(r_{MZ} - r_{DZ})$,
where r is the correlation coefficient of the trait within twin pairs
Monozygotic twins = same genetic material (~100% shared)
Dizygotic twins = half genetic material (~50% shared on average)
![Page 12: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/12.jpg)
Heritability and Twins Studies
Trait r(MZ) r(DZ) Reference
Cholesterol 0.76 0.39 Fenger et al
SBP 0.60 0.32 Evans et al
BMI 0.67 0.32 Schousboe et al
Perceived pitch 0.67 0.44 Drayna et al
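Applying the twin formula h2 = 2(rMZ - rDZ) to the correlations in the table above gives rough heritability estimates. This is a minimal sketch; the function name `falconer_h2` is an illustrative label, and the estimates inherit all the assumptions of the classic twin design.

```python
def falconer_h2(r_mz, r_dz):
    """Twin-based heritability estimate: h2 = 2 * (rMZ - rDZ)."""
    return 2 * (r_mz - r_dz)


# (rMZ, rDZ) pairs copied from the table above
twin_correlations = {
    "Cholesterol": (0.76, 0.39),
    "SBP": (0.60, 0.32),
    "BMI": (0.67, 0.32),
    "Perceived pitch": (0.67, 0.44),
}

for trait, (r_mz, r_dz) in twin_correlations.items():
    print(f"{trait}: h2 ~ {falconer_h2(r_mz, r_dz):.2f}")
```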
![Page 13: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/13.jpg)
Heritability: Is everything genetic?
Trait r(MZ) r(DZ) Reference
Vote choice 0.81 0.69 Hatemi et al
Religiousness 0.62 0.42 Koenig et al
![Page 14: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/14.jpg)
Other Evidence For A Genetic Component
Monogenic disorders
Example: Phenotype of interest is sensitivity to warfarin dosing, but there are no heritability estimates
Solution: Rare, familial disorder of warfarin resistance
![Page 15: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/15.jpg)
Other Evidence For A Genetic Component
Case Reports
Example: Phenotype of interest is susceptibility to Neisseria meningitidis (prevalence: 1/100,000)
Solution: Case report of recurrent N. meningitidis in patient
![Page 16: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/16.jpg)
Other Evidence For A Genetic Component
• Animal models
• Biochemistry or biological pathways
• Expression data
• Previous genetic association studies
Other good arguments…
![Page 17: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/17.jpg)
Study Design
How well can you diagnose the disease or measure the trait?
• Narrow definitions better than all-inclusive definitions
– There are many paths that lead to the same phenotype
• Avoid misclassification and measurement error
– Direct measurement versus recall/survey data or indirect proxies
• Be aware of age of onset
– Can your control become a case over time?
Arguably most important step in study design
![Page 18: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/18.jpg)
Target Phenotypes
Disease or Quantitative trait?
Carlson et al. (2004) Nature 429:446-452
[Figure: diagram with the labels MI, CRP, LDL-C, IL6, LDLR, Acute Illness, and Diet]
Note: SNPs associated with quantitative traits may not be associated with clinical endpoint
![Page 19: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/19.jpg)
Study Design
How many cases and controls will you need to detect an association?
Statistical Power
• Null hypothesis: all alleles are equal risk
• Given that a risk allele exists, how likely is a study to reject the null?
• Study sample size ideally determined before you begin to recruit and genotype
![Page 20: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/20.jpg)
Study Design
What are the thresholds/variables in a general power calculation?
• Statistical significance
– Significance = p(false positive)
– Traditional threshold 5%
• Statistical power
– Power = 1 - p(false negative)
– Traditional threshold 80%
• Traditional thresholds balance confidence in results against reasonable sample size
Note: Significance threshold for 1 SNP tested
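To show how the two thresholds enter a sample size calculation, here is a minimal sketch using the textbook normal approximation for comparing two proportions (for example, the proportion of risk-allele carriers in cases versus controls). It is not the method used by any particular power package; the function name `cases_needed` and the carrier frequencies in the usage line are assumptions made for illustration.

```python
from math import ceil
from statistics import NormalDist


def cases_needed(p_case, p_control, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided test of two proportions.

    Assumes equal numbers of cases and controls and a normal approximation.
    """
    if p_case == p_control:
        raise ValueError("the two proportions must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # significance threshold
    z_beta = NormalDist().inv_cdf(power)            # power threshold
    variance = p_case * (1 - p_case) + p_control * (1 - p_control)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_case - p_control) ** 2)


# Hypothetical carrier frequencies: 30% in cases versus 20% in controls
print(cases_needed(0.30, 0.20))
```

Raising the power or tightening the significance threshold (for example, to account for many SNPs tested) increases the required sample size.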
![Page 21: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/21.jpg)
Study Design
Power Calculation Resources
• Quanto (hydra.usc.edu/gxe/)
Supports quantitative, discrete traits (unrelated and family based)
• Genetic Power Calculator (pngu.mgh.harvard.edu/~purcell/gpc/)
Supports discrete traits, variance components, quantitative traits for linkage and association studies
(List of other software: linkage.rockefeller.edu/soft/)
![Page 22: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/22.jpg)
Study Design
How can you maximize power for your study?
• Large sample size
– Better estimate of variability or risk
– Chance of misclassification / measurement error
• Large genetic effect size
– SNP risk allele with large odds ratio or explains a lot of trait variance
– This is unknown at beginning of study
• Risk SNP is common
– This is unknown at beginning of study
– Calculate power for a range of common MAFs (5-45%)
• Genotype the risk SNP directly
– Risk SNP is unknown at beginning of study
– Remember tagSNPs are imperfect proxies
– Adjust sample size by 1/r2
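The 1/r2 adjustment in the last bullet can be sketched in a few lines: if a tagSNP is correlated with the unknown risk SNP at r2, the sample size computed for direct genotyping is divided by r2. The function name and the numbers in the usage line are hypothetical.

```python
from math import ceil


def tag_adjusted_sample_size(n_direct, r_squared):
    """Inflate a sample size by 1/r2 when a tagSNP stands in for the risk SNP."""
    if not 0 < r_squared <= 1:
        raise ValueError("r_squared must be in (0, 1]")
    return ceil(n_direct / r_squared)


# Hypothetical: 1000 cases suffice when the risk SNP is genotyped directly
print(tag_adjusted_sample_size(1000, 0.5))
```

With r2 = 1 the tagSNP is a perfect proxy and no inflation is needed; as r2 falls, the required sample size grows quickly.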
![Page 23: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/23.jpg)
Study Design
Power calculation example:
Cases: Adverse reaction (wheezing) to flu vaccination
Controls: Vaccinated children with no adverse reactions
[Figure: sample size (cases), axis 0 to 160, versus genotype relative risk, axis 2 to 6, under an additive model, with one series per MAF (0.05, 0.1, 0.15, 0.2, 0.25). Calculated using Quanto 1.1.1]
![Page 24: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/24.jpg)
Study Design
Power calculation example: Immunogenicity to influenza A (H5N1) vaccine
[Figure: sample size, axis 0 to 900, versus R2, axis 0.01 to 0.49, under an additive model. Calculated using Quanto 1.1.1]
![Page 25: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/25.jpg)
Study Design
Why are you considering an association study instead of linkage?
• Linkage analysis is powerful for disorders with
– Discernible pattern of inheritance
– Rare alleles w/ large genetic effect sizes
– High penetrance
• Not powerful for disorders that
– have complex pattern of inheritance
– are common
– have many risk alleles with small effect sizes
– have low penetrance
![Page 26: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/26.jpg)
Common variant/common disease hypothesis
• Common genetic variants confer susceptibility
• Risk-conferring alleles ancient; common across most populations
• Risk-conferring allele has small effect
• Multiple risk alleles expected for common disease; also environment
Study Design
![Page 27: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/27.jpg)
Study Design
Should you design a candidate gene or whole genome study?
• Candidate gene association study
– Interrogate specific genes or regions
– Based on previous knowledge or biological plausibility
– Hypothesis testing
• Whole genome association study
– Interrogate the “entire” genome
– No previous knowledge required
– Hypothesis generation
![Page 28: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/28.jpg)
Candidate gene association studies
• Choose gene based on previous knowledge
– Gene function
– Biological pathway
– Previous linkage or association study
• Choose DNA variations for genotyping
– Direct association approach
– Indirect association approach
![Page 29: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/29.jpg)
Direct Candidate Gene Association Study
Genotype “functional” SNPs
Collins et al (1997) Science 278:1580-1581
Example: Nonsynonymous SNPs
![Page 30: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/30.jpg)
Direct Candidate Gene Association Study
Botstein and Risch (2003) Nat Genet 33 Suppl:228-37.
Problem: We don’t know what is functional and what is not functional
![Page 31: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/31.jpg)
Direct Candidate Gene Association Study
What would we miss?
Functional synonymous SNPs in MDR1 alter P-glycoprotein activity
Komar (2007) Science 315:466-467
![Page 32: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/32.jpg)
Direct Candidate Gene Association Study
What would we miss?
• 99% human genome is non-coding
• Non-coding SNPs or DNA variations in
– Introns
– Intergenic regulatory regions
![Page 33: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/33.jpg)
Indirect Candidate Gene Association Study
• Genotype a fraction of all SNPs regardless of “function”
• Rely on SNP-SNP correlations (linkage disequilibrium) to capture information for SNPs not genotyped
Kruglyak (2005) Nat Genet 37:1299-1300
![Page 34: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/34.jpg)
Indirect Candidate Gene Association Study
Linkage disequilibrium (LD)
Measured by $r^2$
$r^2 = \frac{[f(A_1B_1) - f(A_1)f(B_1)]^2}{f(A_1)f(A_2)f(B_1)f(B_2)}$
$r^2 = 0$: SNPs are independent
$r^2 = 1$: SNPs are perfectly correlated AND have the same minor allele frequency
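The r2 formula on this slide can be evaluated directly from one haplotype frequency and two allele frequencies. The sketch below assumes two biallelic SNPs, so that f(A2) = 1 - f(A1) and f(B2) = 1 - f(B1); the function name and the frequencies used are illustrative.

```python
def ld_r_squared(f_a1b1, f_a1, f_b1):
    """r2 between two biallelic SNPs.

    f_a1b1: frequency of the A1-B1 haplotype
    f_a1, f_b1: frequencies of alleles A1 and B1
    """
    d = f_a1b1 - f_a1 * f_b1                            # disequilibrium coefficient D
    denominator = f_a1 * (1 - f_a1) * f_b1 * (1 - f_b1)
    if denominator == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denominator


# Hypothetical frequencies: A1 and B1 always travel together and are equally common
print(f"{ld_r_squared(0.30, 0.30, 0.30):.2f}")
# Hypothetical frequencies: haplotype frequency equals the product of allele frequencies
print(f"{ld_r_squared(0.06, 0.20, 0.30):.2f}")
```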
![Page 35: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/35.jpg)
Indirect Candidate Gene Association Study
Using LD to pick “tagSNPs”
CRP, European-descent: 10 SNPs >5% MAF
CRP, European-descent: 4 tagSNPs (r2>0.80)
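One common way to pick tagSNPs from pairwise LD is a greedy pass: repeatedly choose the SNP that covers the most still-untagged SNPs above the r2 threshold. The sketch below is a simplified illustration of that idea and is not necessarily the procedure used to produce the slide. The SNP names and r2 values are hypothetical, and the `r2` lookup keyed by `frozenset` pairs is a design choice made here for brevity.

```python
def greedy_tag_snps(snps, r2, threshold=0.80):
    """Pick tagSNPs greedily.

    snps: list of SNP identifiers
    r2: dict mapping frozenset({snp_a, snp_b}) -> pairwise r2 (missing pairs count as 0)
    """
    def tagged_by(snp, pool):
        # A SNP tags itself and every SNP in the pool correlated above the threshold
        return {other for other in pool
                if other == snp or r2.get(frozenset((snp, other)), 0.0) > threshold}

    untagged = set(snps)
    tags = []
    while untagged:
        # sorted() makes tie-breaking deterministic
        best = max(sorted(untagged), key=lambda s: len(tagged_by(s, untagged)))
        tags.append(best)
        untagged -= tagged_by(best, untagged)
    return tags


# Hypothetical pairwise r2 values for four SNPs
pairwise_r2 = {
    frozenset(("snp1", "snp2")): 0.90,
    frozenset(("snp2", "snp3")): 0.85,
    frozenset(("snp3", "snp4")): 0.10,
}
print(greedy_tag_snps(["snp1", "snp2", "snp3", "snp4"], pairwise_r2))
```

Because LD patterns differ between populations, the same procedure run on genotypes from another population can return a different, often larger, set of tags.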
![Page 36: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/36.jpg)
Indirect Candidate Gene Association Study
“tagSNPs” are population specific
CRP, European-descent: 4 tagSNPs
CRP, African-descent: 10 tagSNPs
![Page 37: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/37.jpg)
Indirect Candidate Gene Association Study
• “tagSNPs” are population specific
• Merge sets for “cosmopolitan” set
http://gvs.gs.washington.edu/GVS/
![Page 38: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/38.jpg)
Indirect Candidate Gene Association Study
Multiple testing
• Testing many SNPs for association with disease status
• No consensus on correcting p-value
– Bonferroni
– False Discovery Rate
• Need to replicate findings in independent study
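As a concrete illustration of the Bonferroni option listed above, the sketch below divides the significance level by the number of tests and flags which p-values survive. The function name and the p-values are hypothetical.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag p-values that pass a Bonferroni-corrected threshold of alpha / number of tests."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


# Hypothetical p-values for four SNPs tested in one gene
print(bonferroni_significant([0.001, 0.02, 0.04, 0.2]))
```

With four tests the per-SNP threshold becomes 0.0125, so results that would pass at 5% for a single SNP may no longer be declared significant.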
![Page 39: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/39.jpg)
Indirect Candidate Gene Association Study: Pros and Cons
• Can interrogate all common SNPs in gene
• SNPs must be known and genotypes available to calculate LD and pick tagSNPs
• Multiple testing within a gene
• Limited to previous knowledge
![Page 40: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/40.jpg)
Whole Genome Association Study
• Can now genotype 100K – 1 million SNPs
• Coverage depends on platform and chip
– tagSNPs capturing HapMap common SNPs
– Genic SNPs overrepresented
– Conserved non-coding SNPs represented
– Evenly spaced across genome
Illumina Infinium assay Affymetrix GeneChips
![Page 41: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/41.jpg)
Whole Genome Association Study
• Same study design and challenges as candidate gene
– Mostly case-control (retrospective)
– Multiple testing
• Data storage and higher-order interaction testing issues
• Hypothesis generation tool (replication)
![Page 42: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/42.jpg)
Manolio et al. Nature Reviews Genetics 7, 812–820 (October 2006)
Case/Control Study Designs
For either candidate gene or whole genome
![Page 43: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/43.jpg)
Case/Control Study Designs: Pros and Cons
Study: Case/Control
Pros: Easier to collect; less expensive
Cons: Subject to bias; no risk estimates
Study: Prospective
Pros: Risk estimates
Cons: Harder to collect; more expensive; subject to bias
For rare outcomes, case/control design may be only option
![Page 44: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/44.jpg)
Case/Control Study Designs: Pros and Cons
Types of bias
• Bias in selection of cases
– Those that are currently living
– Miss fatal or short episodes of disease
– Might miss mild diseases
– Referral/admission bias
• Non-response bias
• Exposure suspicion bias
• Family information bias
• Recall bias
Manolio et al. Nature Reviews Genetics 7, 812–820 (October 2006)
Often ignored in genetic association studies
![Page 45: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/45.jpg)
Analysis Methods
Genotype QC
• Test for departures of Hardy-Weinberg Equilibrium
• Test for gender inconsistencies
• Eliminate very rare SNPs (no power)
• Eliminate SNPs with low genotyping efficiency
• Eliminate samples with low genotyping efficiency
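The first QC step, testing for departure from Hardy-Weinberg equilibrium, can be sketched as a one degree of freedom chi-square test on genotype counts. This is a minimal illustration using only the standard library; exact tests are often preferred for rare alleles. The function name and the genotype counts are hypothetical.

```python
from math import erfc, sqrt


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) for Hardy-Weinberg equilibrium from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)       # frequency of allele A
    q = 1 - p
    if p == 0 or q == 0:
        raise ValueError("SNP is monomorphic; HWE test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = erfc(sqrt(chi2 / 2))        # chi-square survival function for 1 df
    return chi2, p_value


# Hypothetical counts for genotypes AA, AB, BB
chi2, p_value = hwe_chi_square(30, 40, 30)
print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}")
```

A very small p-value, particularly in controls, often signals a genotyping problem rather than biology, which is why the test is used as a QC filter.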
![Page 46: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/46.jpg)
Analysis Methods
What statistical methods do you use to analyze your data?
• SNP by SNP (borrowed from epidemiology)
– Chi-square and Fisher’s exact (2x2 table, 2x3 table)
– Logistic and linear regression (covariates)
• Haplotypes
– Haplo.stats and regression
• Interactions
– Traditional regression
– MDR (Ritchie et al)
![Page 47: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/47.jpg)
Analysis Methods
               Case   Control
Minor allele   A      B
Major allele   C      D
Odds ratio (OR) = ratio of odds of minor allele in Cases (A/C) and Controls (B/D)
OR = (A*D)/(B*C)
The Case/Control Study
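The odds ratio from the 2x2 allele table can be computed directly as (A*D)/(B*C). The sketch below also adds a 95% confidence interval using the standard log-based (Woolf) approximation, which is an addition not shown on the slide. The function name and the allele counts are hypothetical.

```python
from math import exp, log, sqrt


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio (A*D)/(B*C) with an approximate confidence interval.

    a, b: minor allele counts in cases and controls
    c, d: major allele counts in cases and controls
    All four counts must be non-zero for this approximation.
    """
    estimate = (a * d) / (b * c)
    standard_error = sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of ln(OR)
    lower = exp(log(estimate) - z * standard_error)
    upper = exp(log(estimate) + z * standard_error)
    return estimate, lower, upper


# Hypothetical allele counts
estimate, lower, upper = odds_ratio_with_ci(30, 20, 70, 80)
print(f"OR = {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

If the interval includes 1.0, the data are compatible with no difference in the odds of carrying the minor allele between cases and controls.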
![Page 48: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/48.jpg)
Case Control
Aa A B
AA C D
For genotypes, set homozygous for major allele (A) as “referent” genotype, and calculate 2 odds ratios:
Case Control
aa A B
AA C D
Analysis Methods
![Page 49: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/49.jpg)
Analysis Methods
Case/control: Interpretation of Odds Ratio
1.0 – Referent
>1.0 – Greater odds of disease compared with controls
<1.0 – Lesser odds of disease compared with controls
Confidence Intervals: probably contain true OR
OR does not measure risk*
![Page 50: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/50.jpg)
Prospective cohort
• Disease free at beginning of study
• Followed over time for disease (“incident”)
• Follow “exposed” and “unexposed” groups
• Gold-standard study design
Analysis Methods
![Page 51: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/51.jpg)
Analysis Methods
Prospective cohort
Case Control Total
Exposed A B (A+B)
Unexposed C D (C+D)
Risk Ratio (RR) = incidence of disease in exposed, A/(A+B), divided by incidence of disease in unexposed, C/(C+D)
RR = [A/(A+B)] / [C/(C+D)]
![Page 52: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/52.jpg)
Prospective Study: Interpretation of Risk Ratio
1.0 – Referent
>1.0 – Risk for disease increases
<1.0 – Risk for disease decreases
Confidence Intervals: probably contain true RR
*For rare diseases, OR ~ RR
Analysis Methods
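The note that the odds ratio approximates the risk ratio for rare diseases can be checked numerically. The sketch below computes both measures from the same 2x2 cohort table (A, B = exposed cases and non-cases; C, D = unexposed cases and non-cases). The function names and all counts are hypothetical.

```python
def risk_ratio(a, b, c, d):
    """RR = incidence in exposed / incidence in unexposed."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """OR = (A*D) / (B*C)."""
    return (a * d) / (b * c)


# Hypothetical rare disease: 2 of 1000 exposed and 1 of 1000 unexposed affected
rare = (2, 998, 1, 999)
# Hypothetical common disease: 40 of 100 exposed and 20 of 100 unexposed affected
common = (40, 60, 20, 80)

for label, table in (("rare", rare), ("common", common)):
    print(f"{label}: RR = {risk_ratio(*table):.3f}, OR = {odds_ratio(*table):.3f}")
```

For the rare outcome the two measures nearly coincide, while for the common outcome the odds ratio is noticeably further from 1.0 than the risk ratio.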
![Page 53: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/53.jpg)
Case/control: Matching
Age Gender Race
Warning: Can “over match” and miss describing an interesting factor
Bad Example:
Cases: Adults with heart disease
Controls: Newborns without heart disease
Analysis Methods
![Page 54: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/54.jpg)
Case/control: Stratifying
Age Gender Race
Warning: Need sufficient sample size to stratify or split the data into males and females
Ex. Cases with heart disease; age-matched controls without heart disease (Exposure: smoking status)
Stratify for Gender Specific Risks
Analysis Methods
![Page 55: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/55.jpg)
Problems in Case/Control genetic association studies –
• “Confounding” by race or ancestry
• AKA population stratification
• Solutions:
– Match
– Stratify
– Adjust (using genetic markers)
– “Trios”
Cardon and Palmer (2003) Lancet 361:598-604
Analysis Methods
![Page 56: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/56.jpg)
• Given
– Height as “target” or “dependent” variable
– Sex as “explanatory” or “independent” variable
• Fit regression model
height = β*sex + ε
Analysis Methods
Regression
![Page 57: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/57.jpg)
Analysis Methods
• Given
– Quantitative “target” or “dependent” variable y
– Quantitative or binary “explanatory” or “independent” variables xi
• Fit regression model
y = β1x1 + β2x2 + … + βixi + ε
Regression
![Page 58: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/58.jpg)
• Works best for normal y and x
• Can include covariates
• Fit regression model
y = β1x1 + β2x2 + … + βixi + ε
• Estimate errors on β’s
• Use t-statistic to evaluate significance of β’s
• Use F-statistic to evaluate model overall
• Use R2 to evaluate variance explained by model
Analysis Methods
Regression
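To make the regression steps concrete, here is a minimal single-predictor least squares sketch that reports the slope, its t-statistic, and R2. It includes an intercept term, which is an assumption on my part since the model as printed on the slide does not show one. The function name and the data (an additive genotype code against a quantitative trait) are hypothetical.

```python
from math import sqrt


def simple_linear_regression(x, y):
    """Ordinary least squares for y = intercept + slope * x + error."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    total_ss = sum((yi - mean_y) ** 2 for yi in y)
    se_slope = sqrt(residual_ss / (n - 2) / sxx)   # standard error of the slope
    return {
        "slope": slope,
        "intercept": intercept,
        "t_statistic": slope / se_slope,           # compare to a t distribution, n - 2 df
        "r_squared": 1 - residual_ss / total_ss,   # fraction of variance explained
    }


# Hypothetical data: copies of the minor allele (0, 1, 2) and a quantitative trait
genotype = [0, 0, 1, 1, 2, 2]
trait = [1.0, 1.2, 1.9, 2.1, 3.0, 3.2]
fit = simple_linear_regression(genotype, trait)
print({name: round(value, 3) for name, value in fit.items()})
```

In practice a statistics package would be used so that covariates can be added and p-values obtained directly; the sketch only shows where the reported quantities come from.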
![Page 59: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/59.jpg)
Analysis Methods
Coding Genotypes
Genotype   Dominant   Additive   Recessive
GG         0          0          0
AG         1          1          0
AA         1          2          1
Genotype can be re-coded in any number of ways for regression analysis
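The three coding schemes on this slide translate directly into lookup tables. The sketch below recodes a list of genotypes under each model, treating A as the allele being counted, as on the slide. The names `GENOTYPE_CODES` and `encode_genotypes` and the sample genotypes are illustrative.

```python
# Codes per genotype under each genetic model, with A as the counted allele
GENOTYPE_CODES = {
    "dominant": {"GG": 0, "AG": 1, "AA": 1},
    "additive": {"GG": 0, "AG": 1, "AA": 2},
    "recessive": {"GG": 0, "AG": 0, "AA": 1},
}


def encode_genotypes(genotypes, model):
    """Translate genotype strings into numeric codes for regression."""
    codes = GENOTYPE_CODES[model]
    return [codes[genotype] for genotype in genotypes]


# Hypothetical genotypes for five samples
samples = ["GG", "AG", "AA", "AG", "GG"]
for model in GENOTYPE_CODES:
    print(model, encode_genotypes(samples, model))
```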
![Page 60: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/60.jpg)
![Page 61: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/61.jpg)
![Page 62: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/62.jpg)
Example of gene-environment interaction and traditional regression
![Page 63: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/63.jpg)
Analysis Methods
Statistical Packages for Genetic Association Studies
• Candidate gene association study
– SAS/Genetics
– STATA
– SPSS
– R
– PLINK
• Whole genome association study
– R
– PLINK
![Page 64: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/64.jpg)
Analysis Methods
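Whole genome in PLINK (pngu.mgh.harvard.edu/~purcell/plink/)
Can adjust for population stratification
Can add covariates
[Figure: whole genome association results with MHC removed; annotated signals at P<1x10-100 and P<2x10-11; genome-wide significance line at P=5x10-8, with P<5x10-8 counted as genome-wide significant]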
Plenge et al 2007 NEJM
![Page 65: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/65.jpg)
SNPs versus Haplotypes
• There is no right answer: explore both
• The only thing that matters is the correlation between the assayed variable and the causal variable
• Sometimes the best assayed variable is a SNP, sometimes a haplotype
![Page 66: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/66.jpg)
SNPs versus Haplotypes
Statistical Packages for Genetic Association Studies with haplotypes
• Haplo.stats (haplotype regression)
Lake et al, Hum Hered. 2003;55(1):56-65.
• PHASE (case/control haplotype)
Stephens et al, Am J Hum Genet. 2005 Mar;76(3):449-62
• Haploview (case/control SNP analysis)
Barrett et al, Bioinformatics. 2005 Jan 15;21(2):263-5.
• SNPHAP (haplotype regression?)
Sham et al Behav Genet. 2004 Mar;34(2):207-14.
![Page 67: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/67.jpg)
Analysis Methods
Multiple testing
• Bonferroni correction
– Too conservative b/c each SNP tested may not be independent (LD)
– How many independent tests did you do?
– See Conneely and Boehnke AJHG (in press)
• False Discovery Rate
– Also has arbitrary threshold
• Best bet is replication
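For the False Discovery Rate option, a widely used procedure is Benjamini-Hochberg, sketched below as one possible choice; the slide does not name a specific FDR method. The function name, the FDR level q, and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking p-values rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])   # indices from smallest p up
    largest_rank = 0
    for rank, index in enumerate(order, start=1):
        # Find the largest rank k with p_(k) <= k * q / m
        if p_values[index] <= rank * q / m:
            largest_rank = rank
    rejected = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= largest_rank:
            rejected[index] = True
    return rejected


# Hypothetical p-values for four SNPs
print(benjamini_hochberg([0.001, 0.02, 0.04, 0.2]))
```

The level q is itself a chosen threshold, which is the point made in the bullet above; independent replication remains the stronger safeguard.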
![Page 68: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/68.jpg)
Statistical Replication
CRP SNPs and CRP levels in NHANES III
[Figure: change in ln(CRP) per copy relative to H2, axis 0 to 0.6, for haplotypes H2, H5, H6, H7, and H8 in Black, Mexican-American, and White participants]
Results Consistent with CARDIA
Carlson et al. AJHG 2005;77:64-77
Crawford et al Circulation 2006; 114:2458-2465
![Page 69: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/69.jpg)
• Statistical replication is not always possible
• Association may imply mechanism
• Test for mechanism at the bench
– Is predicted effect in the right direction?
– Dissect haplotype effects to define functional SNPs
Functional Replication
![Page 70: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/70.jpg)
Functional Replication
CRP Evolutionary Conservation
• TATA box: 1697
• Transcript start: 1741
• CRP Promoter region (bp 1444-1650) >75% conserved in mouse
![Page 71: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/71.jpg)
Functional Replication
Low CRP Levels Associated with H1-4
• USF1 (Upstream Stimulating Factor)
– Polymorphism at 1440 alters USF1 binding site
Positions: 1420 1430 1440
H1-4 gcagctacCACGTGcacccagatggcCACTCGtt
H7-8 gcagctacCACGTGcacccagatggcCACTAGtt
H5-6 gcagctacCACGTGcacccagatggcCACTTGtt
![Page 72: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/72.jpg)
High CRP Levels Associated with H6
• USF1 (Upstream Stimulating Factor)
– Polymorphism at 1421 alters another USF1 binding site
Positions: 1420 1430 1440
H1-4 gcagctacCACGTGcacccagatggcCACTCGtt
H7-8 gcagctacCACGTGcacccagatggcCACTAGtt
H5 gcagctacCACGTGcacccagatggcCACTTGtt
H6 gcagctacCACATGcacccagatggcCACTTGtt
Functional Replication
![Page 73: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/73.jpg)
CRP Promoter Luciferase Assay
[Figure: fold change over H1-3, axis 0.0 to 4.0, for constructs H1-3, H4, H5, H6, H7-8, empty, and SV40p]
Carlson et al, AJHG v77 p64
Functional Replication
![Page 74: Day2 145pm Crawford](https://reader034.vdocuments.site/reader034/viewer/2022051111/55516558b4c905a8768b53c6/html5/thumbnails/74.jpg)
Association Analysis Outline
• Study Design
• SNPs versus Haplotypes
• Analysis Methods
• Candidate Gene
• Whole Genome Analysis
• Replication and Function