Part I: Designs and Theoretical issues
Ahmed Rebai, PhD (ahmed.rebai@cbs.rnrt.tn)
Genome Wide Association Studies
Screening the genome
Human inherited diseases (phenotypes) have a genetic basis that needs to be unraveled
Diseases range from Mendelian (single gene!) to complex (multiple genes, pathways, environment, ...)
Look for DNA sequence changes (single-base changes, duplications, deletions, ...) that might explain the phenotype spectrum
What is polymorphism?
Anything that differs between individuals, species, ...
Genetic markers
A genetic marker is a gene or DNA sequence with a known location on a chromosome that can be used to identify individuals or species. It can be described as an observable variation.
A genetic marker is an easily identifiable piece of genetic material, usually DNA, that can be used in the laboratory to tell apart cells, individuals, populations, or species
A genetic marker may be a short DNA sequence, such as a sequence surrounding a single base-pair change, or a long one, such as a minisatellite.
Polymorphic Sequences
RFLP: variation in restriction sites
Microsatellites (STR or SSR)
Using genetic analyzer
STR
Multiallelic and very informative
Used to construct the first linkage maps and to map disease genes or quantitative trait loci
Used in forensics and individual identification (criminology, paternity testing)
Used to infer population history and study diversity
AFLP: Amplified Fragment Length Polymorphism
CNV: Copy Number Variation
SNPs: Single Nucleotide Polymorphism
Classes of SNP
Gene structure
Location of SNP in gene regions
[Figure: bar chart of the percentage of SNPs by gene region (coding, 5'UTR, 3'UTR, promoter, introns, other); y-axis 0 to 70%]
Coding SNP effects
[Figure: bar chart of the percentage of coding SNPs by effect (synonymous, conservative, moderate, intermediate, radical, nonsense); y-axis 0 to 45%]
Design of association studies
Family-based: data consist of families (trios, nuclear families, pedigrees, ...) segregating for the phenotype
Population-based: two samples, one of cases (one phenotype class) and the other of (matched) controls
Effect of SNPs
1 SNP every 300 bp (0.5% of the genome)
Polymorphism                     Proportion (%)
Synonymous                       44.4
Non-synonymous, conservative     19.2
Moderately conservative          23.6
Moderately radical                8.2
Radical                           3.7
Stop                              0.9
Designs and methods
Trios: the simplest family
Two parents and one affected child
Parents serve as controls, and we look for over-transmission of an allele to affected children
Sib-pair design
Affected sib-pairs (ASP): both siblings affected
Discordant sib-pairs: one affected, one unaffected
General family
Case-controls vs Trios
Family vs population
Association
Association is simply a statistical statement about the co-occurrence of alleles or phenotypes.
Genotype AA (or allele A) is associated with disease D if people who have D carry AA (or A) more often (or possibly less often) than would be predicted from the individual frequencies of D and of AA (or A) in the population.
Three possible causes of association
Best: the genotype or allele itself increases disease susceptibility (candidate gene studies)
Good: some subjects share a common ancestor, so the marker allele tracks a nearby causal allele (linkage disequilibrium studies)
Bad: association due to population stratification (family-based designs offer protection)
Types of association studies
The candidate polymorphism approach: a SNP ‘suspected’ of being involved in disease causation
Candidate gene approach: typing 5 to 50 SNPs within a gene that is either a positional candidate from a prior linkage study, or a functional candidate based on homology with a gene of known function in a model species
Fine mapping: hundreds of SNPs in a candidate region (1 to 10 Mb) containing 5 to 50 genes, identified by a linkage genome scan
The genome-wide scan approach: more than 300,000 SNPs distributed throughout the genome
Candidate SNP
Genome-wide SNPs
GWAS
Searching for associated SNP in a candidate gene is like looking for a lost key in a dark street
Typing 10 million SNPs is too costly and laborious (billions of genotypes)
Searching for an optimal set of 300,000 to 500,000 SNPs for use in GWAS
Multistage designs
Basic principle of AS
GWAS data: so simple!
Checking data: Testing before testing!
Preliminary analyses
Hardy-Weinberg Equilibrium
If the population is panmictic (random mating) and of large size,
there is no migration,
and the locus is not subject to selection,
then genotype frequencies can be deduced from allele frequencies (p is the frequency of A):
AA: p²   Aa: 2p(1−p)   aa: (1−p)²
HWE
Deviation from HWE can be due to inbreeding, population stratification, selection, ...
Test HWE in the control sample as a data-quality check: discard SNPs that depart significantly from HWE at α = 10⁻⁴
Caveat: this filter ignores the case where the departure is due to a tendency to miscall heterozygotes as homozygotes at deletion polymorphisms, which could be important in disease causation
Tests of HWE
Compare observed to expected genotype counts using the Pearson chi-square goodness-of-fit test: with 3 genotypes and 1 estimated parameter (p), the test has 1 df
Inappropriate for rare variants (low genotype counts): use Fisher's Exact Test (FET)
Other exact tests are available in the R language (e.g. the genetics package, ...)
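A minimal Python sketch (standard library only) of the Pearson goodness-of-fit test for HWE described above, for one biallelic SNP. The function name `hwe_chisq` and the example counts are illustrative assumptions; for rare variants an exact test would be preferable, as noted.

```python
import math


def hwe_chisq(n_AA, n_Aa, n_aa):
    """Pearson goodness-of-fit test of HWE for one biallelic SNP (1 df)."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)          # estimated frequency of allele A
    if not 0 < p < 1:
        raise ValueError("monomorphic SNP: HWE test undefined")
    expected = (n * p * p, n * 2 * p * (1 - p), n * (1 - p) ** 2)
    observed = (n_AA, n_Aa, n_aa)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2))
    return stat, math.erfc(math.sqrt(stat / 2))
```

For example, `hwe_chisq(50, 0, 50)` (no heterozygotes) gives a statistic of 100, far beyond the α = 10⁻⁴ threshold, so such a SNP would be flagged in a quality-control pass over the controls.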
HWE tests for many SNP
A correction for multiple testing is needed (Bonferroni correction: each p-value is multiplied by the number of SNPs tested), e.g. using a threshold of p < 10⁻⁴
A quantile-quantile plot (QQ-plot) of p-values for L SNPs:
sort the p-values in increasing order, then plot −log(i-th p-value) against −log(i/(L+1))
SNPs whose points deviate upwards from the diagonal line depart from HWE more than expected by chance
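A short sketch of how the QQ-plot coordinates could be computed for L p-values. It returns (expected, observed) pairs; base-10 logarithms are an assumption here (the base only rescales both axes), and plotting is left to any graphics library.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10 p-value pairs for a QQ-plot."""
    L = len(p_values)
    points = []
    # The smallest p-value is paired with the smallest uniform quantile 1/(L+1)
    for i, p in enumerate(sorted(p_values), start=1):
        expected = -math.log10(i / (L + 1))   # uniform quantile under H0
        observed = -math.log10(p)
        points.append((expected, observed))
    return points
```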
QQ-plot for HWE
A single SNP
Tests of association
Pearson chi-square Test
Construct the table of genotype counts in cases and controls:
Genotype    AA    Aa    aa
Cases       P1    Q1    R1
Controls    P0    Q0    R0
Use the chi-square test (2 df) or FET; the chi-square test is good when alleles are frequent
For complex traits (roughly additive mode of action) the 2-df chi-square test is not the most powerful; to improve power we can use other tests (allelic or Armitage).
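A minimal sketch of the 2-df Pearson chi-square test on the genotype table above, assuming counts are given in the order (AA, Aa, aa) and every genotype column has a nonzero total. For 2 df the chi-square upper tail has the closed form exp(−x/2), so no statistics library is needed.

```python
import math


def genotype_chisq(cases, controls):
    """Pearson chi-square test (2 df) on a 2x3 genotype table.

    cases, controls: counts ordered as (AA, Aa, aa); assumes no empty column.
    """
    rows = [cases, controls]
    row_tot = [sum(r) for r in rows]
    col_tot = [cases[j] + controls[j] for j in range(3)]
    n = sum(row_tot)
    stat = 0.0
    for i in range(2):
        for j in range(3):
            e = row_tot[i] * col_tot[j] / n   # expected count under independence
            stat += (rows[i][j] - e) ** 2 / e
    # For 2 df the chi-square upper tail is exactly exp(-x / 2)
    return stat, math.exp(-stat / 2)
```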
Risk models
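There are four possible risk models for any given SNP, depending on the relative risks.
Take genotype aa as the reference genotype, with risk equal to 1, and let γ be the relative risk of genotype Aa:
Genotype          AA       Aa    aa
Additive          2γ − 1   γ     1
Dominant          γ        γ     1
Recessive         γ        1     1
Multiplicative    γ²       γ     1
(Additive: each copy of A adds the same increment γ − 1 to the risk; multiplicative: each copy multiplies the risk by γ.)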
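A small sketch encoding the risk models, with aa as the reference (risk 1) and γ the relative risk of Aa; the additive row is taken as 2γ − 1 (a constant increment per A allele), which is an assumption about the intended definition. The hypothetical helper `case_genotype_freqs` shows one use: assuming the population is in HWE, genotype frequencies among cases are the HWE frequencies weighted by the relative risks.

```python
def genotype_risks(gamma, model):
    """Relative risks (AA, Aa, aa) with aa as the reference genotype (risk 1)."""
    if model == "additive":
        return (2 * gamma - 1, gamma, 1.0)    # each A allele adds gamma - 1
    if model == "dominant":
        return (gamma, gamma, 1.0)
    if model == "recessive":
        return (gamma, 1.0, 1.0)
    if model == "multiplicative":
        return (gamma ** 2, gamma, 1.0)       # each A allele multiplies risk by gamma
    raise ValueError(f"unknown model: {model}")


def case_genotype_freqs(p, gamma, model):
    """Expected (AA, Aa, aa) frequencies among cases, assuming HWE in the population."""
    hwe = (p * p, 2 * p * (1 - p), (1 - p) ** 2)
    weighted = [f * r for f, r in zip(hwe, genotype_risks(gamma, model))]
    total = sum(weighted)
    return tuple(w / total for w in weighted)
```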
Armitage trend test
            AA     Aa     aa     Sum
Cases       N11    N12    N13    R1
Controls    N21    N22    N23    R2
Sum         C1     C2     C3     N
Armitage test
By choosing weights t_i (for AA, Aa, aa), this test can handle all modes of inheritance: (1,1,0) for dominant, (0,1,1) for recessive and (0,1,2) for additive
Its chi-square (1 df) distribution is valid even when HWE does not hold
It is the same test as the score test in logistic regression with the same genotype coding
Most powerful test for an additive model
Recommended for rare alleles
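A sketch of the Cochran-Armitage trend statistic for the table above, using the usual score T = Σ t_i (N1i R2 − N2i R1) and its null variance (R1 R2 / N)[Σ t_i² C_i (N − C_i) − 2 Σ_{i<j} t_i t_j C_i C_j]. The function name and example counts are hypothetical; the variance is zero (and the test undefined) for a monomorphic SNP.

```python
import math


def armitage_trend(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test on a 2x3 genotype table.

    cases, controls: (N11, N12, N13) and (N21, N22, N23) for AA, Aa, aa.
    weights: scores t_i, e.g. (0,1,2) additive, (1,1,0) dominant, (0,1,1) recessive.
    Returns the 1-df chi-square statistic and its p-value.
    """
    R1, R2 = sum(cases), sum(controls)
    N = R1 + R2
    C = [cases[i] + controls[i] for i in range(3)]
    t = weights
    T = sum(t[i] * (cases[i] * R2 - controls[i] * R1) for i in range(3))
    var = sum(t[i] ** 2 * C[i] * (N - C[i]) for i in range(3))
    var -= 2 * sum(t[i] * t[j] * C[i] * C[j]
                   for i in range(3) for j in range(i + 1, 3))
    var *= R1 * R2 / N
    stat = T * T / var
    return stat, math.erfc(math.sqrt(stat / 2))
```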
Graphical Armitage test
Allelic test
Define the allele count table from the genotype counts:
Allele      A           a
Cases       2P1 + Q1    2R1 + Q1
Controls    2P0 + Q0    2R0 + Q0
Chi-square test with 1 df. Not recommended, because it requires HWE in cases and controls combined, and its risk estimates are not interpretable
Allelic test
Disease status    A        a        Total
+                 a        b        a + b
−                 c        d        c + d
Total             a + c    b + d    2N
$$\chi^2 = \frac{2N\,(ad-bc)^2}{(a+b)(c+d)(a+c)(b+d)}$$
p-value = Pr(χ²(1 df) > χ²obs)
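A minimal sketch that builds the allele table from genotype counts (AA, Aa, aa), as defined above, and applies the 1-df formula; the function name is hypothetical, and it inherits the HWE caveat of the allelic test.

```python
import math


def allelic_test(cases, controls):
    """Allelic chi-square test (1 df) from genotype counts (AA, Aa, aa)."""
    P1, Q1, R1 = cases
    P0, Q0, R0 = controls
    a, b = 2 * P1 + Q1, 2 * R1 + Q1      # A and a alleles in cases
    c, d = 2 * P0 + Q0, 2 * R0 + Q0      # A and a alleles in controls
    n_alleles = a + b + c + d            # = 2N
    stat = n_alleles * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, math.erfc(math.sqrt(stat / 2))
```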
Improved allelic tests
Nuel et al. (2006) proposed an exact allelic test that is not biased by departure from HWE (implemented in R).
Song and Elston (2006) proposed a correction for the allelic trend test when HWE does not hold.
The Cochran-Armitage test is a conservative allelic test that does not rely on HWE: it fits a straight line to the proportion of cases in the three genotype classes and tests whether this line is horizontal
Logistic regression
Let π_i denote the disease risk for individual i (π_i = Prob(y_i = 1)); the model states that
$$\operatorname{logit}(\pi_i) = \log\frac{\pi_i}{1-\pi_i} = \begin{cases} \beta_0 & \text{for } aa \\ \beta_1 & \text{for } Aa \\ \beta_2 & \text{for } AA \end{cases}$$
To test association we test H0: β0 = β1 = β2
If we set β1 = (β0 + β2)/2 we get an additive model
β1 = β0 gives a recessive model
β1 = β2 gives a dominant model
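Because the genotype model above has one parameter per genotype class, its maximum-likelihood fit has a closed form: each fitted logit is log(cases/controls) in that class. The sketch below uses this to run the likelihood-ratio test of H0: β0 = β1 = β2 (2 df). The function name is hypothetical, all counts are assumed positive, and the constrained additive model would need an iterative fit instead.

```python
import math


def logistic_genotype_lrt(cases, controls):
    """LRT of H0: beta0 = beta1 = beta2 in the genotype logistic model.

    cases, controls: positive counts for (AA, Aa, aa).
    Returns the fitted logits in the same order (beta2, beta1, beta0),
    the likelihood-ratio statistic and its 2-df p-value.
    """
    # Saturated model: the MLE of each logit is log(cases / controls) in its class
    betas = [math.log(k / m) for k, m in zip(cases, controls)]
    n1, n0 = sum(cases), sum(controls)
    n = n1 + n0
    g = 0.0
    for k, m in zip(cases, controls):
        col = k + m
        for obs, row_total in ((k, n1), (m, n0)):
            expected = row_total * col / n    # fitted count under H0 (common logit)
            g += 2 * obs * math.log(obs / expected)
    return betas, g, math.exp(-g / 2)         # 2-df upper tail is exp(-x / 2)
```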
Logistic regression
The advantages are that:
Many SNPs can be included in the same model, allowing tests for epistasis and gene-by-environment interaction
SNP effects can be tested while adjusting for covariates such as age of onset, gender, ...
Which test to use?
There is no generally accepted answer!
FET: power spread over the range of risk models, but less powerful for detecting near-additive risks
Armitage: good for additive models, weak power for other models
The problem is that the true model is unknown. Options: take the maximum of the test statistics over models; use Armitage for rare variants and FET elsewhere; or use Bayesian testing
Bayesian testing: a different way of thinking
Instead of computing a p-value (the probability of obtaining the observed test value, or a more extreme one, by chance), we compute a Posterior Probability of Association (PPA):
Choose a value π for the prior probability of association (10⁻⁴ to 10⁻⁶)
Compute the Bayes Factor for each SNP: BF = Pr(Data | association) / Pr(Data | no association)
Calculate the Posterior Odds (PO) and then the PPA:
$$PO = BF \times \frac{\pi}{1-\pi}, \qquad PPA = \frac{PO}{1+PO}$$
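A two-line sketch of the conversion from Bayes factor to PPA, assuming the BF is already computed (how to compute it depends on the chosen model and prior on effect sizes, which this sketch does not address). The function name is hypothetical.

```python
def ppa(bf, prior):
    """Posterior probability of association from a Bayes factor and a prior."""
    po = bf * prior / (1 - prior)    # posterior odds = BF x prior odds
    return po / (1 + po)
```

With a prior of 10⁻⁴, a Bayes factor of 10⁶ gives posterior odds of about 100 and a PPA of about 0.99, whereas BF = 1 leaves the PPA equal to the prior.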
Example
Advantages of Bayesian
Allows averaging over genetic models by computing a BF combined across models
Allows averaging over effect sizes (SNPs with high to low risk)
Allows incorporating external biological information: SNPs near genes, with known biological function, with low frequency, conserved among species, ... can be given higher prior probabilities of association
Measuring risk
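A measure of risk is the odds ratio (OR), computed from the allele count table:
If OR = 1, there is no association
If the CI contains 1, the association is not significant (at 5%)
Disease    A        a        Total
+          a        b        a + b
−          c        d        c + d
Total      a + c    b + d    2N
$$OR = \frac{a/c}{b/d} = \frac{ad}{bc}, \qquad CI_{95\%} = \exp\left(\ln OR \pm 1.96\sqrt{\frac{1}{a}+\frac{1}{b}+\frac{1}{c}+\frac{1}{d}}\right)$$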
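A minimal sketch of the odds ratio and its log-scale (Woolf) 95% confidence interval from the allele table above; the function name and example counts are hypothetical, and all four cells are assumed nonzero.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Allelic odds ratio ad/bc with a log-scale (Woolf) confidence interval.

    a, b: A and a allele counts in cases; c, d: A and a allele counts in controls.
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of ln(OR)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, (lower, upper)
```

For a = 110, b = 90, c = 90, d = 110 this gives OR ≈ 1.49 with a lower bound just above 1, i.e. association significant at the 5% level.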
Genotypic risks
Allelic risks do not make much sense, because it is not straightforward to translate them into individual risks
We can define OR_hom, between the two homozygous genotypes (AA vs aa), and OR_het, between the heterozygous genotype and the reference homozygote (Aa vs aa)
If HWE holds in both cases and controls, it can be shown that OR_het = OR and OR_hom = OR_het²
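A small sketch computing the genotypic odds ratios with aa as reference; the function name and counts are hypothetical, and all reference-class and compared-class counts are assumed nonzero. The example data are constructed to be in HWE in both groups, so OR_hom equals OR_het², as stated above.

```python
def genotypic_odds_ratios(cases, controls):
    """OR_het (Aa vs aa) and OR_hom (AA vs aa) from counts (AA, Aa, aa)."""
    AA1, Aa1, aa1 = cases
    AA0, Aa0, aa0 = controls
    or_het = (Aa1 * aa0) / (aa1 * Aa0)
    or_hom = (AA1 * aa0) / (aa1 * AA0)
    return or_het, or_hom
```

With cases (100, 100, 25) and controls (25, 50, 25), this returns (2.0, 4.0), and the allelic OR from the corresponding allele table is also 2.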
Population attributable risk
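Represents the proportion of disease in the population attributable to the risk allele, reflecting the excess risk of disease in those having the risk allele compared with those not having it
K is the prevalence of carriers in the population
For a rare dominant risk allele (frequency p, so K ≈ 2p; penetrances P_Aa and P_aa), it can be approximated by
$$PAR \approx \frac{K(OR-1)}{K(OR-1)+1} \approx \frac{2p\,(P_{Aa}-P_{aa})}{2p\,P_{Aa}+(1-2p)\,P_{aa}}$$
(the second form follows from the first with K = 2p and OR ≈ P_Aa / P_aa)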
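A sketch of the two PAR expressions, assuming the first is Levin's formula with the OR standing in for the carriers' relative risk, and the second its rare-dominant form with K = 2p. The function names and the example penetrances are hypothetical.

```python
def par_from_or(K, OR):
    """PAR from carrier prevalence K and odds ratio OR (Levin's formula)."""
    excess = K * (OR - 1)
    return excess / (excess + 1)


def par_rare_dominant(p, P_Aa, P_aa):
    """PAR for a rare dominant risk allele of frequency p (carrier prevalence ~ 2p)."""
    return 2 * p * (P_Aa - P_aa) / (2 * p * P_Aa + (1 - 2 * p) * P_aa)
```

For p = 0.01, P_aa = 0.01 and P_Aa = 0.03 (so K = 0.02 and a relative risk of 3), both forms give PAR ≈ 0.038: about 4% of cases would be attributable to the allele.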
Categorical phenotypes
Categorical traits can be:
Unordered, such as disease subtypes: association can be tested by multinomial regression
Ordered, such as disease severity (mild, moderate, severe): we need a method that gives more weight to the most severely affected cases (the diagnosis is more certain and causal genes contribute more)
If we assume that the risk for category k relative to category (k−1) is the same for all k, then we can build a score test (a generalization of the Armitage test)
Continuous phenotype
We use mean comparison between the three genotypes (analysis of variance) or linear regression on genotype
Both require the trait to be normally distributed within each genotype class, with the same variance in each class
If not, a transformation of the trait might be necessary (log, inverse, square root, Box-Cox)
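A minimal least-squares sketch of the regression option, assuming an additive coding of genotype as the number of A alleles (0 = aa, 1 = Aa, 2 = AA); the ANOVA alternative would instead treat genotype as a three-level factor. The function name and data are hypothetical, and the trait is assumed already transformed if needed.

```python
import math


def additive_regression(genotypes, trait):
    """Least-squares regression of a continuous trait on allele dosage.

    genotypes: copies of allele A per individual (0, 1, 2).
    Returns (intercept, slope, t statistic of the slope, n - 2 df).
    """
    n = len(trait)
    mx = sum(genotypes) / n
    my = sum(trait) / n
    sxx = sum((x - mx) ** 2 for x in genotypes)
    sxy = sum((x - mx) * (y - my) for x, y in zip(genotypes, trait))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(genotypes, trait))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return intercept, slope, slope / se_slope
```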
Linear regression
Association in trios: the TDT
                      Non-transmitted allele
Transmitted allele    M1        M2        Total
M1                    a         b         a + b
M2                    c         d         c + d
Total                 a + c     b + d     2n
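The table above is usually summarized by the standard TDT statistic (b − c)² / (b + c), a McNemar-type test with 1 df that uses only the discordant cells (heterozygous parents transmitting one allele and not the other). A minimal sketch, with a hypothetical function name and b + c assumed positive:

```python
import math


def tdt(b, c):
    """Transmission/disequilibrium test from the discordant cells.

    b: M1 transmitted, M2 not transmitted; c: M2 transmitted, M1 not transmitted.
    """
    stat = (b - c) ** 2 / (b + c)
    return stat, math.erfc(math.sqrt(stat / 2))   # 1-df chi-square upper tail
```

For example, 60 versus 40 transmissions gives a statistic of 4.0 (p ≈ 0.046).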
Complicating factors!
Population stratification can generate spurious genotype-phenotype association
Genomic control
We consider a set of about 100 "null" SNPs (SNPs that are mostly not related to the disease)
The Armitage test is computed for each null SNP
Compute λ, the median of these test values divided by its expected value under no association (about 0.455 for a 1-df chi-square)
If λ > 1 (which is indicative of stratification), divide each test value by λ
Caveats: limited in applicability, conservative, and the choice of null SNPs is problematic
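A sketch of the genomic-control adjustment, assuming the null-SNP statistics are 1-df Armitage chi-squares. The expected null median of a 1-df chi-square is (Φ⁻¹(0.75))² ≈ 0.4549, computed here with `statistics.NormalDist` (Python 3.8+); the function name is hypothetical.

```python
import statistics

# Median of a 1-df chi-square under H0: (Phi^-1(0.75))^2, about 0.4549
CHISQ1_MEDIAN = statistics.NormalDist().inv_cdf(0.75) ** 2


def genomic_control(null_stats, stats):
    """Inflation factor lambda from null-SNP statistics, and GC-adjusted statistics."""
    lam = statistics.median(null_stats) / CHISQ1_MEDIAN
    if lam > 1:                       # inflation suggests stratification
        return lam, [s / lam for s in stats]
    return lam, list(stats)           # no deflation when lambda <= 1
```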
Structured association methods
Searches for the best sub-population structure by optimizing some criterion
Allocates individuals to hypothetical sub-populations
Tests for association conditional on this allocation
Caveats: computationally demanding; sub-populations are theoretical constructs with no direct interpretation
Other methods
Include null SNPs as covariates in regression analyses: computationally fast and more flexible than GC, but it is recommended to assess the type-I error rate by simulation
Use Principal Component Analysis (PCA) of the null SNPs to diagnose population structure
Mixed-model approaches that estimate kinship (relatedness between individuals)
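A standard-library sketch of the PCA idea: standardize each null SNP (allele dosage 0/1/2), form the individuals-by-individuals relatedness-like matrix, and extract its leading eigenvector by power iteration. The standardization, the power-iteration approach and the function name are assumptions chosen to keep the example self-contained; real analyses would use dedicated linear-algebra or genetics software. At least one polymorphic SNP is assumed.

```python
import math
import random


def first_pc(genotypes, n_iter=200, seed=0):
    """First principal-component scores of individuals.

    genotypes: one list of allele dosages (0/1/2) per individual, over the null SNPs.
    """
    n, m = len(genotypes), len(genotypes[0])
    # Centre and scale each SNP; monomorphic SNPs carry no information and are skipped
    cols = []
    for j in range(m):
        col = [row[j] for row in genotypes]
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        if sd > 0:
            cols.append([(x - mean) / sd for x in col])
    k = len(cols)
    # n x n matrix X X^T / k, where X holds the standardized genotypes
    grm = [[sum(c[i] * c[l] for c in cols) / k for l in range(n)] for i in range(n)]
    # Power iteration converges to the eigenvector of the largest eigenvalue
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n)]
    for _ in range(n_iter):
        w = [sum(grm[i][l] * v[l] for l in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v
```

Individuals from different sub-populations should receive scores of opposite sign on this first component (the overall sign is arbitrary).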
Kinship between individuals
Exclude these individuals
Power and sample size
In statistical testing we consider a null hypothesis H0: "no association" versus an alternative hypothesis H1: "association"
This results in two types of error: the first (type I, α) is fixed (chosen), and the second (type II, β) can be calculated for given values of the disease-variant parameters (risk and allele frequency), a given risk model and a given sample size.
Errors in statistical testing
Decision \ Truth (unknown)                    H0 true (no association)      H0 false (association)
Accept H0 (declare absence of association)    1 − α (confidence level)      β (type II error)
Reject H0 (declare association)               α (type I error), e.g. 5%     1 − β (power)
How to compute power?
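Power = Pr(declaring association | there is actually association)
If we know the theoretical distribution of the test statistic, the theoretical power can be computed by an approximate analytical formula:
$$\text{Power} = \Pr\left(\chi^2_{1\,\mathrm{df},\,\lambda} > \chi^2_{1\,\mathrm{df},\,1-\alpha} \,\middle|\, n, p, \gamma, \text{model}\right)$$
where χ²(1 df, λ) is a non-central chi-square whose non-centrality parameter λ depends on the sample size n, the risk allele frequency p, the genotype relative risk γ and the risk model, and χ²(1 df, 1−α) is the critical value at level α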
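A sketch of the analytical power calculation for a 1-df test. It uses the identity χ²₁(λ) = (Z + √λ)² with Z standard normal, so power = Φ(√λ − z) + Φ(−√λ − z) with z = Φ⁻¹(1 − α/2). The helper `allelic_ncp` is a hypothetical, approximate way to obtain λ for the allelic test from the control allele frequency and a per-allele odds ratio (a two-proportion approximation); other tests and models need their own λ.

```python
import math
from statistics import NormalDist

_N = NormalDist()


def power_1df(ncp, alpha=0.05):
    """Power of a 1-df chi-square test with non-centrality parameter ncp."""
    z = _N.inv_cdf(1 - alpha / 2)          # square root of the critical value
    s = math.sqrt(ncp)
    return _N.cdf(s - z) + _N.cdf(-s - z)


def allelic_ncp(p0, allelic_or, n_cases, n_controls):
    """Approximate non-centrality of the allelic test (hypothetical helper).

    p0: risk allele frequency in controls; allelic_or: per-allele odds ratio.
    """
    p1 = allelic_or * p0 / (1 + p0 * (allelic_or - 1))   # allele frequency in cases
    a1, a0 = 2 * n_cases, 2 * n_controls                 # numbers of alleles
    pbar = (a1 * p1 + a0 * p0) / (a1 + a0)
    return (p1 - p0) ** 2 / (pbar * (1 - pbar) * (1 / a1 + 1 / a0))
```

For example, `power_1df(allelic_ncp(0.3, 1.3, 1000, 1000), alpha=5e-8)` would give the approximate genome-wide power of 1,000 cases and 1,000 controls for that scenario.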
Empirical power
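The power of the sample under study per se can be computed using resampling techniques such as the bootstrap or permutation
Bootstrap: create M new samples by allocating to each individual a genotype drawn at random, with replacement, from the original genotypes of the same group (cases or controls)
Permutation: create M new samples by shuffling the individuals (their case/control labels); because shuffling removes any true association, the permuted samples describe the null distribution of the test statistic
Compute the test statistic for each sample
Estimate power as the proportion of samples in which association is declared (test value greater than the predefined threshold at a given α); for permuted samples this proportion estimates the type I error rate instead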
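A sketch of bootstrap power estimation, assuming the allelic chi-square as the test and genotypes coded as copies of allele A (0, 1, 2) per individual. Resampling is done separately within cases and within controls so that the observed association is preserved. Function names, the default M and the seed are hypothetical choices.

```python
import math
import random


def allelic_chisq(case_genos, control_genos):
    """Allelic chi-square (1 df); genotypes coded as copies of allele A (0, 1, 2)."""
    a = sum(case_genos)                      # A alleles in cases
    b = 2 * len(case_genos) - a              # a alleles in cases
    c = sum(control_genos)                   # A alleles in controls
    d = 2 * len(control_genos) - c           # a alleles in controls
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:                           # degenerate (monomorphic) resample
        return 0.0
    return (a + b + c + d) * (a * d - b * c) ** 2 / denom


def bootstrap_power(case_genos, control_genos, alpha=0.05, M=1000, seed=1):
    """Proportion of M bootstrap samples in which the allelic test is significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(M):
        # Resample with replacement, separately within cases and within controls
        boot_cases = rng.choices(case_genos, k=len(case_genos))
        boot_controls = rng.choices(control_genos, k=len(control_genos))
        stat = allelic_chisq(boot_cases, boot_controls)
        if math.erfc(math.sqrt(stat / 2)) < alpha:
            hits += 1
    return hits / M
```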
Permutation
Bootstrap
Power and gene risk
[Figure: sample size N (0 to 2500) against genotype relative risk (1.5 to 4), for multiplicative (M) and additive (A) models with risk allele frequency p = 0.1 and p = 0.5]
Power and allele frequency
[Figure: sample size N (0 to 350) against risk allele frequency p (0 to 1), for multiplicative (M) and additive (A) models]
Heavy statistics
Advanced analyses in GWAS..
Missing genotype data
A problem for multipoint SNP analyses
Data imputation: replace missing genotypes with predicted ones
Predicted genotypes: those that best fit the genotypes at neighbouring SNPs, obtained by:
Best prediction based on some statistical criterion (e.g. maximum likelihood)
Random selection from a probability distribution (resampling methods)
Hot-deck: replace with the genotype of an individual whose genotypes match at neighbouring SNPs
Regression models using the genotypes of all individuals
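The hot-deck idea can be sketched in Python. The sketch assumes genotypes are coded 0/1/2, with `None` marking missing calls. A missing genotype at one SNP is copied from a randomly chosen donor whose genotypes match exactly at the neighbouring SNPs. The `window` parameter and the exact-match rule are illustrative assumptions. Individuals without a matching donor are left missing.

```python
import random


def hot_deck_impute(genotypes, snp, window=2, seed=0):
    """Fill missing calls at column `snp` by hot-deck donation.

    genotypes: one list per individual, coded 0/1/2, None for missing.
    A donor must have a call at `snp` and identical genotypes at every
    neighbouring SNP within `window` positions (exact match, an assumption)."""
    rng = random.Random(seed)
    n_snps = len(genotypes[0])
    neighbours = [j for j in range(max(0, snp - window), min(n_snps, snp + window + 1))
                  if j != snp]
    imputed = [row[:] for row in genotypes]  # leave the input untouched
    for row in imputed:
        if row[snp] is not None:
            continue
        pattern = [row[j] for j in neighbours]
        donors = [d[snp] for d in genotypes
                  if d[snp] is not None and [d[j] for j in neighbours] == pattern]
        if donors:
            row[snp] = rng.choice(donors)  # random donor among the matches
    return imputed
```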
Missing genotypes
All these approaches assume that data are missing at random (independently of the genotype), which is often doubtful because of:
Bad matching of cases and controls
Heterozygotes being genotyped as homozygotes
A differential rate of missingness can be checked by testing association between missing status and disease status (code 0 for missing and 1 for non-missing)
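This missingness check can be sketched as a Pearson chi-square test on the 2×2 table of missing status (0 = missing, 1 = called, as above) by disease status. Comparing the statistic with 3.84, the 5% critical value of a chi-square with 1 df, is one conventional choice rather than the only one. The function name is illustrative.

```python
def missingness_chi2(missing_status, disease_status):
    """Pearson chi-square for association between missingness and disease.

    missing_status: 0 = genotype missing, 1 = called
    disease_status: 1 = case, 0 = control"""
    table = [[0, 0], [0, 0]]  # rows: control/case, columns: missing/called
    for m, d in zip(missing_status, disease_status):
        table[d][m] += 1
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    return stat
```

Consider an example where 2 of 100 controls but 20 of 100 cases are missing. In that case the statistic is about 16.5, well above 3.84. This suggests that the data are not missing at random.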
Haplotypes from genotypes
If interested in many tightly linked SNPs, it is very useful to use haplotypes
A haplotype is the set of alleles carried by one chromosome (phased)
The haplotype of an individual can be:
Determined by laboratory-based methods
Inferred from family members
Estimated using statistical methods (requires genotypes of unrelated individuals)
True haplotypes are more informative than genotypes, but inferred haplotypes are less so (unless LD is high)
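The haplotypes of unrelated individuals have to be estimated because an unphased genotype is usually compatible with several haplotype pairs. Enumerating every compatible pair makes this concrete. With k heterozygous SNPs there are 2^(k−1) such pairs (for k ≥ 1), and statistical methods choose among them using population haplotype frequencies. The sketch below assumes genotypes are given as allele counts (0/1/2) of one allele at each SNP.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """All unordered haplotype pairs consistent with an unphased genotype.

    genotype: allele counts (0, 1 or 2) of allele '1' at each SNP.
    Haplotypes are tuples of 0/1 alleles."""
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    base = [1 if g == 2 else 0 for g in genotype]  # homozygous sites are fixed
    pairs = set()
    for phase in product((0, 1), repeat=len(het_sites)):
        h1, h2 = base[:], base[:]
        for site, allele in zip(het_sites, phase):
            h1[site], h2[site] = allele, 1 - allele
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)
```

For a double heterozygote `[1, 1]`, the function returns the two classic phases, `((0, 0), (1, 1))` and `((0, 1), (1, 0))`. Only population-level information, or family data, can distinguish between them.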
Haplotypes
Pattern of LD: LD is organized in blocks of variable size
Example: a risk haplotype for Crohn's disease extends over 250 kb
LD is very sensitive to population history, structure and demographic events: it is lower than expected at short distances (<10 kb) and higher than expected at long distances! Average extent: about 5 kb in Africans, 60 kb in Europeans.
Very heterogeneous (non-uniform) across the genome
Genetic isolates: useful, with LD blocks extending over 200 kb, including around regions involved in common diseases
Pattern of LD in the genome
LD and distance
LD generated by a new mutation
LD Measures
D' can be large (indicating high LD) even when one allele is very rare, which is of little practical interest
Nr² is the chi-square statistic for the 2×2 table of haplotype counts (N = number of haplotypes)
r² is directly related to statistical power: if disease risk is multiplicative and HWE holds, then r² between a SNP and a causal variant equals the sample size required to detect association by directly typing the causal variant, relative to the sample size required to achieve the same power when typing the SNP.
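The LD measures can be computed from phased haplotype counts for two biallelic SNPs (alleles A/a and B/b). The sketch below uses the standard definitions of D, D' = |D|/D_max and r² = D²/(p_A p_a p_B p_b), and also returns N r². The function name and argument order are illustrative.

```python
def ld_measures(n_AB, n_Ab, n_aB, n_ab):
    """LD between two biallelic SNPs from phased haplotype counts.

    Returns (D, D', r^2, N * r^2), where N is the number of haplotypes."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n            # frequency of haplotype AB
    p_A = (n_AB + n_Ab) / n    # frequency of allele A
    p_B = (n_AB + n_aB) / n    # frequency of allele B
    d = p_AB - p_A * p_B
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2, n * r2
```

The counts (40, 10, 10, 40) give D' = 0.6 and r² = 0.36, and N r² = 36 equals the Pearson chi-square of that 2×2 table. The counts (1, 0, 49, 50) give D' = 1 but r² ≈ 0.01. This is the rare-allele situation in which a large D' is of little practical interest.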
In other words
Suppose a SNP has r² = 0.10 with a causal variant, and
you need a sample of 100 individuals to detect association with the causal variant at 80% power.
Then you need 100/0.1 = 1,000 individuals to detect association (at 80% power) with the SNP.
SNP tagging
Select a minimal number of SNPs that retains as much as possible of the genetic variation in the full SNP set
LD blocks and TagSNP
Figure: a row of SNPs grouped into LD block 1, LD block 2 and LD block 3, with tag SNPs (tSNP) marked
Methods for SNP Tagging
Simple: for each pair of neighbouring SNPs with r² > 0.9, discard one of them (the one with the most missing data)
Sophisticated: find the smallest number of SNPs that need to be genotyped to cover the other SNPs at r² ≥ 0.8
Regression methods
Linear dynamic programming
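Both tagging strategies can be sketched in Python, given a precomputed matrix of pairwise r² values. The simple rule is implemented as stated. For the sophisticated rule, finding the exact smallest covering set is a set-cover problem. The sketch therefore uses a greedy approximation: at each step it picks the SNP that covers the most still-untagged SNPs. This is a simplification, not a specific published tool. Function names are hypothetical.

```python
def prune_neighbours(r2, missing, threshold=0.9):
    """Simple rule: for each pair of neighbouring SNPs with r^2 > threshold,
    drop the one with more missing genotypes. Returns kept SNP indices."""
    keep = list(range(len(r2)))
    i = 0
    while i < len(keep) - 1:
        a, b = keep[i], keep[i + 1]
        if r2[a][b] > threshold:
            keep.remove(a if missing[a] >= missing[b] else b)
        else:
            i += 1
    return keep


def _covered(r2, i, untagged, threshold):
    """Still-untagged SNPs captured by SNP i (including i itself)."""
    return {j for j in untagged if j == i or r2[i][j] >= threshold}


def greedy_tags(r2, threshold=0.8):
    """Greedy approximation to the smallest tag set such that every SNP has
    r^2 >= threshold with at least one tag."""
    untagged = set(range(len(r2)))
    tags = []
    while untagged:
        # most new SNPs covered; ties broken by the smallest index
        best = max(range(len(r2)),
                   key=lambda i: (len(_covered(r2, i, untagged, threshold)), -i))
        tags.append(best)
        untagged -= _covered(r2, best, untagged, threshold)
    return sorted(tags)
```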
Usefulness of tagging
The HapMap project
Transferability: a tag SNP selected in one population might not perform well in another, but in general transferability is good
Use only tag SNPs for analysis, even if all SNPs have been genotyped
Some SNPs are not captured!
Missed SNPs
HapMap Project
The goal was to determine the common patterns of DNA sequence variation in the human genome (a haplotype map) by characterizing:
Sequence variants
Their frequencies
Correlations between them
in populations with African, Asian and European ancestry
HapMap phases
Phase I genotyped one SNP every 5 kb in 270 individuals from 4 populations:
90 Yoruba from Ibadan, Nigeria (30 parent-offspring trios)
90 individuals from the CEPH collection in Utah (30 trios)
45 unrelated Han Chinese from Beijing
45 unrelated Japanese from Tokyo
Phase II: typing 4 million SNPs in the same samples (completed in 2005)
Phase III: other population samples (open)
Visit: www.hapmap.org
The multiple SNP scenario
Testing association
Unphased genotypes: Logistic regression
A model including all SNPs as well as covariates, interaction effects, ...
A score test with 2L df (L df if additivity is assumed)
Use only tag SNPs to eliminate redundancy and increase power
Use a stepwise selection procedure to avoid highly correlated SNPs
Assessing significance is problematic!
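The multi-SNP score test with additive coding (L df) can be sketched in pure Python. The sketch assumes no covariates, so the null model contains only an intercept. Under that assumption, the score vector and information matrix reduce to simple sums over centred genotypes. For a single SNP the statistic reduces to N r², the Cochran-Armitage trend test. The hand-written solver keeps the example dependency-free. It raises an error on collinear (perfectly correlated) SNPs, which is one concrete way that redundancy between SNPs causes trouble.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular information matrix (collinear SNPs?)")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def score_test_additive(genotypes, status):
    """Score test of H0: all L SNP effects are zero in an additive logistic model.

    genotypes: one list of L allele counts (0/1/2) per individual
    status: 1 = case, 0 = control. Returns (statistic, df); under H0 the
    statistic is approximately chi-square with L df."""
    n = len(status)
    n_snps = len(genotypes[0])
    p = sum(status) / n  # fitted case probability under the intercept-only model
    means = [sum(g[j] for g in genotypes) / n for j in range(n_snps)]
    centred = [[g[j] - means[j] for j in range(n_snps)] for g in genotypes]
    score = [sum(c[j] * (y - p) for c, y in zip(centred, status))
             for j in range(n_snps)]
    info = [[p * (1 - p) * sum(c[j] * c[k] for c in centred)
             for k in range(n_snps)] for j in range(n_snps)]
    stat = sum(u * v for u, v in zip(score, solve(info, score)))
    return stat, n_snps
```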
Combining single locus tests
Use cumulative sums of single-locus test statistics and identify those of particular interest
Detect local high-scoring segments, i.e. groups of neighbouring SNPs with small association p-values, using methods and algorithms similar to those used for finding sequence patterns.
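One way to find a local high-scoring segment is the maximal-segment (Kadane) scan used for sequence scores. Each p-value is converted to the score −log10(p) − c, so that SNPs with p < 10^(−c) contribute positively. The scan then finds the contiguous run of SNPs with the largest total score. The offset c = 2 is an arbitrary illustrative choice. Assessing the significance of the best segment (e.g. by permutation) is not shown.

```python
import math


def best_segment(p_values, c=2.0):
    """Maximal-scoring run of neighbouring SNPs (Kadane's algorithm).

    Each SNP scores -log10(p) - c, so SNPs with p < 10**-c score positively.
    Returns (first_index, last_index, total_score); the indices are None if
    no SNP scores positively."""
    best, best_start, best_end = 0.0, None, None
    running, start = 0.0, 0
    for i, p in enumerate(p_values):
        s = -math.log10(p) - c
        if running <= 0:
            running, start = s, i  # start a new segment at this SNP
        else:
            running += s  # extend the current segment
        if running > best:
            best, best_start, best_end = running, start, i
    return best_start, best_end, best
```

Consider the p-values `[0.5, 0.2, 0.001, 0.0001, 0.005, 0.3, 0.6]`. The scan picks SNPs 2 to 4 as the high-scoring segment.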
Haplotype-based methods
Reduce the number of df in models
Capture the correlation structure of SNPs in LD blocks
Capture the combined effect of tightly linked cis-acting causal variants
Caveat: haplotypes are not observed but inferred, and it is hard to account for the uncertainty of their inference
Haplotype tests
Use a 2×k contingency table (problem of zero cells for rare haplotypes), or compare haplotype frequencies (relies on HWE and near-additive risk)
Haplotypes are treated as categorical variables in regression analyses
Compare patterns of LD between cases and controls (Zaykin et al., 2006)
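The 2×k contingency-table test can be sketched in Python. The sketch takes counts of (estimated) haplotypes in cases and in controls. To limit the zero-cell problem, it pools haplotypes whose combined count falls below `min_count` into a single "rare" category. Pooling on the combined count and the threshold of 5 are illustrative assumptions. Both groups are assumed to be non-empty.

```python
def haplotype_table_chi2(case_counts, control_counts, min_count=5):
    """Pearson chi-square on a 2 x k table of haplotype counts.

    case_counts / control_counts: dict haplotype -> count. Haplotypes whose
    combined count is below min_count are pooled into one 'rare' column.
    Returns (statistic, df)."""
    pooled = {}
    for hap in set(case_counts) | set(control_counts):
        a = case_counts.get(hap, 0)
        b = control_counts.get(hap, 0)
        key = hap if a + b >= min_count else "rare"
        ca, cb = pooled.get(key, (0, 0))
        pooled[key] = (ca + a, cb + b)
    n_case = sum(a for a, _ in pooled.values())
    n_ctrl = sum(b for _, b in pooled.values())
    n = n_case + n_ctrl
    stat, k = 0.0, 0
    for a, b in pooled.values():
        col = a + b
        if col == 0:
            continue  # empty column: contributes nothing
        k += 1
        for obs, row in ((a, n_case), (b, n_ctrl)):
            expected = row * col / n
            stat += (obs - expected) ** 2 / expected
    return stat, k - 1
```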
Contrasting LD patterns
Problems
Rare haplotypes: including them results in a loss of power if haplotypes are similar but correspond to distinct causal variants
Solution: combine rare haplotypes in controls into a single category
LD blocks vary with sample size, SNP density and block definition
Use clustering to identify sets of haplotypes sharing common ancestry
Three major complicating factors
Missing data
Epistasis
Gene-environment interaction
Missing Genotype imputation
Seen before!
Epistasis
A variant with a small marginal effect might turn out to have a strong effect in certain genetic backgrounds and be of clinical significance
Is it better to tackle epistasis directly, or to focus first on marginal effects?
Including epistasis is very easy in regression methods, but testing all combinations is unwise: testing should be limited to genes with no marginal effects
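The number of tests shows why testing all combinations is unwise. With L SNPs there are L(L − 1)/2 pairwise interaction tests, and a Bonferroni-style per-test threshold shrinks accordingly. The small helpers below are illustrative only.

```python
from math import comb


def n_pairwise_tests(n_snps):
    """Number of SNP-SNP interaction tests if every pair is tested."""
    return comb(n_snps, 2)


def bonferroni_pairwise(alpha, n_snps):
    """Per-test level that keeps the family-wise error at alpha (Bonferroni)."""
    return alpha / n_pairwise_tests(n_snps)
```

For 1 million SNPs, `n_pairwise_tests(1_000_000)` is 499,999,500,000, so the per-test threshold at α = 0.05 is about 10⁻¹³. Restricting the search to 100 selected SNPs leaves 4,950 tests.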
Gene-environment interaction
The risk conferred by alleles or genotypes is not the same across environments
Environment often has a very "loose" definition: nutrition, lifestyle, exposure to "pollution" (smoking, solvents, ...)
Test for association in separate samples defined according to their environment?
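Testing in separate environmental strata can be followed by a check of whether the genetic effect differs between them. The sketch below uses Woolf's test for heterogeneity of odds ratios, which is one standard choice among several. Each stratum is summarised as a 2×2 table of carriers and non-carriers in cases and controls. The 0.5 correction for zero cells is an assumption added to keep the logarithms finite.

```python
import math


def woolf_heterogeneity(strata):
    """Woolf's test that the odds ratio is the same in every environment stratum.

    strata: one 2x2 table per stratum as (a, b, c, d) =
    (case carriers, case non-carriers, control carriers, control non-carriers).
    Returns (odds ratio per stratum, Q statistic, df); under homogeneity Q is
    approximately chi-square with (number of strata - 1) df."""
    log_ors, weights = [], []
    for a, b, c, d in strata:
        if 0 in (a, b, c, d):
            a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5  # Haldane correction
        log_ors.append(math.log(a * d / (b * c)))
        weights.append(1 / (1 / a + 1 / b + 1 / c + 1 / d))  # inverse variance
    pooled = sum(w * x for w, x in zip(weights, log_ors)) / sum(weights)
    q = sum(w * (x - pooled) ** 2 for w, x in zip(weights, log_ors))
    return [math.exp(x) for x in log_ors], q, len(strata) - 1
```

The strata (40, 60, 20, 80) and (30, 70, 30, 70) illustrate the result. The odds ratio is about 2.67 in the first environment and 1 in the second, with Q ≈ 4.8 on 1 df.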
Higher order interactions?
The multiple testing problem
Particularly acute when testing thousands of SNPs, but also relevant in single-SNP analysis
From a frequentist perspective, fix the overall (family-wise) type I error rate at α = 5%.
If we want the probability that the tests together generate at least one false positive to be at most α,
and we have L SNPs that are considered independent (not true!),
we should use a per-SNP significance level α' such that:
$$1-(1-\alpha')^{L}=\alpha \quad\Longrightarrow\quad \alpha' = 1-(1-\alpha)^{1/L} \approx \frac{\alpha}{L}$$
Multiple testing
The approximation α' ≈ α/L is known as the Bonferroni correction
For L = 1 million we have α' = 5×10⁻⁸. This is conservative because many SNPs are tightly linked (high LD)
Many other procedures for controlling the type I error exist
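Both per-SNP levels follow from the relation 1 − (1 − α')^L = α. The exact solution for independent tests is usually called the Šidák level, and α/L is the Bonferroni approximation. The sketch uses `expm1` and `log1p` to keep precision when α' is tiny.

```python
import math


def bonferroni_level(alpha, n_tests):
    """Bonferroni per-test level: alpha / L."""
    return alpha / n_tests


def sidak_level(alpha, n_tests):
    """Exact per-test level for independent tests: 1 - (1 - alpha)^(1/L),
    computed as -expm1(log1p(-alpha) / L) to avoid cancellation."""
    return -math.expm1(math.log1p(-alpha) / n_tests)
```

For α = 0.05 and L = 10⁶, the Bonferroni level is 5×10⁻⁸ and the Šidák level is about 5.13×10⁻⁸. The two are practically identical.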
Another Bonferroni!
Use Bonferroni with a corrected n: the effective number of independent SNPs
Can be done easily with the R language
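The effective number of SNPs can be estimated in several ways. The sketch below uses the eigenvalue-based estimate of Nyholt (2004), M_eff = 1 + (M − 1)(1 − Var(λ)/M). Here λ are the eigenvalues of the M × M correlation matrix between SNP genotypes, and Var is the sample variance. Python is used here instead of R, and the small Jacobi eigenvalue routine only keeps the example dependency-free. Other estimators exist and can give noticeably different values.

```python
import math
import statistics


def jacobi_eigenvalues(a, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def effective_number_nyholt(corr):
    """Nyholt (2004) effective number of independent tests from an
    M x M SNP correlation matrix."""
    m = len(corr)
    if m < 2:
        return float(m)
    lam = jacobi_eigenvalues(corr)
    return 1 + (m - 1) * (1 - statistics.variance(lam) / m)
```

The corrected per-SNP level is then α / M_eff. Two independent SNPs give M_eff = 2, two perfectly correlated SNPs give M_eff = 1, and two SNPs with r = 0.9 give M_eff = 1.19.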
Multiple testing: permutation
Compute p-values using permutation: randomize phenotype labels over individuals while retaining genotypes (the LD structure is conserved but the association with phenotype is broken)
Repeat this many times and analyse all the permuted data sets
For each SNP, obtain the p-value as the proportion of permuted test values that are greater than the observed test value (from the original data)
Easily implemented in the R language
Computationally demanding: for 1 SNP with a sample of 200 cases and 200 controls, 10,000 permutations take about 2 minutes on a PC, so for 1 million SNPs this amounts to about 4 years!
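Permutation p-values can be sketched in Python; the slide mentions R, and Python is used here only for consistency with the other examples. The sketch shuffles case/control labels while keeping each individual's genotypes, and uses the trend statistic N r² for each SNP. It reports per-SNP permutation p-values. It also reports family-wise adjusted p-values, obtained by comparing each observed statistic with the maximum statistic over all SNPs in each permuted data set. This "max-T" variant is one common way to let permutation handle multiple testing while respecting LD. The (count + 1)/(B + 1) form avoids reporting a p-value of exactly zero.

```python
import random


def trend_stat(genotypes, status):
    """Cochran-Armitage trend statistic as N * r^2 (additive 0/1/2 coding)."""
    n = len(status)
    mg, my = sum(genotypes) / n, sum(status) / n
    sxy = sum((g - mg) * (y - my) for g, y in zip(genotypes, status))
    sxx = sum((g - mg) ** 2 for g in genotypes)
    syy = sum((y - my) ** 2 for y in status)
    if sxx == 0 or syy == 0:
        return 0.0
    return n * sxy * sxy / (sxx * syy)


def permutation_pvalues(snps, status, n_perm=1000, seed=1):
    """snps: one list of 0/1/2 codes per SNP (same individual order for all).

    Returns observed statistics, per-SNP permutation p-values and
    family-wise (max-T) adjusted p-values."""
    rng = random.Random(seed)
    observed = [trend_stat(g, status) for g in snps]
    exceed = [0] * len(snps)       # permuted stat >= observed, same SNP
    exceed_max = [0] * len(snps)   # max permuted stat over SNPs >= observed
    labels = list(status)
    for _ in range(n_perm):
        rng.shuffle(labels)  # breaks phenotype association, keeps LD between SNPs
        perm = [trend_stat(g, labels) for g in snps]
        top = max(perm)
        for j, obs in enumerate(observed):
            exceed[j] += int(perm[j] >= obs)
            exceed_max[j] += int(top >= obs)
    raw = [(k + 1) / (n_perm + 1) for k in exceed]
    adjusted = [(k + 1) / (n_perm + 1) for k in exceed_max]
    return observed, raw, adjusted
```

The cost grows with the number of SNPs × the number of permutations × the sample size. This is the computational burden described above.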
The importance of replication
Use an independent sample (preferably genotyped on a different platform) to confirm an association reported in an initial study
Not to be confused with cross-validation: splitting one sample into two subsets, one used to search for association and the other to check the initial findings
Conclusion: The future
For complex diseases, complex analyses
We still need powerful statistical methods that analyze many variants simultaneously, for both their individual effects and their joint contribution to disease risk
Some issues, such as stratification, may be resolved by methods that account for relatedness
Bayesian methods and Bayesian graphical models are becoming very attractive for GWAS data analysis
Recommended readings
Nature Reviews Genetics:
Balding DJ, 2006. 7: 781-791
Wang et al., 2005. 6: 109-117
Stephens and Balding, 2009. 10: 681-690