A Study of Statistical Algorithms for Haplotype-Based Genetic Linkage/Association
남정모 1), 김진흠 2), 강대룡 1), 신선미 3), 이윤경 3), 박정용 3), 허남욱 3), 서일 1)
1) Department of Preventive Medicine, Yonsei University College of Medicine 2) Department of Statistics and Information, The University of Suwon 3) Department of Public Health, Graduate School, Yonsei University
Background
| | Linkage | Association |
|---|---|---|
| Measure | Recombination fraction | Allelic disequilibrium |
| Data | Family: trio (TDT), (affected) sib-pair | Population: case-control |

Markers: a single locus, or tightly linked multiple loci (haplotype)
Background
Why haplotype-based?
• Haplotype: a set of closely linked genetic markers on one chromosome that tend to be inherited together
• Many markers are genotyped within a very short physical distance
• More informative than single markers
But …
• Haplotypes are not usually observable from genotype data, so they must be reconstructed (haplotype reconstruction)
e.g., when the number of heterozygous loci is c, the number of possible haplotype pairs is $2^{c-1}$
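The $2^{c-1}$ count can be checked by brute force. The sketch below is only an illustration: the function name `haplotype_pairs` and the encoding of a genotype as a list of allele pairs such as `(1, 2)` are our own assumptions, not part of the original study. It lists every distinct unordered haplotype pair compatible with an unphased genotype at biallelic loci.

```python
from itertools import product


def haplotype_pairs(genotype):
    """Enumerate the distinct unordered haplotype pairs compatible with an
    unphased multi-locus genotype given as allele pairs, e.g. [(1, 2), (1, 1)]."""
    het = [i for i, (a, b) in enumerate(genotype) if a != b]
    pairs = set()
    # Fixing the phase at the first heterozygous locus removes mirror images,
    # so only the remaining c - 1 heterozygous loci are flipped.
    for flips in product((False, True), repeat=max(len(het) - 1, 0)):
        flip_at = dict(zip(het[1:], flips))
        h1, h2 = [], []
        for i, (a, b) in enumerate(genotype):
            if flip_at.get(i, False):
                a, b = b, a
            h1.append(a)
            h2.append(b)
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


# Father of Sample 1 in the trio example: genotype 12 / 12 / 11 (c = 2)
print(haplotype_pairs([(1, 2), (1, 2), (1, 1)]))
# [((1, 1, 1), (2, 2, 1)), ((1, 2, 1), (2, 1, 1))]
```

With c = 3 heterozygous loci the function returns 4 pairs, and with c = 5 it returns 16, as the formula predicts.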
Background (Haplotype ambiguity)
• Hypothetical family trio (genotypes at three biallelic loci, alleles coded 1 and 2)

| Sample | Father (F) | Mother (M) | Child (C) |
|---|---|---|---|
| 1 | 12 / 12 / 11 | 22 / 12 / 22 | 12 / 12 / 12 |
| 2 | 12 / 22 / 11 | 22 / 12 / 22 | 12 / 22 / 12 |

• Probable haplotype pairs

| Sample | Father (F) | Mother (M) | Child (C) |
|---|---|---|---|
| 1 | {111, 221} | {222, 212} | {111, 222} |
| | or {121, 211} | {212, 222} | {121, 212} |
| 2 | {121, 221} | {212, 222} | {121, 212} |

• In Sample 1, F's haplotype phase is uncertain!
• In Sample 2, the parents' haplotypes can be deduced!
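The reasoning on this slide can be mechanized by enumerating every phase of each parent and keeping the combinations in which one haplotype from F and one from M reproduce the child's genotype. This is a sketch under our own encoding (allele pairs per locus); the helper names `ordered_pairs` and `trio_phases` are hypothetical.

```python
from itertools import product


def ordered_pairs(genotype):
    """All ordered (first, second) haplotype pairs for an unphased genotype."""
    out = set()
    for choice in product(*[{(a, b), (b, a)} for a, b in genotype]):
        out.add((tuple(c[0] for c in choice), tuple(c[1] for c in choice)))
    return out


def unordered(h1, h2):
    return tuple(sorted((h1, h2)))


def trio_phases(father, mother, child):
    """Return every (father pair, mother pair, child pair) compatible with
    Mendelian transmission of one haplotype from each parent."""
    child_geno = [tuple(sorted(g)) for g in child]
    configs = set()
    for f_t, f_u in ordered_pairs(father):      # f_t: haplotype F transmits
        for m_t, m_u in ordered_pairs(mother):  # m_t: haplotype M transmits
            if all(tuple(sorted((x, y))) == g
                   for x, y, g in zip(f_t, m_t, child_geno)):
                configs.add((unordered(f_t, f_u), unordered(m_t, m_u),
                             unordered(f_t, m_t)))
    return sorted(configs)


# Sample 1: F = 12 / 12 / 11, M = 22 / 12 / 22, C = 12 / 12 / 12
sample1 = trio_phases([(1, 2), (1, 2), (1, 1)],
                      [(2, 2), (1, 2), (2, 2)],
                      [(1, 2), (1, 2), (1, 2)])
for fa, mo, ch in sample1:   # two configurations, one per phase of F
    print("F", fa, "M", mo, "C", ch)
```

For Sample 1 the two surviving configurations differ only in F's phase, which is exactly the ambiguity noted above; for Sample 2 each parent has a single compatible phase.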
Background
Previous research
• Wilson (1997, AHG); Clayton & Jones (1999, AJHG):
discard families with ambiguous haplotypes
• Clayton (1999, AJHG):
likelihood-based, but not robust to population admixture
• Zhao et al. (2000, AJHG):
How to resolve haplotype ambiguity?
Allocate a conditional probability to each haplotype group compatible with the observed set of genotypes
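The idea of weighting each compatible haplotype configuration can be illustrated for a single unrelated individual. The sketch assumes Hardy-Weinberg equilibrium and takes the haplotype frequencies as known (the values below are hypothetical). Zhao et al. apply this kind of weighting to whole families, which this simplified version does not reproduce. The function name is our own.

```python
from itertools import product


def phase_posteriors(genotype, freq):
    """P(haplotype pair | unphased genotype), assuming Hardy-Weinberg
    equilibrium. `freq` maps haplotype tuples to population frequencies."""
    weights = {}
    for choice in product(*[{(a, b), (b, a)} for a, b in genotype]):
        h1 = tuple(c[0] for c in choice)
        h2 = tuple(c[1] for c in choice)
        # P{h1, h2} is 2*f(h1)*f(h2) for h1 != h2; the factor 2 is common to
        # every phase of the same genotype, so it cancels on normalization.
        weights[tuple(sorted((h1, h2)))] = freq.get(h1, 0.0) * freq.get(h2, 0.0)
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no compatible haplotype pair has positive frequency")
    return {pair: w / total for pair, w in weights.items()}


# Hypothetical frequencies; genotype of F in Sample 1 (12 / 12 / 11)
freq = {(1, 1, 1): 0.4, (2, 2, 1): 0.3, (1, 2, 1): 0.2, (2, 1, 1): 0.1}
post = phase_posteriors([(1, 2), (1, 2), (1, 1)], freq)
for pair, p in sorted(post.items()):
    print(pair, round(p, 3))   # 0.857 for {111, 221}, 0.143 for {121, 211}
```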
Haplotype-based TDT (Zhao et al., 2000)
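• For each family g, estimate the (fractional) number of times that
F with {H_i, H_j} transmits H_i and M with {H_k, H_l} transmits H_k:

$$\hat t^{\,g}_{ik,jl} = \frac{h_i\,h_j\,h_k\,h_l}{\sum_{s \in S_g} h_{s_1} h_{s_2} h_{s_3} h_{s_4}}$$

where $h_i$ is the estimated frequency of haplotype $H_i$ and $S_g$ is the set of parental haplotype assignments $s = (s_1, s_2, s_3, s_4)$ compatible with the genotypes of family g (the estimate is 0 for incompatible assignments).

• Transmission / Non-transmission table

$$\hat t_{ij} = \sum_{g} \sum_{k,l} \left( \hat t^{\,g}_{ik,jl} + \hat t^{\,g}_{ki,lj} \right), \qquad \hat T = (\hat t_{ij})$$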
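A minimal sketch of how the table could be accumulated, assuming the compatible parental configurations of each family have already been enumerated and the haplotype frequencies are given. Each configuration's normalized weight is $\hat t^{\,g}_{ik,jl}$; adding it to cell (F transmitted, F untransmitted) and to cell (M transmitted, M untransmitted) realizes the two terms of the sum for $\hat t_{ij}$. The input format and function name are hypothetical.

```python
def transmission_table(families, h):
    """Expected transmission (row) / non-transmission (column) table.

    families: one list per family g of the parental configurations compatible
              with its genotypes, each as haplotype indices
              (F transmitted, F untransmitted, M transmitted, M untransmitted).
    h:        haplotype frequencies (list indexed from 0).
    """
    n = len(h)
    T = [[0.0] * n for _ in range(n)]
    for configs in families:
        weights = [h[a] * h[b] * h[c] * h[d] for a, b, c, d in configs]
        total = sum(weights)
        for (f_t, f_u, m_t, m_u), w in zip(configs, weights):
            share = w / total        # estimated count for this configuration
            T[f_t][f_u] += share     # F transmits H_{f_t}, not H_{f_u}
            T[m_t][m_u] += share     # M transmits H_{m_t}, not H_{m_u}
    return T


# Sample 1 with indices 0=111, 1=221, 2=121, 3=211, 4=212, 5=222
freqs = [0.30, 0.20, 0.10, 0.10, 0.15, 0.15]        # hypothetical values
family1 = [(0, 1, 5, 4), (2, 3, 4, 5)]
T = transmission_table([family1], freqs)
print(round(T[0][1], 4), round(T[2][3], 4))          # 0.8571 0.1429
```

Each family contributes a total of 2 to the table (one transmission per parent), however its uncertainty is split across configurations.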
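• Based on Spielman & Ewens' (1996) multi-allelic TDT

$$T_{S\&E} = \frac{h-1}{h} \sum_{i=1}^{h} \frac{(\hat t_{i\cdot} - \hat t_{\cdot i})^2}{\hat t_{i\cdot} + \hat t_{\cdot i} - 2\hat t_{ii}}$$

where h is the number of distinct haplotypes, $\hat t_{i\cdot} = \sum_j \hat t_{ij}$ and $\hat t_{\cdot i} = \sum_j \hat t_{ji}$.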
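A direct transcription of $T_{S\&E}$, given as a sketch: the function name and the rule of skipping haplotypes whose denominator is zero (never seen in a heterozygous parent) are our own assumptions, since the slide does not say how such cells are handled. Rows index transmitted and columns non-transmitted haplotypes.

```python
def spielman_ewens(T):
    """Multi-allelic TDT statistic T_S&E for an h x h transmission (rows) /
    non-transmission (columns) table of possibly fractional counts."""
    h = len(T)
    stat = 0.0
    for i in range(h):
        row = sum(T[i])                        # t_i. : times H_i transmitted
        col = sum(T[j][i] for j in range(h))   # t_.i : times H_i not transmitted
        denom = row + col - 2 * T[i][i]
        if denom > 0:                          # uninformative haplotypes skipped
            stat += (row - col) ** 2 / denom
    return (h - 1) / h * stat


# For h = 2 this is the classical TDT (b - c)^2 / (b + c) with b = 30, c = 10
print(spielman_ewens([[5, 30], [10, 7]]))     # 10.0
```

The diagonal cells (homozygous parents) cancel out of the denominator, so only heterozygous transmissions drive the statistic.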
• Remark
Does $T_{S\&E}$ asymptotically follow a chi-square distribution with df = h − 1? No, except when h = 2.
Why? See Sham (1997, AJHG) and Lazzeroni and Lange (1998, Hum Hered); in addition, the cell counts are dependent.
⇒ Obtain an empirical p-value by a randomization procedure
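The slide does not describe the randomization scheme. A common choice for transmission data, assumed here, is to swap each parent's transmitted and non-transmitted haplotypes with probability 1/2, which is valid under the null hypothesis of no linkage. The per-parent record format `(transmitted, untransmitted, weight)` and all function names are hypothetical. The default of 100 resamples follows the simulation settings, and the (hits + 1)/(n + 1) estimator is one common convention.

```python
import random


def spielman_ewens(T):
    """Multi-allelic TDT statistic (rows: transmitted, columns: not)."""
    h = len(T)
    stat = 0.0
    for i in range(h):
        row, col = sum(T[i]), sum(T[j][i] for j in range(h))
        denom = row + col - 2 * T[i][i]
        if denom > 0:
            stat += (row - col) ** 2 / denom
    return (h - 1) / h * stat


def table_from_records(parents, h, flips=None):
    """Build the h x h table from per-parent (transmitted, untransmitted,
    weight) records; flips[p] = True swaps the labels of parent p."""
    T = [[0.0] * h for _ in range(h)]
    for p, records in enumerate(parents):
        swap = flips is not None and flips[p]
        for t, u, w in records:
            if swap:
                t, u = u, t
            T[t][u] += w
    return T


def randomization_pvalue(parents, h, statistic=spielman_ewens,
                         n_resample=100, seed=1):
    """Empirical p-value: under H0 each parent's transmitted and
    untransmitted haplotypes are exchangeable, so swap them at random."""
    rng = random.Random(seed)
    observed = statistic(table_from_records(parents, h))
    hits = 0
    for _ in range(n_resample):
        flips = [rng.random() < 0.5 for _ in parents]
        if statistic(table_from_records(parents, h, flips)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_resample + 1)


# 40 parents who all transmit H_0 over H_1 (hypothetical data)
parents = [[(0, 1, 1.0)] for _ in range(40)]
print(randomization_pvalue(parents, 2))   # smallest attainable value is 1/101
```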
Purpose
• Propose new haplotype-based linkage/association tests
• Investigate the empirical levels and powers of the proposed test statistics by simulation
• Test linkage/association between hypertension and four polymorphisms (A-240T, T-93C, I/D, G2350A) in the ACE (angiotensin I-converting enzyme) gene
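Proposed haplotype-based score and LR tests
• Stuart (1955)

$$T_S = \hat{\mathbf d}^{\top} \hat V^{-1} \hat{\mathbf d}, \qquad \hat{\mathbf d} = \left(\hat t_{1\cdot} - \hat t_{\cdot 1}, \ldots, \hat t_{h-1,\cdot} - \hat t_{\cdot, h-1}\right)^{\top}$$

$$\hat v_{ij} = \begin{cases} \hat t_{i\cdot} + \hat t_{\cdot i} - 2\hat t_{ii}, & i = j \\ -(\hat t_{ij} + \hat t_{ji}), & i \neq j \end{cases} \qquad (i, j = 1, \ldots, h-1)$$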
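A minimal sketch of $T_S$ in plain Python, assuming the table layout used above (rows transmitted, columns non-transmitted). The small Gauss-Jordan solver is included only to keep the example dependency-free; in practice a linear-algebra library would be used. The function names are illustrative.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("singular covariance matrix")
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                for k in range(c, n + 1):
                    M[r][k] -= f * M[c][k]
    return [M[i][n] / M[i][i] for i in range(n)]


def stuart_statistic(T):
    """Stuart (1955) marginal-homogeneity statistic T_S = d' V^{-1} d."""
    h = len(T)
    m = h - 1                     # the last haplotype is dropped (V would be singular)
    row = [sum(T[i]) for i in range(h)]
    col = [sum(T[j][i] for j in range(h)) for i in range(h)]
    d = [row[i] - col[i] for i in range(m)]
    V = [[row[i] + col[i] - 2 * T[i][i] if i == j else -(T[i][j] + T[j][i])
          for j in range(m)] for i in range(m)]
    x = solve(V, d)
    return sum(di * xi for di, xi in zip(d, x))


print(round(stuart_statistic([[0, 10, 5], [4, 0, 6], [3, 2, 0]]), 4))   # 3.6389
```

For h = 2 both $T_S$ and $T_{S\&E}$ reduce to the ordinary TDT statistic.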
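• Bradley & Terry (1952)

$$T_{B\&T} = 2(\log L_1 - \log L_0)$$

through a logistic model such as $\log(p_{ij}/p_{ji}) = b_i - b_j$, where $p_{ij}$ is the probability that a parent with {H_i, H_j} transmits H_i, and $L_0$ and $L_1$ are the maximized likelihoods under $H_0: b_1 = \cdots = b_h$ and under the full model.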
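The slide does not say how the Bradley-Terry model is fitted. One option, sketched here as an assumption, is the MM algorithm of Hunter (2004), with $\pi_i = e^{b_i}$ and the off-diagonal cells of the transmission table treated as (possibly fractional) pairwise "wins". The function name and convergence settings are hypothetical.

```python
import math


def bradley_terry_lr(T, max_iter=1000, tol=1e-10):
    """LR statistic 2(log L1 - log L0) for H0: b_1 = ... = b_h under the
    Bradley-Terry model log(p_ij / p_ji) = b_i - b_j.

    T[i][j] (i != j): (expected) number of {H_i, H_j} parents transmitting
    H_i. Diagonal cells (homozygous parents) carry no information.
    """
    h = len(T)
    wins = [sum(T[i][j] for j in range(h) if j != i) for i in range(h)]
    n = [[T[i][j] + T[j][i] for j in range(h)] for i in range(h)]
    if sum(wins) == 0:
        return 0.0
    pi = [1.0] * h                      # starts at the null model
    for _ in range(max_iter):
        new = []
        for i in range(h):
            denom = sum(n[i][j] / (pi[i] + pi[j])
                        for j in range(h) if j != i and n[i][j] > 0)
            new.append(wins[i] / denom if denom > 0 else pi[i])
        scale = sum(new) / h            # b is identified only up to a constant
        new = [p / scale for p in new]
        converged = max(abs(x - y) for x, y in zip(new, pi)) < tol
        pi = new
        if converged:
            break
    log_l1 = sum(T[i][j] * math.log(pi[i] / (pi[i] + pi[j]))
                 for i in range(h) for j in range(h) if i != j and T[i][j] > 0)
    log_l0 = sum(T[i][j] * math.log(0.5)
                 for i in range(h) for j in range(h) if i != j)
    return 2 * (log_l1 - log_l0)


print(round(bradley_terry_lr([[0, 30], [10, 0]]), 3))   # 10.465
```

Because the MM iterations start at the null model and never decrease the likelihood, the statistic is non-negative.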
Simulation studies
Fig. 1 Genealogy of four mutations at four loci (A1, A2, D, A3) used in the simulation study. [Figure: a genealogy producing haplotypes H1 to H8 through the mutations A1 → a1, A2 → a2, A3 → a3 and the disease mutation D → d]
Number of marker loci = 3 (A1, A2, A3)
Simulation studies
• Types of haplotype frequencies

| Type | Pop. | Frequencies of (H1, H2, H3, H4, H5, H6, H7, H8) |
|---|---|---|
| 1 | 1 | (0.343, 0.147, 0.147, 0.063, 0.147, 0.063, 0.063, 0.027) |
| | 2 | (0.490, 0.000, 0.210, 0.000, 0.210, 0.000, 0.090, 0.000) |
| 2 | 1 | (0.343, 0.147, 0.147, 0.063, 0.147, 0.063, 0.063, 0.027) |
| | 2 | (0.343, 0.147, 0.147, 0.063, 0.147, 0.063, 0.063, 0.027) |
| 3 | 1 | (0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125) |
| | 2 | (0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125) |
• Haplotypes carrying the disease-susceptibility mutant allele: H7, H8
• Background risk of disease occurrence = 0.1, 0.2
• RR = 1 (level); 1.2, 1.6, 2.0, 3.0, 4.0, 6.0 (power)
• Number of subjects in each population = 200
• Generate genotype data for case-control (1 : 2) and trio designs
• Number of replications = 200; number of resamples = 100
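The slides list the simulation settings but not the disease model or the sampling scheme. The sketch below assumes a dominant model, in which anyone carrying at least one susceptibility haplotype (H7 or H8) has risk background × RR, and draws people until a 1 : 2 case-control sample is filled. All names, the dominant model and the stopping rule are assumptions for illustration only, and trio generation is not covered.

```python
import random

# Type 1, population 1 frequencies of (H1, ..., H8) from the table above
TYPE1_POP1 = [0.343, 0.147, 0.147, 0.063, 0.147, 0.063, 0.063, 0.027]


def simulate_case_control(freqs, background_risk, rr, n_cases, ratio=2,
                          risk_haplotypes=(6, 7), seed=0):
    """Hypothetical generator. Each person receives two haplotypes drawn
    independently from `freqs` (Hardy-Weinberg assumption). The disease
    probability is background_risk * rr for carriers of at least one
    susceptibility haplotype (indices 6 and 7 = H7, H8), otherwise
    background_risk. People are drawn until n_cases cases and
    ratio * n_cases controls have been collected."""
    rng = random.Random(seed)
    haplotypes = range(len(freqs))
    cases, controls = [], []
    while len(cases) < n_cases or len(controls) < ratio * n_cases:
        pair = tuple(rng.choices(haplotypes, weights=freqs, k=2))
        carrier = any(hap in risk_haplotypes for hap in pair)
        risk = min(1.0, background_risk * (rr if carrier else 1.0))
        if rng.random() < risk:
            if len(cases) < n_cases:
                cases.append(pair)
        elif len(controls) < ratio * n_cases:
            controls.append(pair)
    return cases, controls


cases, controls = simulate_case_control(TYPE1_POP1, 0.1, 2.0, n_cases=50)
print(len(cases), len(controls))   # 50 100
```

Setting rr = 1.0 gives data under the null hypothesis, which is how empirical levels (as opposed to powers) would be obtained.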
Configurations used to generate the genetic data

| Configuration | Haplotype frequency | Background risk, Pop1 | Background risk, Pop2 |
|---|---|---|---|
| I | Pop1 ≠ Pop2 | 0.1 | 0.1 |
| II | Pop1 ≠ Pop2 | 0.2 | 0.1 |
| III | Pop1 = Pop2 | 0.1 | 0.1 |
| IV | Pop1 = Pop2 | 0.2 | 0.1 |
| V | Pop1 = Pop2 | 0.1 | 0.1 |
| VI | Pop1 = Pop2 | 0.2 | 0.1 |
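Empirical levels (nominal level 5%)

| Approach | Conf. | Single locus: Locus 1 | Locus 2 | Locus 3 | Haplotype: LR§ ($T_{S\&E}$) | Haplotype: $T_S$ |
|---|---|---|---|---|---|---|
| Population-based | I | 0.040 | 0.045 | 0.030 | 0.070 | NA |
| | II | 0.030 | 0.050 | 0.270 | 0.170 | NA |
| | III | 0.065 | 0.040 | 0.025 | 0.045 | NA |
| | IV | 0.055 | 0.045 | 0.060 | 0.080 | NA |
| | V | 0.055 | 0.055 | 0.070 | 0.115 | NA |
| | VI | 0.055 | 0.070 | 0.050 | 0.080 | NA |
| Family-based | I | 0.030 | 0.040 | 0.035 | 0.055 | 0.060 |
| | II | 0.015 | 0.035 | 0.045 | 0.045 | 0.050 |
| | III | 0.015 | 0.025 | 0.015 | 0.050 | 0.050 |
| | IV | 0.055 | 0.010 | 0.045 | 0.075 | 0.060 |
| | V | 0.035 | 0.055 | 0.070 | 0.080 | 0.075 |
| | VI | 0.045 | 0.050 | 0.040 | 0.035 | 0.040 |

§ Zhao, Curtis and Sham (2000, Hum Hered)'s χ² test (population-based data); $T_{S\&E}$ is used for family-based data.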
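Empirical powers for Conf. I

| Approach | RR | Single locus: Locus 1 | Locus 2 | Locus 3 | Haplotype: LR§ ($T_{S\&E}$) | Haplotype: $T_S$ |
|---|---|---|---|---|---|---|
| Population-based | 1.2 | 0.060 | 0.070 | 0.055 | 0.085 | NA |
| | 1.6 | 0.080 | 0.075 | 0.045 | 0.140 | NA |
| | 2.0 | 0.155 | 0.125 | 0.035 | 0.220 | NA |
| | 3.0 | 0.370 | 0.340 | 0.065 | 0.555 | NA |
| | 4.0 | 0.665 | 0.640 | 0.050 | 0.820 | NA |
| | 6.0 | 0.975 | 0.975 | 0.035 | 0.995 | NA |
| Family-based | 1.2 | 0.030 | 0.050 | 0.020 | 0.055 | 0.065 |
| | 1.6 | 0.050 | 0.040 | 0.010 | 0.100 | 0.100 |
| | 2.0 | 0.090 | 0.115 | 0.025 | 0.115 | 0.120 |
| | 3.0 | 0.230 | 0.195 | 0.040 | 0.315 | 0.300 |
| | 4.0 | 0.415 | 0.405 | 0.040 | 0.615 | 0.595 |
| | 6.0 | 0.760 | 0.705 | 0.025 | 0.960 | 0.940 |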
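Empirical powers for Conf. V

| Approach | RR | Single locus: Locus 1 | Locus 2 | Locus 3 | Haplotype: LR§ ($T_{S\&E}$) | Haplotype: $T_S$ |
|---|---|---|---|---|---|---|
| Population-based | 1.2 | 0.040 | 0.065 | 0.030 | 0.090 | NA |
| | 1.6 | 0.115 | 0.105 | 0.035 | 0.185 | NA |
| | 2.0 | 0.225 | 0.245 | 0.090 | 0.300 | NA |
| | 3.0 | 0.645 | 0.675 | 0.055 | 0.760 | NA |
| | 4.0 | 0.940 | 0.940 | 0.030 | 0.990 | NA |
| | 6.0 | 1.000 | 1.000 | 0.030 | 1.000 | NA |
| Family-based | 1.2 | 0.035 | 0.060 | 0.030 | 0.050 | 0.060 |
| | 1.6 | 0.070 | 0.065 | 0.025 | 0.110 | 0.095 |
| | 2.0 | 0.110 | 0.130 | 0.045 | 0.170 | 0.155 |
| | 3.0 | 0.340 | 0.365 | 0.040 | 0.485 | 0.495 |
| | 4.0 | 0.630 | 0.595 | 0.045 | 0.810 | 0.795 |
| | 6.0 | 0.920 | 0.930 | 0.050 | 0.985 | 0.980 |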
Kangwha study
• 783 students aged 15 in Kangwha in 1995 were followed up every year until 1997
• Phenotype: high blood pressure (BP)
• Cases: students whose SBP > 130 mmHg or DBP > 85 mmHg at least once
Controls: selected sequentially, starting from the students with the lowest BP
Case : Control = 101 : 176
• Trios: cases whose parents' genotypes are available (40 trios)
• Four polymorphisms of the ACE gene in region 17q23: A-240T, T-93C, I/D, G2350A
Schematic diagram of the human ACE gene illustrating the locations of 10 biallelic polymorphisms.
Polymorphisms are numbered in base pairs relative to the transcription start site of the ACE gene.
Exons 1-26 are indicated with vertical bars and are numbered intermittently for clarity.
Keavney et al., Human Molecular Genetics 1998
Estimated haplotype frequencies

| Haplotype† | Control: Bayesian | Control: EM‡ | Case: Bayesian | Case: EM‡ |
|---|---|---|---|---|
| IATG | 0.003 | 0.003 | - | - |
| IATA | - | - | - | - |
| IACG | 0.017 | 0.016 | 0.010 | 0.010 |
| IACA | 0.594 | 0.595 | 0.604 | 0.604 |
| ITTG | - | 0.002 | - | - |
| ITTA | 0.034 | 0.033 | 0.015 | 0.015 |
| ITCG | - | - | 0.005 | 0.005 |
| ITCA | - | - | 0.010 | 0.010 |
| DATG | - | - | 0.005 | 0.005 |
| DATA | - | - | - | - |
| DACG | 0.011 | 0.012 | - | - |
| DACA | 0.006 | 0.006 | - | - |
| DTTG | 0.298 | 0.297 | 0.337 | 0.336 |
| DTTA | 0.026 | 0.026 | 0.010 | 0.010 |
| DTCG | 0.011 | 0.011 | 0.005 | 0.005 |
| DTCA | - | - | - | - |

† Haplotypes are written as the alleles at (I/D, A-240T, T-93C, G2350A)
‡ EM: EH, SAS, and the proposed algorithm
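For readers unfamiliar with the EM column, the sketch below shows the generic EM idea for haplotype frequencies from unphased genotypes under Hardy-Weinberg equilibrium: the E-step splits each person over the phases compatible with their genotype, and the M-step re-estimates frequencies from the expected haplotype counts. It is an illustration in the spirit of tools such as EH, not their actual implementation, and the Bayesian estimator is not reproduced. Function names and the toy data are hypothetical.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """Distinct unordered haplotype pairs compatible with an unphased genotype."""
    pairs = set()
    for choice in product(*[{(a, b), (b, a)} for a, b in genotype]):
        h1 = tuple(c[0] for c in choice)
        h2 = tuple(c[1] for c in choice)
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)


def em_haplotype_freqs(genotypes, max_iter=500, tol=1e-10):
    """EM estimate of haplotype frequencies from unphased genotypes,
    assuming Hardy-Weinberg equilibrium (hypothetical helper)."""
    candidates = [compatible_pairs(g) for g in genotypes]
    haps = sorted({hap for pairs in candidates for pair in pairs for hap in pair})
    freq = {hap: 1.0 / len(haps) for hap in haps}       # uniform start
    for _ in range(max_iter):
        counts = defaultdict(float)
        for pairs in candidates:
            # E-step: posterior probability of each phase for this person
            w = [freq[a] * freq[b] * (2 if a != b else 1) for a, b in pairs]
            total = sum(w)
            for (a, b), wi in zip(pairs, w):
                counts[a] += wi / total
                counts[b] += wi / total
        # M-step: expected haplotype counts divided by 2N chromosomes
        new = {hap: counts[hap] / (2 * len(genotypes)) for hap in haps}
        converged = max(abs(new[hap] - freq[hap]) for hap in haps) < tol
        freq = new
        if converged:
            break
    return freq


# Toy two-locus data (hypothetical): the last person is a double heterozygote
genotypes = [[(1, 1), (1, 1)], [(1, 1), (1, 1)], [(2, 2), (2, 2)], [(1, 2), (1, 2)]]
freq = em_haplotype_freqs(genotypes)
print({hap: round(f, 3) for hap, f in freq.items()})
# (1, 1) -> 0.625, (2, 2) -> 0.375, (1, 2) and (2, 1) -> 0.0
```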
P-values of association and linkage tests

| Method | Marker | Case-Control | Trio |
|---|---|---|---|
| Single locus | I/D | 0.988 | 0.217 / 0.280† |
| | A-240T | 0.899 | 0.132 / 0.175 |
| | T-93C | 0.852 | 0.140 / 0.185 |
| | G2350A | 0.828 | 0.170 / 0.223 |
| Haplotype | | 0.184§ | 0.152 / 0.095‡ |

† Normal approximation / with Yates' continuity correction
‡ Zhao et al. (2000, AJHG)'s test / score test
§ Zhao, Curtis and Sham (2000, Hum Hered)'s χ² test
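The two single-locus trio p-values (marked †) correspond to the standard biallelic TDT evaluated with a normal approximation and with Yates' continuity correction. A minimal sketch, assuming b and c are the numbers of heterozygous parents transmitting allele 1 and allele 2; the counts behind the table are not reported, so the example counts are hypothetical.

```python
import math


def tdt_pvalues(b, c):
    """Two-sided p-values for the biallelic TDT: (normal approximation,
    normal approximation with Yates' continuity correction)."""
    n = b + c
    z = abs(b - c) / math.sqrt(n)                      # normal approximation
    z_yates = max(abs(b - c) - 1, 0) / math.sqrt(n)    # continuity-corrected
    # two-sided p = 2 * (1 - Phi(z)) = erfc(z / sqrt(2))
    return math.erfc(z / math.sqrt(2)), math.erfc(z_yates / math.sqrt(2))


p, p_yates = tdt_pvalues(30, 10)     # hypothetical counts
print(round(p, 5), round(p_yates, 5))
```

The corrected p-value is always at least as large as the uncorrected one, matching the pattern in the table.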
Summary
• Investigated single-locus-based and haplotype-based association/linkage tests
- power: single-locus-based < haplotype-based
- population-based vs. family-based: no clear ordering
• Hypertension is NOT linked with the markers on the ACE gene (all p-values > 0.05), but…
• In the future …
- how to reduce haplotype uncertainty
- how to include observations with only one parent or with only a sibship
- how to combine all types of observations into one test statistic
Thank you