Computational Problems in Haplotype Recognition
by
Ali Katanforoush
Under supervision of
Dr Hamid Pezeshk and Dr Mehdi Sadeghi
A thesis submitted to the Graduate Studies Office of University of Tehran
in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Bioinformatics
Institute of Biochemistry and Biophysics November 1, 2009
Outline
● Haplotype basis and terminology
● Haplotype inference
● Haplotype block partitioning
● Assessment of haplotype blocks
– Common haplotype coverage and tagSNP coverage
– Robustness of partitioning method
– Application to recombination hotspot detection
– Application to disease association studies
Single Nucleotide Polymorphism; SNP
● A genetic variation at a single nucleotide that is observed in a population at a non-negligible frequency (not too rare).
● SNPs are usually biallelic.
[Figure: five aligned sequences and their 0/1 coding at the five SNP sites]
AGGACTAGATAATAGACCG   0 1 1 0 0
AGGACCACATTATAGTCCG   0 0 0 1 1
AGGACCAGATAATAGTCCG   0 0 1 0 1
ATGACCACATTATAGTCCG   1 0 0 1 1
ATGACTACATAATAGACCG   1 1 0 0 0
● A SNP is the result of a single-site mutation that has become established in the population.
● SNPs are the most common form of genetic polymorphism in genomes.
● Each new cell contains ~3 new mutations.
● Each new “child” ~20 new mutations.
● Currently more than 4.3 million SNPs have been reported to dbSNP (0.1% of whole genome).
--A--------C--------A----G--------T---C---A----
--T--------G--------A----G--------C---C---A----
--A--------G--------G----G--------C---C---A----
--A--------C--------A----G--------T---C---A----
--T--------C--------A----G--------T---C---A----
--T--------C--------A----T--------T---A---A----
Haplotype Map of the Human Genome
● Define patterns of genetic variation across human genome.
● Guide selection of SNPs efficiently to “tag” common variants.
● Public release of all data (assays, genotypes).
● Phase I: 1.3 M SNPs in 269 people.
● Phase II: +2.8 M SNPs in 270 people;
– 30 parent-parent-offspring trios from Nigeria (YRI)
– 30 trios of European descent from Utah (CEU)
– 45 unrelated individuals from Beijing (CHB)
– 45 unrelated individuals from Tokyo (JPT)
● Phase III: 1.3 M SNPs in 1184 people (10 panels).
The first problem; Genotype Phasing
● Every genotype can be considered as the sum of two unknown haplotypes.
● Given a set of genotype samples of unrelated individuals, determine the pairs of haplotypes that add up to the given genotypes.
[Figure: real haplotypes → genotyping → genotypes → computational phasing → inferred haplotypes]
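To make the phasing ambiguity concrete, here is a minimal Python sketch. It assumes the coding used on the slides, where a genotype is the site-wise sum of two 0/1 haplotypes, so 0 and 2 are homozygous sites and 1 is heterozygous. The function names `genotype_of` and `compatible_pairs` are illustrative and do not come from the thesis.

```python
from itertools import product

def genotype_of(h1, h2):
    """Genotype as the site-wise sum of two 0/1 haplotypes (1 = heterozygous)."""
    return tuple(a + b for a, b in zip(h1, h2))

def compatible_pairs(genotype):
    """Enumerate all unordered haplotype pairs that add up to `genotype`."""
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    # Homozygous sites are fixed: 0 -> 0 and 2 -> 1 on both haplotypes.
    base = [g // 2 for g in genotype]
    pairs = set()
    for bits in product((0, 1), repeat=len(het_sites)):
        h1, h2 = list(base), list(base)
        for site, b in zip(het_sites, bits):
            h1[site], h2[site] = b, 1 - b  # the two haplotypes differ here
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)

if __name__ == "__main__":
    g = genotype_of((0, 1, 1, 0, 0), (0, 0, 1, 0, 1))
    print(g)
    for pair in compatible_pairs(g):
        print(pair)
```

A genotype with k heterozygous sites has 2^(k-1) candidate resolutions (for k ≥ 1), which is why phasing needs an additional model such as parsimony.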
Haplotype inference by maximum parsimony
● Inferring the set of haplotypes consistent with the genotype data requires some biological assumptions.
● Maximum parsimony is one of the most common models in biology.
● Other models: perfect phylogeny, maximum likelihood, Bayesian model.
Haplotypes:
0 1 1 0 0
0 0 0 1 1
0 0 1 0 1
1 0 0 1 1
1 1 0 0 0
Genotypes:
0 1 1 1 1
0 1 2 0 1
1 0 1 1 1
2 1 0 1 1
1 2 1 0 0
● Clark's algorithm (1990); greedy algorithm
● Finding a maximum parsimony solution to the haplotype phasing problem is NP-hard; Hubbell (2002), Pinotti et al. (2004)
● 0/1 linear programming, Gusfield (2003)
● Branch-and-Bound, Wang (2003)
Other approaches to haplotype inference
● Perfect phylogeny, Gusfield (2002), Filkov, Gusfield and Ding (2006)
● Inference of haplotype frequencies by maximum likelihood
– Expectation-Maximization, EM, Slatkin and Excoffier (1995)
– Partition-Ligation, PL-EM, Qin et al. (2002)
● Bayesian model, Smith, Donnelly and Stephens (2001); PHASE
Genetic Algorithm; GA
min f(x) s.t. P(x) = true
● Consider N feasible solutions; each one is represented by a bit string called a “chromosome”.
● Select the “chromosomes” of highest fitness to produce a new generation.
● Cross over random pairs of selected “chromosomes” and mutate some bits on other “chromosomes”.
● A near-optimal solution is expected after many generations.
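The loop described above can be sketched as follows. This is a generic illustration, not the GA used in the thesis: it assumes every bit string is feasible (so the constraint P(x) is dropped), uses tournament selection, one-point crossover, bit-flip mutation and elitism, and all parameter names and defaults are hypothetical.

```python
import random

def genetic_minimize(f, n_bits, pop_size=30, generations=60,
                     crossover_rate=0.8, mutation_rate=0.02, seed=0):
    """Minimize f over bit strings of length n_bits (n_bits >= 2)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = min(pop, key=f)
    for _ in range(generations):
        # Tournament selection: the fitter of two random individuals is kept.
        parents = [min(rng.sample(pop, 2), key=f) for _ in range(pop_size)]
        nxt = [best[:]]  # elitism: the best solution so far always survives
        while len(nxt) < pop_size:
            a, b = rng.sample(parents, 2)
            child = a[:]
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_bits)  # one-point crossover
                child = a[:cut] + b[cut:]
            # Bit-flip mutation with a small per-bit probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            nxt.append(child)
        pop = nxt
        best = min(pop, key=f)
    return best

if __name__ == "__main__":
    # Toy objective: minimize the number of ones in a 12-bit string.
    print(genetic_minimize(sum, 12))
```

Because of elitism the best fitness never gets worse from one generation to the next, but nothing guarantees that the global optimum is reached.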
Genetic Algorithm for haplotype inference with maximum parsimony
● Given n genotypes on l SNPs, $g_1, g_2, \ldots, g_n$, find
$\min |H|$ s.t. $\exists\, h_a, h_b \in H : g_i = h_a \oplus h_b$, for $i = 1, 2, \ldots, n$
● Braaten et al. (2000). The GA applied to haplotype data at the LDL receptor locus.
● Tapadar et al. (2000). Haplotyping in pedigrees via GA.
● Azuma et al. (2009). Haplotype frequency estimation by GA.
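For very small instances the objective can be solved by exhaustive search, which helps to see what "min |H|" means. The sketch below is an illustration only: it assumes ⊕ is the site-wise sum of 0/1 haplotypes (1 = heterozygous), and the helper names are hypothetical. Its running time is exponential, consistent with the NP-hardness of the problem.

```python
from itertools import combinations, product

def compatible_pairs(genotype):
    """All unordered haplotype pairs whose site-wise sum equals `genotype`."""
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [g // 2 for g in genotype]
    pairs = set()
    for bits in product((0, 1), repeat=len(het)):
        h1, h2 = list(base), list(base)
        for site, b in zip(het, bits):
            h1[site], h2[site] = b, 1 - b
        pairs.add(frozenset((tuple(h1), tuple(h2))))
    return pairs

def min_parsimony(genotypes):
    """Smallest haplotype set H such that every genotype is a sum of two members of H."""
    options = [compatible_pairs(g) for g in genotypes]
    candidates = sorted(set().union(*(pair for opts in options for pair in opts)))
    # Try candidate sets of increasing size; the first feasible one is minimal.
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            chosen = set(subset)
            if all(any(pair <= chosen for pair in opts) for opts in options):
                return sorted(chosen)
    return []

if __name__ == "__main__":
    print(min_parsimony([(1, 1), (0, 0), (2, 2)]))
```

In the usage example, the doubly heterozygous genotype (1, 1) could be resolved as 00 + 11 or 01 + 10, and parsimony prefers the first because 00 and 11 are already needed by the two homozygous genotypes.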
A naive Genetic Algorithm for MP haplotyping
● “Chromosome” representation
● “Crossing over on chromosomes”
● “Mutation on chromosomes”
A parametric greedy phasing aimed at MP
● Input: n genotypes on l SNPs,
● Algorithm parameters:
– a permutation of {1, 2, ..., n}, $\sigma = \langle \sigma_1, \sigma_2, \ldots, \sigma_n \rangle$
– a set of “guide haplotypes” $\{\hbar_1, \hbar_2, \ldots, \hbar_n\}$ where $\hbar_i \sim g_i$
● In a greedy manner, it tries to resolve $g(\sigma_i)$ with one of the haplotypes resolving $g(\sigma_1), g(\sigma_2), \ldots, g(\sigma_{i-1})$, but if it fails then it applies $\hbar_i$.
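A minimal sketch of such a greedy pass is given below, under several assumptions that are not stated on the slide: genotypes are site-wise sums of 0/1 haplotypes, the guide haplotype is indexed by the genotype it belongs to, and the first usable haplotype in the pool is taken without any tie-breaking. The names `complement` and `greedy_phase` are illustrative.

```python
def complement(genotype, hap):
    """Return h' with hap + h' == genotype site-wise, or None if impossible."""
    other = tuple(g - h for g, h in zip(genotype, hap))
    return other if all(x in (0, 1) for x in other) else None

def greedy_phase(genotypes, order, guides):
    """Resolve genotypes in the given order, reusing earlier haplotypes when possible."""
    pool = []     # haplotypes used so far, in order of first use
    phased = {}   # genotype index -> (haplotype, haplotype)
    for idx in order:
        g = genotypes[idx]
        first, partner = None, None
        for h in pool:  # try to reuse a haplotype that already resolved something
            partner = complement(g, h)
            if partner is not None:
                first = h
                break
        if partner is None:  # fall back to the guide haplotype of this genotype
            first = guides[idx]
            partner = complement(g, first)
            if partner is None:
                raise ValueError(f"guide haplotype {idx} is incompatible with its genotype")
        for h in (first, partner):
            if h not in pool:
                pool.append(h)
        phased[idx] = (first, partner)
    return phased, pool

if __name__ == "__main__":
    genotypes = [(0, 0), (1, 1), (2, 2)]
    guides = [(0, 0), (0, 1), (1, 1)]
    print(len(greedy_phase(genotypes, [0, 1, 2], guides)[1]))
    print(len(greedy_phase(genotypes, [1, 0, 2], guides)[1]))
```

The two calls in the usage example use the same genotypes and guides but different orders, and end with haplotype pools of different sizes. This dependence on the permutation and on the guide haplotypes is what leaves room for an outer search over these parameters.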
GAhap: the Genetic Algorithm for MP phasing
● Each “chromosome” ↔ an instance of the greedy phasing algorithm
● Various permutations and “guide haplotypes” are encoded by bitstrings.
● Naive procedures for crossing over and mutation are applied on “guide haplotypes”.
● Crossover and mutation on permutations are also convenient.
Crossover on permutations
Parameter setting for GAhap
| cr | cr_int | mr_int | selection | fitness scaling | successful cases of 20 |
|---|---|---|---|---|---|
| 0.8 | 0.9 | 0.9 | stochastics | shift linear | 18 |
| 0.2 | 0.5 | 0.5 | stochastics | rank | 16 |
| 0.2 | 0.9 | 0.1 | tournament | linear | 16 |
| 0.2 | 0.1 | 0.9 | uniform | rank | 15 |
| 0.9 | 0.9 | 0.9 | tournament | rank | 14 |
Effect of “cross over” on convergence
[Figure: convergence for cr = 0 and cr = 0.2]
GAhap vs. other haplotyping methods
| method | framework | \|H\| | haplotype error rate | switch error rate |
|---|---|---|---|---|
| HAPLOTYPER | Bayesian, Dirichlet prior | 33 | 5.4 | 3.0 |
| PHASE | Bayesian, perfect phylogeny | 32 | 5.6 | 3.1 |
| fastPHASE | Simplified PHASE | 35 | 7.3 | 4.5 |
| 2SNP | 2-SNP phasing and MST | 40 | 10.4 | 5.6 |
| GAhap | GA and MP | 34 | 9.7 | 5.7 |
Methods have been evaluated with 150 genotypes of GH1 with known phases (Horan et al, 2003)
Generate random haplotype samples under coalescent model
Before second problem
● Simulate a coalescent process.
● Determine haplotype frequencies constrained to minor allele frequency.
Second problem; Haplotype block partitioning
● The genome comprises regions with well-defined boundaries within which haplotypes are transferred unchanged through generations.
● Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Patil et al. (2001)
● The pattern of Linkage Disequilibrium shows a picture of discrete haplotype blocks over the genome. Daly et al. (2001)
● Haplotype blocks can arise even in the absence of recombination hot spots. Wang et al. (2002)
Blocks of limited haplotype diversity
[Figure: nine haplotypes split into three blocks of 6, 4 and 5 SNPs]
Block 1  Block 2  Block 3
011010   0000     01010
000101   0000     00110
011010   0000     01010
011010   0110     00110
000101   0110     01011
101000   1000     01011
000101   1100     00010
011010   1100     00010
101000   1100     10011
Distinct haplotypes per block (number of copies):
Block 1: 011010 (4), 000101 (3), 101000 (2)
Block 2: 0000 (3), 0110 (2), 1000 (1), 1100 (3)
Block 3: 01010 (2), 01011 (2), 00110 (2), 00010 (2), 10011 (1)
An early example of haplotype blocks
courtesy Daly et al. (2001)
● Block structure and common haplotypes on 5q31
Bases of haplotype block definition
● Haplotype diversity
– common haplotype
– minimum number of SNPs to cover information of majority of haplotypes (Patil et al. 2001, Zhang et al. 2002)
● Linkage Disequilibrium
– point estimation of LD coefficient, D, r², D'
– interval estimation (Gabriel et al. 2002)
● Four gamete test (Wang et al. 2002)
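The four gamete test is the simplest of these criteria to illustrate: for two biallelic SNPs, observing all four combinations 00, 01, 10 and 11 in a sample cannot be explained by a single mutation per site without recombination. The sketch below applies it in a plain left-to-right greedy scan; this scan is an assumption made for illustration and is not claimed to be the exact procedure of Wang et al. (2002).

```python
def four_gamete_violation(haplotypes, i, j):
    """True if all four allele combinations occur at sites i and j."""
    gametes = {(h[i], h[j]) for h in haplotypes}
    return len(gametes) == 4

def fgt_blocks(haplotypes):
    """Greedy scan: close the current block when a new site violates the test."""
    n_sites = len(haplotypes[0])
    blocks, start = [], 0
    for j in range(1, n_sites):
        if any(four_gamete_violation(haplotypes, i, j) for i in range(start, j)):
            blocks.append((start, j))  # half-open interval [start, j)
            start = j
    blocks.append((start, n_sites))
    return blocks

if __name__ == "__main__":
    print(fgt_blocks(["0000", "0011", "1100", "1111"]))
```

Blocks are reported as half-open site intervals, so `(0, 2)` covers the first two SNPs.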
Local partitioning vs. global partitioning methods
● A local partitioning method defines haplotype blocks such that the boundaries of each block are determined independently of the other blocks.
● Local partitioning usually yields blocks that form a series of separated regions on the genome, like “islands”.
● A global partitioning method defines a partitioning of the whole genome rather than defining each block independently.
● With global partitioning, the genome is usually “tiled” by blocks placed tightly next to each other.
Methods on haplotype block partitioning
[Figure labels: Myers, 05; Zhang, 05; Zhang, 05; Anderson, 03; Wang, 02; Gabriel, 02]
Application of haplotype block partitioning
● Studies on human origin, history of human migrations and genetic diversity between races
● Genetic mapping and recognizing recombination hotspots
● Genotyping and phasing
● tagSNP selection
● Disease Association Study
Global haplotype partitioning for maximal associated SNP pairs
Outline
● Categorize SNP pairs into association classes.
● Establish a constrained optimization to find blocks which include the largest possible number of “associated” pairs subject to a limited number of “independent” pairs.
● Solve the constrained program.
Linkage Disequilibrium between two SNPs
[Figure: two loci with alleles A/a and B/b]
● Conditions: a sufficiently long time after the population has settled, no admixture, no selection.
● In the presence of crossing over, $P_{AB} = P_A \cdot P_B$.
● In the absence of crossing over, $P_{AB} \neq P_A \cdot P_B$.
Standardized coefficient of LD
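As a reference point, the sketch below computes the usual textbook LD coefficients from the four haplotype counts of a SNP pair: D = P_AB − P_A·P_B, the standardized D' = D / D_max, and r² = D² / (P_A(1 − P_A)P_B(1 − P_B)). These are the standard definitions and are assumed, not taken from the slide; the function and argument names are illustrative.

```python
def ld_coefficients(n11, n10, n01, n00):
    """LD coefficients from haplotype counts of AB, Ab, aB and ab."""
    n = n11 + n10 + n01 + n00
    p_ab = n11 / n
    p_a = (n11 + n10) / n
    p_b = (n11 + n01) / n
    d = p_ab - p_a * p_b
    # D_max is the largest |D| attainable given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2

if __name__ == "__main__":
    print(ld_coefficients(4, 1, 0, 5))
```

In the usage example one of the four gametes is missing, so |D'| is 1 even though r² is below 1, which illustrates why the two measures can rank SNP pairs differently.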
Assessment of LD estimation
● Confidence interval (Gabriel et al., 2002)
– Apply thresholds on the confidence interval of |D'|
– Each SNP pair is then categorized into one of three classes: “strongly associated”, “recombinant” and “uninformative”
● Fisher's exact test and p-value (present work)
An association index for SNP pairs based on Fisher's Exact Test
● Estimate the value of |D'| on the given sample.
● Compute the p-value of Fisher's exact test.
● Applying thresholds on the p-value results in a three-state association index: “associated”, “independent” and “not statistically significant”.
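The p-value step can be sketched with the standard two-sided Fisher's exact test on the 2×2 table of haplotype counts for a SNP pair. The code below uses only the Python standard library; the function name is illustrative, and the thresholds that turn the p-value into the three-state index are deliberately left out because the slides do not give them.

```python
from math import comb

def fisher_exact_p(n11, n10, n01, n00):
    """Two-sided Fisher's exact test p-value for a 2x2 table of counts."""
    r1, r2 = n11 + n10, n01 + n00   # row totals
    c1 = n11 + n01                  # first column total
    total = comb(r1 + r2, c1)

    def prob(a):
        # Hypergeometric probability of a table whose top-left cell is a.
        return comb(r1, a) * comb(r2, c1 - a) / total

    p_obs = prob(n11)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Sum over all tables that are at most as likely as the observed one;
    # the small relative tolerance guards against floating-point ties.
    return sum(p for p in (prob(a) for a in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))

if __name__ == "__main__":
    print(fisher_exact_p(3, 1, 1, 3))
    print(fisher_exact_p(10, 0, 0, 10))
```

An exact test is attractive here because haplotype samples are often small and some of the four cells are frequently empty, which is where asymptotic tests are least reliable.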
Notion of Fisher's Exact Test
[Figure labels: n11, Fex, r², |D'|]
Global haplotype partitioning for maximal associated SNP pairs
● Establish a constrained optimization ...
Solve the constrained program
● Convert into an unconstrained optimization using a Lagrange multiplier;
● Given a fixed λ, the partitioning can be obtained via a dynamic programming procedure;
Method evaluation
● General features of haplotype blocks,
– block length and block distribution
– coverage of “common haplotypes”
– consistency with LD pattern
– the minimum number of tagSNPs and their coverage
– similarity between different partitioning methods
● Robustness of partitioning method.
● Performance on identification of recombination hotspots.
● Performance on case-control association study.
Evaluation on ENCODE haplotypes
● The Encyclopedia of DNA Elements (ENCODE)
● Ten regions have been selected by the ENCODE project as the pilot phase to identify the functional elements of the human genome.
● There are about 2000 SNPs assayed by the HapMap Project in each ENCODE region (CEU panel).
● We reduced the SNPs to those that are ascertained in all three HapMap panels.
● Moreover, we selected the top 400 SNPs by heterozygosity from each region.
Courtesy of ENSEMBL for genome annotation
General features of haplotype blocks
Consistency with LD pattern
The number of haplotype tagging SNPs
● The minimum number of htSNPs for each haplotype block has been obtained using htSNPer (Ding et al. 2005)
htSNP coverage
Similarity of blocks between different methods
Robustness of block partitioning
Boundaries of haplotype blocks in 9q34.11 obtained by different methods.
[Figure values: 92.0, 69.2, 99.4, 99.7, 100, 97.6]
How many times does a given method reproduce the same boundaries when applied to simulated recombinant haplotypes?
Application to recombination hotspots detection
● Generate random haplotype samples under the coalescent model with recombination using msHOT (Hellenthal & Stephens 2007);
● Two simulated haplotype sets, each one with 100 samples
● Each sample contains 40/100 haplotypes on 300 SNPs
● Six 2 kb regions, chosen at random, are considered as hotspot regions
● The recombination rate for hotspots is chosen 50-400 times higher than the background.
● Total error rate on detection of recombination hotspots
Application to disease association study
● Single site association test;
● Haplotype-based association test; Chi-squared test on a hierarchical clustering of case/control haplotypes
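For the single site test, a common choice is the allelic chi-squared test on the 2×2 table of allele counts in cases and controls. The sketch below assumes that form (the slides do not specify which statistic was used) and assumes all row and column totals are positive. With one degree of freedom the p-value has the closed form erfc(√(χ²/2)), so no statistics library is needed.

```python
from math import erfc, sqrt

def allelic_chi2(case_a, case_b, ctrl_a, ctrl_b):
    """Chi-squared test (1 d.o.f.) on allele counts in cases and controls."""
    n = case_a + case_b + ctrl_a + ctrl_b
    rows = (case_a + case_b, ctrl_a + ctrl_b)
    cols = (case_a + ctrl_a, case_b + ctrl_b)
    observed = ((case_a, case_b), (ctrl_a, ctrl_b))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n  # expected count under independence
            chi2 += (observed[i][j] - expected) ** 2 / expected
    p_value = erfc(sqrt(chi2 / 2))  # chi-squared survival function for 1 d.o.f.
    return chi2, p_value

if __name__ == "__main__":
    # 50 cases and 50 controls contribute 100 alleles each.
    print(allelic_chi2(30, 70, 50, 50))
```

A haplotype-based test applies the same kind of statistic to groups of haplotypes within a block instead of to single alleles.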
● Simulate random case-control samples under various multiplicative models;
– GRR1 (first genotype relative risk ratio) = 3, 5
– DAF (disease allele frequency) = 0.05-0.15, 0.20-0.30
– The sample generator, gs (Li & Chen 2008), simulates the pattern of LD in real haplotypes.
– 500 sample sets of 50 cases / 50 controls have been produced for each ENCODE region.
– The causative SNP has been removed from the samples before assessment.
Type I error in the disease association study
Marker selection
● Uniform marker selection; the first SNP out of every k consecutive SNPs is selected as marker.
● Prioritized marker selection; rank each SNP by its “informativeness”, then select markers according to this ranking within each haplotype block.
Effect of marker selection on performance of disease gene identification
● Uniform marker selection
● Prioritized marker selection
Conclusion
Genotype phasing with maximum parsimony
● Incorporating a parametric greedy phasing into the GA considerably improved the results.
● Yet, the search space of most parsimonious haplotypes is too complicated to be tractable by a Genetic Algorithm.
● It seems that, in practice, the most parsimonious haplotypes are not necessarily close to the actual haplotypes.
Haplotype block partitioning using the global partitioning for maximal associated SNP pairs
● Methods based on pairwise analysis of SNPs find blocks of limited haplotype diversity.
● There is no general concordance among the block boundaries found by different methods.
● Permutation resampling shows that Gabriel's method and its association index are highly robust. Our algorithm is also relatively robust.
● The global block partitioning methods performed best in identification of recombination hotspots.
● The block-based association test is considerably more efficient than the conventional single site association test in case-control studies.
● Our block partitioning method achieved the best accuracy for the case-control study, even when only a low marker density is available.