Pure Parsimony
How Accurate is Pure Parsimony Haplotype Inferencing? Sharlee Climer, Department of Computer Science and Engineering, Department of Biology, Washington University in Saint Louis (www.climer.us). Joint work with Weixiong Zhang and Gerold Jaeger.
![Page 1: Pure Parsimony](https://reader035.vdocuments.site/reader035/viewer/2022062309/568145fb550346895db30677/html5/thumbnails/1.jpg)
RECOMB SNPs Workshop/Jan 28, 2007
How Accurate is Pure Parsimony Haplotype Inferencing?
Sharlee Climer
Department of Computer Science and Engineering
Department of Biology
Washington University in Saint Louis
Joint work with Weixiong Zhang and Gerold Jaeger
![Page 2: Pure Parsimony](https://reader035.vdocuments.site/reader035/viewer/2022062309/568145fb550346895db30677/html5/thumbnails/2.jpg)
Pure Parsimony
• Pure Parsimony Haplotype Inferencing (PPHI)
– Find smallest set of unique haplotypes that can resolve a set of genotypes
• Suggested by Earl Hubbell in 2000
• Cast as an Integer Linear Program (IP) by Dan Gusfield [CPM’03]
• Great research interest
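The PPHI objective can be illustrated with a small brute-force search. This is only a sketch for toy inputs, not the IP formulation cited above: the genotype encoding ('0' and '1' for homozygous sites, '2' for a heterozygous site) and the function names `resolving_pairs` and `pure_parsimony` are assumptions made for this example.

```python
from itertools import combinations, product


def resolving_pairs(genotype):
    """Return every unordered haplotype pair that resolves a genotype.

    Assumed encoding: '0' or '1' = homozygous site, '2' = heterozygous site.
    A pair resolves the genotype if both haplotypes agree with it at the
    homozygous sites and differ from each other at the heterozygous sites.
    """
    het_sites = [i for i, s in enumerate(genotype) if s == "2"]
    pairs = set()
    for bits in product("01", repeat=len(het_sites)):
        h1, h2 = list(genotype), list(genotype)
        for i, b in zip(het_sites, bits):
            h1[i] = b
            h2[i] = "1" if b == "0" else "0"
        pairs.add(frozenset(("".join(h1), "".join(h2))))
    return pairs


def pure_parsimony(genotypes):
    """Return (size, haplotypes) for one smallest resolving set.

    Exhaustive search over candidate subsets in increasing size, so it is
    only usable on very small instances.
    """
    pair_sets = [resolving_pairs(g) for g in genotypes]
    candidates = sorted({h for ps in pair_sets for p in ps for h in p})
    for k in range(1, len(candidates) + 1):
        for subset in combinations(candidates, k):
            chosen = frozenset(subset)
            # Every genotype needs at least one pair fully inside the subset.
            if all(any(p <= chosen for p in ps) for ps in pair_sets):
                return k, subset
    return 0, ()


print(pure_parsimony(["02", "20", "22"]))  # (3, ('00', '01', '10'))
```

In this toy instance the genotypes "02" and "20" force the haplotypes 00, 01 and 10, and that set already resolves "22" as the pair 01/10, so three haplotypes suffice.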
![Page 3: Pure Parsimony](https://reader035.vdocuments.site/reader035/viewer/2022062309/568145fb550346895db30677/html5/thumbnails/3.jpg)
Overview
• Biological forces
• Haplotypes with low frequency
• Define haplotype classes
• Data sets
• Characteristics of real data
![Page 4: Pure Parsimony](https://reader035.vdocuments.site/reader035/viewer/2022062309/568145fb550346895db30677/html5/thumbnails/4.jpg)
Biological forces
![Page 5: Pure Parsimony](https://reader035.vdocuments.site/reader035/viewer/2022062309/568145fb550346895db30677/html5/thumbnails/5.jpg)
Biological forces
![Page 6: Pure Parsimony](https://reader035.vdocuments.site/reader035/viewer/2022062309/568145fb550346895db30677/html5/thumbnails/6.jpg)
Biological forces
![Page 7: Pure Parsimony](https://reader035.vdocuments.site/reader035/viewer/2022062309/568145fb550346895db30677/html5/thumbnails/7.jpg)
Biological forces
![Page 8: Pure Parsimony](https://reader035.vdocuments.site/reader035/viewer/2022062309/568145fb550346895db30677/html5/thumbnails/8.jpg)
Biological forces
![Page 9: Pure Parsimony](https://reader035.vdocuments.site/reader035/viewer/2022062309/568145fb550346895db30677/html5/thumbnails/9.jpg)
Biological forces
![Page 10: Pure Parsimony](https://reader035.vdocuments.site/reader035/viewer/2022062309/568145fb550346895db30677/html5/thumbnails/10.jpg)
Biological forces
![Page 11: Pure Parsimony](https://reader035.vdocuments.site/reader035/viewer/2022062309/568145fb550346895db30677/html5/thumbnails/11.jpg)
Biological forces
• Relatively few unique haplotypes
• Subset of haplotypes with low frequency
• Problems for PPHI
– Large number of optimal solutions
– True biological solution might not be parsimonious
• What are structural characteristics of optimal solutions?
![Page 12: Pure Parsimony](https://reader035.vdocuments.site/reader035/viewer/2022062309/568145fb550346895db30677/html5/thumbnails/12.jpg)
Classes of haplotypes
• Set of possible haplotypes is exponentially large
• Partition similar to Traveling Salesman Problem
• Backbone haplotypes
– Appear in every optimal solution
• Fat haplotypes
– Do not appear in any optimal solution
• Fluid haplotypes
– Appear in some, but not all, optimal solutions
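On a toy instance the three classes can be read off by enumerating every optimal solution. The sketch below does this by brute force; the genotype encoding ('2' = heterozygous site) and the names `optimal_solutions` and `classify` are assumptions of the example, and "fat" is computed only over haplotypes compatible with at least one genotype, since all other haplotypes are trivially fat.

```python
from itertools import combinations, product


def resolving_pairs(genotype):
    """All unordered haplotype pairs resolving a genotype ('2' = heterozygous)."""
    het_sites = [i for i, s in enumerate(genotype) if s == "2"]
    pairs = set()
    for bits in product("01", repeat=len(het_sites)):
        h1, h2 = list(genotype), list(genotype)
        for i, b in zip(het_sites, bits):
            h1[i] = b
            h2[i] = "1" if b == "0" else "0"
        pairs.add(frozenset(("".join(h1), "".join(h2))))
    return pairs


def optimal_solutions(genotypes):
    """Return (candidate haplotypes, list of all minimum-size resolving sets)."""
    pair_sets = [resolving_pairs(g) for g in genotypes]
    candidates = sorted({h for ps in pair_sets for p in ps for h in p})
    for k in range(1, len(candidates) + 1):
        solutions = []
        for subset in combinations(candidates, k):
            chosen = frozenset(subset)
            if all(any(p <= chosen for p in ps) for ps in pair_sets):
                solutions.append(chosen)
        if solutions:  # the first size with any solution is the optimum
            return candidates, solutions
    return candidates, []


def classify(genotypes):
    """Split candidate haplotypes into backbone, fluid and fat classes."""
    candidates, solutions = optimal_solutions(genotypes)
    if not solutions:
        return {"backbone": [], "fluid": [], "fat": []}
    in_all = set.intersection(*(set(s) for s in solutions))
    in_any = set.union(*(set(s) for s in solutions))
    return {
        "backbone": sorted(in_all),           # in every optimal solution
        "fluid": sorted(in_any - in_all),     # in some, but not all
        "fat": sorted(set(candidates) - in_any),  # in none
    }


print(classify(["000", "220", "221"]))
# {'backbone': ['000', '110'], 'fluid': ['001', '011', '101', '111'], 'fat': ['010', '100']}
```

Here the two optimal solutions are {000, 110, 001, 111} and {000, 110, 011, 101}: the genotype "221" can be resolved equally cheaply in two ways, which is what makes its haplotypes fluid.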
![Page 13: Pure Parsimony](https://reader035.vdocuments.site/reader035/viewer/2022062309/568145fb550346895db30677/html5/thumbnails/13.jpg)
Backbone haplotypes
• Implicit backbones
– All haplotypes that resolve unambiguous genotypes
• Explicit backbones
– Can identify by solving at most one IP for each haplotype in solution that isn’t implicit backbone
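One way to read the procedure above: take any optimal solution, set aside the implicit backbones, and for each remaining haplotype re-solve with that haplotype forbidden. If the optimum grows or the instance becomes infeasible, the haplotype is a backbone. The sketch below follows that reading under several assumptions: a brute-force `solve` stands in for the IP solver, "unambiguous" is taken to mean at most one heterozygous site, and all function names are hypothetical.

```python
from itertools import combinations, product


def resolving_pairs(genotype):
    """All unordered haplotype pairs resolving a genotype ('2' = heterozygous)."""
    het_sites = [i for i, s in enumerate(genotype) if s == "2"]
    pairs = set()
    for bits in product("01", repeat=len(het_sites)):
        h1, h2 = list(genotype), list(genotype)
        for i, b in zip(het_sites, bits):
            h1[i] = b
            h2[i] = "1" if b == "0" else "0"
        pairs.add(frozenset(("".join(h1), "".join(h2))))
    return pairs


def solve(genotypes, forbidden=frozenset()):
    """Smallest resolving set that avoids `forbidden`, or None if none exists.

    Brute force; a stand-in for solving the IP with extra constraints.
    """
    pair_sets = [{p for p in resolving_pairs(g) if not (p & forbidden)}
                 for g in genotypes]
    if any(not ps for ps in pair_sets):
        return None  # some genotype can no longer be resolved
    candidates = sorted({h for ps in pair_sets for p in ps for h in p})
    for k in range(1, len(candidates) + 1):
        for subset in combinations(candidates, k):
            chosen = frozenset(subset)
            if all(any(p <= chosen for p in ps) for ps in pair_sets):
                return chosen
    return None


def implicit_backbones(genotypes):
    """Haplotypes forced by genotypes with at most one heterozygous site."""
    forced = set()
    for g in genotypes:
        if g.count("2") <= 1:
            (pair,) = resolving_pairs(g)  # exactly one resolving pair
            forced |= pair
    return forced


def backbones(genotypes):
    """Return (implicit, explicit) backbone haplotype sets."""
    best = solve(genotypes)
    implicit = implicit_backbones(genotypes)
    pending = set(best) - implicit
    explicit = set()
    while pending:
        h = pending.pop()
        alt = solve(genotypes, forbidden=frozenset({h}))
        if alt is None or len(alt) > len(best):
            explicit.add(h)  # no optimal solution exists without h
        else:
            # alt is another optimum; anything it omits is not a backbone.
            pending &= alt
    return implicit, explicit


print(backbones(["000", "220", "221"]))  # ({'000'}, {'110'})
```

Pruning `pending` with each alternative optimum is why fewer than one solve per haplotype may be needed, which is consistent with the "at most" in the slide.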
![Page 14: Pure Parsimony](https://reader035.vdocuments.site/reader035/viewer/2022062309/568145fb550346895db30677/html5/thumbnails/14.jpg)
Backbone haplotypes
[Figure: haplotypes h3, h7, h15, h27, h39, h50, h55, h79, h91, four of them labeled bb (backbone)]
![Page 15: Pure Parsimony](https://reader035.vdocuments.site/reader035/viewer/2022062309/568145fb550346895db30677/html5/thumbnails/15.jpg)
Backbone graph
![Page 16: Pure Parsimony](https://reader035.vdocuments.site/reader035/viewer/2022062309/568145fb550346895db30677/html5/thumbnails/16.jpg)
Backbone graph
![Page 17: Pure Parsimony](https://reader035.vdocuments.site/reader035/viewer/2022062309/568145fb550346895db30677/html5/thumbnails/17.jpg)
An optimal solution
![Page 18: Pure Parsimony](https://reader035.vdocuments.site/reader035/viewer/2022062309/568145fb550346895db30677/html5/thumbnails/18.jpg)
Low frequency haplotype
![Page 19: Pure Parsimony](https://reader035.vdocuments.site/reader035/viewer/2022062309/568145fb550346895db30677/html5/thumbnails/19.jpg)
Low frequency haplotype
![Page 20: Pure Parsimony](https://reader035.vdocuments.site/reader035/viewer/2022062309/568145fb550346895db30677/html5/thumbnails/20.jpg)
Low frequency haplotype
![Page 21: Pure Parsimony](https://reader035.vdocuments.site/reader035/viewer/2022062309/568145fb550346895db30677/html5/thumbnails/21.jpg)
Low frequency haplotype
![Page 22: Pure Parsimony](https://reader035.vdocuments.site/reader035/viewer/2022062309/568145fb550346895db30677/html5/thumbnails/22.jpg)
Data sets
• 7 true haplotype data sets
– Orzack et al. [Genetics, 2003]
• 80 genotypes
• 9 sites
• ApoE
– Andres et al. [Genet. Epi., in press]
• 6 sets of complete data
• 39 genotypes
• 5 to 47 sites
• KLK13 and KLK14
![Page 23: Pure Parsimony](https://reader035.vdocuments.site/reader035/viewer/2022062309/568145fb550346895db30677/html5/thumbnails/23.jpg)
Data sets
• HapMap data [Nature 2003, 2005]
– Phase unknown
– Random instance generator
– 20 unique genotypes
– 20 sites
– Three populations
• CEU
• YRI
• JPT+CHB
– 22 chromosomes
![Page 24: Pure Parsimony](https://reader035.vdocuments.site/reader035/viewer/2022062309/568145fb550346895db30677/html5/thumbnails/24.jpg)
Size of haplotype backbone
[Figure: percentage of haplotypes that are backbones (y-axis, 0 to 1.2) for data sets BF, HG, BV and HapMap instances ceu2 to ceu20, yri3 to yri21, and jpt+chb1 to jpt+chb22 (x-axis). Legend: Implicit backbones, hBB, Total.]
![Page 25: Pure Parsimony](https://reader035.vdocuments.site/reader035/viewer/2022062309/568145fb550346895db30677/html5/thumbnails/25.jpg)
Number of fluid haplotypes in each solution
[Figure: number of fluid haplotypes in each solution (y-axis, 0 to 20) per instance (x-axis, ticks 1 to 75)]
![Page 26: Pure Parsimony](https://reader035.vdocuments.site/reader035/viewer/2022062309/568145fb550346895db30677/html5/thumbnails/26.jpg)
Number of optimal solutions
[Figure: number of optimal solutions (y-axis, logarithmic scale, 1 to 1000) per instance (x-axis, 1 to 76)]
![Page 27: Pure Parsimony](https://reader035.vdocuments.site/reader035/viewer/2022062309/568145fb550346895db30677/html5/thumbnails/27.jpg)
Number of fluid haplotypes and solutions
[Figure: number of fluid haplotypes required (left y-axis, 0 to 20) and number of solutions (right y-axis, 0 to 1200) per instance (x-axis, 1 to 76). Legend: # fluid haplotypes, # of solutions.]
![Page 28: Pure Parsimony](https://reader035.vdocuments.site/reader035/viewer/2022062309/568145fb550346895db30677/html5/thumbnails/28.jpg)
Biological correctness
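| Data set | # gen. | # sites | # BB hap. | # fluid hap. | # opt. sols. | Avg. distance to real |
|---|---|---|---|---|---|---|
| A | 30 | 9 | 15 | 0 | 1 | 8 |
| B | 10 | 5 | 7 | 0 | 1 | 0 |
| C | 18 | 17 | 9 | 3 | 16 | 7.5 |
| D | 10 | 8 | 6 | 1 | 4 | 2.5 |
| E | 23 | 26 | 9 | 7 | >1000 | 4.33 |
| F | 26 | 22 | 12 | 5 | 630 | 28.24 |
| G | 35 | 47 | 12 | 16 | >1000 | 10.95 |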
![Page 29: Pure Parsimony](https://reader035.vdocuments.site/reader035/viewer/2022062309/568145fb550346895db30677/html5/thumbnails/29.jpg)
Biological correctness
| Data set | Parsimony # of haplotypes | True # of haplotypes |
|---|---|---|
| A | 15 | 17 |
| B | 7 | 7 |
| C | 12 | 12 |
| D | 7 | 7 |
| E | 16 | 16 |
| F | 17 | 18 |
| G | 28 | 32 |
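The two "Biological correctness" tables can be cross-checked with a few lines of arithmetic. The sketch below copies the values from the slides and assumes that "# fluid hap." is the number of fluid haplotypes used in each optimal solution, so that backbone plus fluid counts should equal the parsimony count; the variable names are illustrative only.

```python
# Per data set: (# BB hap., # fluid hap., parsimony # of haplotypes,
# true # of haplotypes), copied from the two tables on the slides.
rows = {
    "A": (15, 0, 15, 17),
    "B": (7, 0, 7, 7),
    "C": (9, 3, 12, 12),
    "D": (6, 1, 7, 7),
    "E": (9, 7, 16, 16),
    "F": (12, 5, 17, 18),
    "G": (12, 16, 28, 32),
}

# Check that backbone + fluid haplotypes add up to the parsimony count.
consistent = all(bb + fluid == parsimony
                 for bb, fluid, parsimony, _ in rows.values())

# Data sets where the true solution uses more haplotypes than parsimony does.
shortfalls = {name: true - parsimony
              for name, (_, _, parsimony, true) in rows.items()
              if true > parsimony}

print(consistent)   # True
print(shortfalls)   # {'A': 2, 'F': 1, 'G': 4}
```

Under that reading the tables agree with each other, and in three of the seven data sets (A, F and G) the true solution is not parsimonious.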
![Page 30: Pure Parsimony](https://reader035.vdocuments.site/reader035/viewer/2022062309/568145fb550346895db30677/html5/thumbnails/30.jpg)
Biological correctness
• Accuracy of backbone haplotypes
• Two data sets (F and G) had errors
– One parsimony backbone haplotype not in real solution
![Page 31: Pure Parsimony](https://reader035.vdocuments.site/reader035/viewer/2022062309/568145fb550346895db30677/html5/thumbnails/31.jpg)
Number of solutions vs. number of genotypes
[Figure: number of haplotypes (left y-axis, 0 to 18) and number of optimal solutions (right y-axis, 0 to 700). Legend: # of haplotypes, # of solutions.]
![Page 32: Pure Parsimony](https://reader035.vdocuments.site/reader035/viewer/2022062309/568145fb550346895db30677/html5/thumbnails/32.jpg)
Conclusions
• Biological forces tend to minimize cardinality, but also create low frequency haplotypes
• Low frequency in unique genotypes might not be low frequency in full set
• Low frequency haplotypes
– Large number of optimal solutions
– True solution not necessarily parsimonious
– Combinatorial nature can lead to errors in backbones
• Parsimony combined with other biological clues