![Page 1: Com par 25jun14](https://reader035.vdocuments.site/reader035/viewer/2022070510/58a8f5741a28ab837c8b4ddf/html5/thumbnails/1.jpg)
• Xin-zhuan Su
• Sittiporn Pattaradilokrat
• Sethu Nair
• Yanwei Qi
• Gordon Bullen
NIH/NIAID – Malaria Functional Genomics Section
• Sebastian Gurevich
McGill University
• Philip Awadalla
University of Montreal
Funding: National Institutes of Health; Canadian Institutes of Health Research
https://github.com/parasite-genomics/Pipelines - 2.0 Coming in July
ComPar: Genome Assembly, Variant Mapping, and Validation Pipelines
Martine Zilversmit
http://www.slideshare.net/zmartine1/com-par-25jun14
![Page 2: Com par 25jun14](https://reader035.vdocuments.site/reader035/viewer/2022070510/58a8f5741a28ab837c8b4ddf/html5/thumbnails/2.jpg)
ComPar: Genome Assembly, Variant Mapping, and Validation Pipelines
https://github.com/parasite-genomics/Pipelines
• BASH-scripted pipelines
• Accurate variant prediction
  – SNPs
  – Small indels
  – Large indels (>17bp)
  – Focused regions of extreme divergence (35-70% amino acid identity)
• In silico variant validation
![Page 3: Com par 25jun14](https://reader035.vdocuments.site/reader035/viewer/2022070510/58a8f5741a28ab837c8b4ddf/html5/thumbnails/3.jpg)
Finding Highly Divergent Regions – HDR Program
VCF File
Parameters:
- Quality Metric and Cutoff
- Number of variants per cluster
- Maximum distance between variants within a cluster
- Maximum distance between smaller clusters to merge into an HDR
False Positive Variants
True Positive Variants
HDR File:
- Size of HDR
- Position of HDR
- Variants Contained
Python - Stand-alone interactive or pipelined
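The four parameters listed for the HDR program suggest a gap-based clustering of variant positions. The following is a minimal sketch of that idea, not the ComPar implementation. The names `find_hdrs`, `HDR`, `min_quality`, `min_variants`, `max_gap`, and `max_merge_gap` are hypothetical, and it is an assumption that variants passing the quality cutoff are the ones clustered.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class HDR:
    """One highly divergent region (hypothetical record layout)."""
    start: int
    end: int
    positions: List[int]  # variants contained in the region

    @property
    def size(self) -> int:
        return self.end - self.start + 1


def find_hdrs(
    variants: Sequence[Tuple[int, float]],
    min_quality: float = 30.0,   # assumed: quality metric cutoff
    min_variants: int = 3,       # number of variants per cluster
    max_gap: int = 10,           # max distance between variants in a cluster
    max_merge_gap: int = 50,     # max distance between clusters to merge
) -> List[HDR]:
    """Cluster (position, quality) pairs from one chromosome into HDRs."""
    # Assumption: only variants at or above the cutoff are clustered.
    kept = sorted(pos for pos, qual in variants if qual >= min_quality)

    # Step 1: chain neighbouring variants that lie within max_gap.
    clusters: List[List[int]] = []
    for pos in kept:
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])

    # Step 2: keep only clusters with enough variants.
    dense = [c for c in clusters if len(c) >= min_variants]

    # Step 3: merge dense clusters separated by at most max_merge_gap.
    spans: List[List[int]] = []
    for c in dense:
        if spans and c[0] - spans[-1][1] <= max_merge_gap:
            spans[-1][1] = c[-1]
        else:
            spans.append([c[0], c[-1]])

    # Report size, position, and contained variants for each region.
    return [
        HDR(start, end, [p for p in kept if start <= p <= end])
        for start, end in spans
    ]
```

For example, `find_hdrs([(100, 50), (105, 40), (108, 60), (150, 35), (153, 45), (158, 33)])` groups the six variants into two dense clusters and, because the clusters are 42 bp apart, merges them into a single region.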
![Page 4: Com par 25jun14](https://reader035.vdocuments.site/reader035/viewer/2022070510/58a8f5741a28ab837c8b4ddf/html5/thumbnails/4.jpg)
Comparing 2 Plasmodium Genomes
Dye-Terminator Sequenced Variation – 50 basepair Sliding window
Figure: Number of Variants (y-axis) vs. Position on “Chromosome” (x-axis)
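The variant-density plots in this deck count variants inside a 50 basepair sliding window. A minimal sketch of such a count follows. The function name `sliding_window_counts` and the `step` parameter are hypothetical, since the slides give the window size but not the step.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def sliding_window_counts(
    positions: Sequence[int],
    seq_length: int,
    window: int = 50,  # window size from the slide
    step: int = 10,    # assumed; the step is not stated in the slides
) -> List[Tuple[int, int]]:
    """Return (window_start, variant_count) pairs for 1-based positions."""
    pos = sorted(positions)
    last_start = max(seq_length - window + 1, 1)
    counts = []
    for start in range(1, last_start + 1, step):
        end = start + window  # exclusive upper bound of the window
        # Binary search gives the number of positions in [start, end).
        n = bisect_left(pos, end) - bisect_left(pos, start)
        counts.append((start, n))
    return counts
```

Plotting the second element of each pair against the first gives a curve of the kind described by the axis labels (Number of Variants vs. Position on “Chromosome”).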
![Page 5: Com par 25jun14](https://reader035.vdocuments.site/reader035/viewer/2022070510/58a8f5741a28ab837c8b4ddf/html5/thumbnails/5.jpg)
Comparing 2 Plasmodium Genomes
Predicted Variants – No Filtering Based on Quality Metrics
Figure (two panels): Number of Variants (y-axis) vs. Position on “Chromosome” (x-axis)
![Page 6: Com par 25jun14](https://reader035.vdocuments.site/reader035/viewer/2022070510/58a8f5741a28ab837c8b4ddf/html5/thumbnails/6.jpg)
Comparing 2 Plasmodium Genomes
Predicted Variants - Filtering Based on Quality Score ≥ 30 Cutoff
Figure (two panels): Number of Variants (y-axis) vs. Position on “Chromosome” (x-axis)
![Page 7: Com par 25jun14](https://reader035.vdocuments.site/reader035/viewer/2022070510/58a8f5741a28ab837c8b4ddf/html5/thumbnails/7.jpg)
Comparing 2 Plasmodium Genomes
Filtering Based on Consensus Quality (FQ) ≤ -100 Cutoff
Figure (two panels): Number of Variants (y-axis) vs. Position on “Chromosome” (x-axis)
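The two cutoffs shown in these slides (quality score ≥ 30 and consensus quality FQ ≤ -100) can be applied to a VCF file with a few lines of plain Python. This is a sketch under stated assumptions, not the ComPar filter: it assumes the quality score is the VCF QUAL column, that FQ is stored as an `FQ=` entry in the INFO column (as written by older samtools/bcftools releases), and that records without an FQ value are dropped. The names `passes_filters` and `filter_vcf` are hypothetical.

```python
from typing import Dict, Iterable, Iterator


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO string such as 'DP=20;FQ=-150' into a dict."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def passes_filters(vcf_line: str, min_qual: float = 30.0,
                   max_fq: float = -100.0) -> bool:
    """True if a VCF record has QUAL >= min_qual and FQ <= max_fq."""
    cols = vcf_line.rstrip("\n").split("\t")
    qual = cols[5]  # QUAL is the 6th VCF column
    if qual == "." or float(qual) < min_qual:
        return False
    fq = parse_info(cols[7]).get("FQ")  # INFO is the 8th VCF column
    # Assumption: a record with no FQ value is rejected.
    return fq not in (None, "") and float(fq) <= max_fq


def filter_vcf(lines: Iterable[str], min_qual: float = 30.0,
               max_fq: float = -100.0) -> Iterator[str]:
    """Yield header lines unchanged and only the records that pass."""
    for line in lines:
        if line.startswith("#") or passes_filters(line, min_qual, max_fq):
            yield line
```

A typical use would be `with open("in.vcf") as fh: kept = list(filter_vcf(fh))`, where `in.vcf` is a placeholder file name.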
![Page 8: Com par 25jun14](https://reader035.vdocuments.site/reader035/viewer/2022070510/58a8f5741a28ab837c8b4ddf/html5/thumbnails/8.jpg)
Comparing 2 Plasmodium Genomes
Highly-Divergent Regions (HDRs)
Figure (two panels): Number of Variants (y-axis) vs. Position on “Chromosome” (x-axis)
![Page 9: Com par 25jun14](https://reader035.vdocuments.site/reader035/viewer/2022070510/58a8f5741a28ab837c8b4ddf/html5/thumbnails/9.jpg)
Comparing 2 Plasmodium Genomes
Quality ≥ 30 Variants without Consensus Quality ≥ -100
Highly-Divergent Regions (HDRs)
Figure (two panels): Number of Variants (y-axis) vs. Position on “Chromosome” (x-axis)
![Page 10: Com par 25jun14](https://reader035.vdocuments.site/reader035/viewer/2022070510/58a8f5741a28ab837c8b4ddf/html5/thumbnails/10.jpg)
Characteristics of Highly Divergent Regions
33X 44.4% By265 55.6% N67 66.7%
histone acetyltransferase GCN5, putative (GCN5)
RNA-binding protein NOB1, putative
Percent Identity
DNA repair protein, putative
33X 41.4% By265 79.3% N67 51.7%
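The percent identity values above compare aligned protein sequences. The slides do not say how the values were computed, so the sketch below shows one common definition: identical residues divided by the number of alignment columns, ignoring columns where both sequences have a gap. The function name `percent_identity` and the choice of denominator are assumptions.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns where both sequences have a gap.
    columns = [(a, b) for a, b in zip(seq_a, seq_b)
               if not (a == gap and b == gap)]
    if not columns:
        return 0.0
    # A column matches only when both residues are identical non-gaps.
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)
```

Other conventions divide by the shorter sequence length or by ungapped columns only, so values from different tools are not always directly comparable.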
![Page 11: Com par 25jun14](https://reader035.vdocuments.site/reader035/viewer/2022070510/58a8f5741a28ab837c8b4ddf/html5/thumbnails/11.jpg)
Characteristics of Highly Divergent Regions