Page 1: Com par 25jun14

• Xin-zhuan Su
• Sittiporn Pattaradilokrat
• Sethu Nair
• Yanwei Qi
• Gordon Bullen

NIH/NIAID – Malaria Functional Genomics Section

• Sebastian Gurevich

McGill University

• Philip Awadalla

University of Montreal

Funding: National Institutes of Health; Canadian Institutes of Health Research

https://github.com/parasite-genomics/Pipelines - 2.0 Coming in July

ComPar: Genome Assembly, Variant Mapping, and Validation Pipelines

Martine Zilversmit

http://www.slideshare.net/zmartine1/com-par-25jun14

Page 2: Com par 25jun14

ComPar: Genome Assembly, Variant Mapping, and Validation Pipelines

https://github.com/parasite-genomics/Pipelines

• BASH-scripted pipelines

• Accurate variant prediction
  – SNPs
  – Small indels
  – Large indels (>17 bp)
  – Focused regions of extreme divergence (35-70% amino acid identity)

• In silico variant validation

Page 3: Com par 25jun14

Finding Highly Divergent Regions – HDR Program

Parameters:
- Quality Metric and Cutoff
- Number of variants per cluster
- Maximum distance between variants within a cluster
- Maximum distance between smaller clusters to merge into an HDR

VCF File

False Positive Variants

True Positive Variants

HDR File:
- Size of HDR
- Position of HDR
- Variants Contained

Python - Stand-alone interactive or pipelined
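
The slide names the HDR program's parameters but not its algorithm. The following is a minimal sketch of one way those four parameters could drive a clustering pass over variant positions; it is not the ComPar implementation. The function name `find_hdrs`, the `HDR` record, and every default value are hypothetical, and the input is assumed to be `(position, quality)` pairs already extracted from a VCF file for a single chromosome.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class HDR:
    """One highly divergent region: its span and the variants it contains."""
    start: int
    end: int
    positions: List[int]

    @property
    def size(self) -> int:
        # Inclusive span in base pairs.
        return self.end - self.start + 1


def find_hdrs(
    variants: Iterable[Tuple[int, float]],
    min_quality: float = 30.0,   # hypothetical quality cutoff
    min_variants: int = 3,       # hypothetical number of variants per cluster
    max_gap: int = 10,           # hypothetical max distance within a cluster
    max_merge_gap: int = 50,     # hypothetical max distance between clusters to merge
) -> List[HDR]:
    # Keep only variants that pass the quality cutoff, sorted by position.
    positions = sorted(pos for pos, qual in variants if qual >= min_quality)

    # Step 1: chain neighbouring variants that lie within max_gap of each other.
    clusters: List[List[int]] = []
    for pos in positions:
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])

    # Step 2: discard clusters that contain too few variants.
    dense = [c for c in clusters if len(c) >= min_variants]

    # Step 3: merge dense clusters separated by at most max_merge_gap.
    merged: List[List[int]] = []
    for cluster in dense:
        if merged and cluster[0] - merged[-1][-1] <= max_merge_gap:
            merged[-1].extend(cluster)
        else:
            merged.append(list(cluster))

    return [HDR(start=c[0], end=c[-1], positions=c) for c in merged]
```

Each returned record carries the three fields the slide lists for the HDR file: size, position, and variants contained. In this sketch, variants from discarded sparse clusters are not added back to a merged region even if they fall inside its span; whether the real program does so is not stated on the slide.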

Page 4: Com par 25jun14

Comparing 2 Plasmodium Genomes

Dye-Terminator Sequenced Variation – 50 basepair Sliding window

[Figure: Number of Variants plotted against Position on “Chromosome”]
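
The plots in this deck show the number of variants per 50 basepair window along the chromosome. A minimal sketch of how such counts could be produced from a list of variant positions follows. The function name `window_counts` is hypothetical, positions are assumed to be 1-based, and the step size is an assumption: the slide gives only the window width, so the default here uses non-overlapping windows.

```python
from bisect import bisect_left
from typing import Iterable, List, Tuple


def window_counts(
    positions: Iterable[int],
    length: int,
    window: int = 50,   # window width from the slide
    step: int = 50,     # assumed; the slide does not state the step
) -> List[Tuple[int, int]]:
    """Return (window_start, variant_count) pairs along a sequence of given length."""
    pos = sorted(positions)
    last_start = max(length - window + 1, 1)
    counts = []
    for start in range(1, last_start + 1, step):
        end = start + window  # exclusive upper bound of the window
        # Count positions p with start <= p < end using binary search.
        n = bisect_left(pos, end) - bisect_left(pos, start)
        counts.append((start, n))
    return counts
```

Passing a smaller `step` (for example 10) yields overlapping windows, which gives a smoother profile at the cost of counting each variant in several windows.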

Page 5: Com par 25jun14

Comparing 2 Plasmodium Genomes

Predicted Variants – No filtering Based on Quality Metrics

[Figure, two panels: Number of Variants plotted against Position on “Chromosome”]

Page 6: Com par 25jun14

Comparing 2 Plasmodium Genomes

[Figure, two panels: Number of Variants plotted against Position on “Chromosome”]

Predicted Variants - Filtering Based on Quality Score ≥ 30 Cutoff
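
As an illustration of the quality score cutoff, the sketch below keeps only VCF records whose QUAL column is at least 30. It assumes standard tab-delimited VCF lines, in which QUAL is the sixth column and a missing value is written as `.`; the function name `filter_by_qual` is hypothetical and is not taken from the ComPar pipelines.

```python
from typing import Iterable, List


def filter_by_qual(vcf_lines: Iterable[str], min_qual: float = 30.0) -> List[str]:
    """Keep header lines and records whose QUAL value is >= min_qual."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            # Header and metadata lines pass through unchanged.
            kept.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        qual = fields[5]  # QUAL is the sixth VCF column
        # Records with a missing QUAL (".") are dropped in this sketch.
        if qual != "." and float(qual) >= min_qual:
            kept.append(line)
    return kept
```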

Page 7: Com par 25jun14

Comparing 2 Plasmodium Genomes

[Figure, two panels: Number of Variants plotted against Position on “Chromosome”]

Filtering Based on Consensus Quality (FQ) ≤ -100 Cutoff
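
The consensus quality cutoff can be sketched in the same way. The example assumes that FQ is stored as a `key=value` entry in the VCF INFO column (the eighth column), as older samtools/bcftools callers write it; the slides do not say which caller produced the files. The helper names `parse_info` and `passes_fq` are hypothetical.

```python
from typing import Dict, Union


def parse_info(info: str) -> Dict[str, Union[str, bool]]:
    """Split a VCF INFO string into a dict; flag entries without '=' map to True."""
    out: Dict[str, Union[str, bool]] = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def passes_fq(vcf_line: str, max_fq: float = -100.0) -> bool:
    """Return True when the record carries an FQ value <= max_fq."""
    fields = vcf_line.rstrip("\n").split("\t")
    info = parse_info(fields[7])  # INFO is the eighth VCF column
    fq = info.get("FQ")
    if fq is None or fq is True:
        # No usable FQ value: treated as failing the filter in this sketch.
        return False
    return float(fq) <= max_fq
```

Records without an FQ entry are rejected here; keeping them instead would be an equally defensible choice, depending on how the calls were generated.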

Page 8: Com par 25jun14

Comparing 2 Plasmodium Genomes

[Figure, two panels: Number of Variants plotted against Position on “Chromosome”]

Highly-Divergent Regions (HDRs)

Page 9: Com par 25jun14

Comparing 2 Plasmodium Genomes

[Figure, two panels: Number of Variants plotted against Position on “Chromosome”]

Quality ≥ 30 Variants without Consensus Quality ≥ -100

Highly-Divergent Regions (HDRs)

Page 10: Com par 25jun14

Characteristics of Highly Divergent Regions

33X: 44.4%; By265: 55.6%; N67: 66.7%

histone acetyltransferase GCN5, putative (GCN5)

RNA-binding protein NOB1, putative

Percent Identity

DNA repair protein, putative

33X: 41.4%; By265: 79.3%; N67: 51.7%
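
The slide reports percent identity values without saying how they were computed. One common definition, shown here only as an assumption, is the share of alignment columns in which two aligned sequences carry the same residue. The function name `percent_identity` is hypothetical, the inputs are assumed to be pre-aligned strings of equal length with `-` marking gaps, and gap columns count as mismatches.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns where both sequences share the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    # A column matches only when both residues are identical and neither is a gap.
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)
```

Other conventions divide by the number of ungapped columns or by the length of the shorter sequence, so values computed this way may not match the figures on the slide.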

Page 11: Com par 25jun14

Characteristics of Highly Divergent Regions

