com par 25jun14

Upload: martine-zilversmit

Post on 19-Feb-2017

TRANSCRIPT

Page 1: Com par 25jun14

• Xin-zhuan Su
• Sittiporn Pattaradilokrat
• Sethu Nair
• Yanwei Qi
• Gordon Bullen

NIH/NIAID – Malaria Functional Genomics Section

• Sebastian Gurevich

McGill University

Funding: National Institutes of Health, Canadian Institutes of Health Research

• Philip Awadalla

University of Montreal

https://github.com/parasite-genomics/Pipelines - 2.0 Coming in July 2014

[email protected]

ComPar: Genome Assembly, Variant Mapping, and Validation Pipelines

Martine Zilversmit

http://www.slideshare.net/zmartine1/com-par-25jun14

Page 2: Com par 25jun14

ComPar: Genome Assembly, Variant Mapping, and Validation Pipelines

https://github.com/parasite-genomics/Pipelines

• BASH-scripted pipelines

• Accurate variant prediction
  – SNPs
  – Small indels
  – Large indels (>17 bp)
  – Focused regions of extreme divergence (35-70% amino acid identity)

• In silico variant validation

Page 3: Com par 25jun14

Finding Highly Divergent Regions – HDR Program

Parameters:
- Quality metric and cutoff
- Number of variants per cluster
- Maximum distance between variants within a cluster
- Maximum distance between smaller clusters to merge into an HDR

VCF File

False Positive Variants

True Positive Variants

HDR File:
- Size of HDR
- Position of HDR
- Variants contained

Python - Stand-alone interactive or pipelined
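The four parameters listed on this slide suggest a cluster-then-merge procedure. The following is a minimal sketch of how such a procedure could work, not the ComPar HDR program itself: the function name `find_hdrs`, the `HDR` record, the default values, and the assumption that a higher quality value is better are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class HDR:
    """One highly divergent region (hypothetical record layout)."""
    start: int             # position of the first variant in the region
    end: int               # position of the last variant in the region
    positions: List[int]   # variants contained

    @property
    def size(self) -> int:
        return self.end - self.start + 1


def find_hdrs(variants: Iterable[Tuple[int, float]],
              min_quality: float = 30.0,   # quality cutoff (assumed: higher is better)
              min_variants: int = 3,       # number of variants per cluster
              max_gap: int = 20,           # max distance between variants in a cluster
              merge_gap: int = 100         # max distance between clusters to merge
              ) -> List[HDR]:
    """Cluster (position, quality) pairs from one chromosome into HDRs."""
    kept = sorted(pos for pos, qual in variants if qual >= min_quality)

    # Step 1: chain neighbouring variants into clusters.
    clusters: List[List[int]] = []
    for pos in kept:
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])

    # Step 2: discard clusters with too few variants.
    dense = [c for c in clusters if len(c) >= min_variants]

    # Step 3: merge dense clusters that lie close together.
    merged: List[List[int]] = []
    for cluster in dense:
        if merged and cluster[0] - merged[-1][-1] <= merge_gap:
            merged[-1].extend(cluster)
        else:
            merged.append(list(cluster))

    return [HDR(c[0], c[-1], c) for c in merged]
```

Isolated variants never reach step 3, so a region is reported only where several passing variants sit close together.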

Page 4: Com par 25jun14

Comparing 2 Plasmodium Genomes

Dye-Terminator Sequenced Variation – 50 base-pair sliding window

[Figure: Number of Variants vs. Position on “Chromosome”]
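A plot like this one needs a variant count per window. The sketch below shows one way to compute those counts; the function name `window_counts`, the 1-based coordinates, and the `step` parameter are assumptions, since the slide gives only the 50 bp window size.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def window_counts(positions: Sequence[int],
                  chrom_length: int,
                  window: int = 50,
                  step: int = 50) -> List[Tuple[int, int]]:
    """Return (window_start, variant_count) pairs along one chromosome.

    Positions are 1-based. With step == window the windows tile the
    chromosome; a smaller step makes them overlap (slide).
    """
    ordered = sorted(positions)
    counts = []
    for start in range(1, chrom_length + 1, step):
        end = min(start + window - 1, chrom_length)
        # Number of variants with start <= position <= end.
        n = bisect_left(ordered, end + 1) - bisect_left(ordered, start)
        counts.append((start, n))
    return counts
```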

Page 5: Com par 25jun14

Comparing 2 Plasmodium Genomes

Predicted Variants – No Filtering Based on Quality Metrics

[Figure, two panels: Number of Variants vs. Position on “Chromosome”]

Page 6: Com par 25jun14

Comparing 2 Plasmodium Genomes

Predicted Variants - Filtering Based on Quality Score ≥ 30 Cutoff

[Figure, two panels: Number of Variants vs. Position on “Chromosome”]
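A quality-score cutoff of this kind can be applied directly to the QUAL column (the sixth tab-separated field) of a VCF file. The sketch below is illustrative only; `filter_by_qual` is a hypothetical name, and dropping records with a missing QUAL (`.`) is an assumption.

```python
from typing import Iterable, Iterator


def filter_by_qual(vcf_lines: Iterable[str],
                   min_qual: float = 30.0) -> Iterator[str]:
    """Yield header lines plus records whose QUAL is at least min_qual."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line          # keep meta-information and column header
            continue
        fields = line.rstrip("\n").split("\t")
        qual = fields[5]        # VCF columns: CHROM POS ID REF ALT QUAL ...
        if qual != "." and float(qual) >= min_qual:
            yield line
```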

Page 7: Com par 25jun14

Comparing 2 Plasmodium Genomes

Filtering Based on Consensus Quality (FQ) ≤ -100 Cutoff

[Figure, two panels: Number of Variants vs. Position on “Chromosome”]
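The consensus-quality cutoff can be sketched the same way, assuming FQ is stored as a key in the VCF INFO column (the eighth field), as older samtools/bcftools callers write it. The helper names `info_value` and `filter_by_fq` are hypothetical, and dropping records that carry no FQ value is an assumption.

```python
from typing import Iterable, Iterator, Optional


def info_value(info: str, key: str) -> Optional[float]:
    """Return the numeric value of `key` in a VCF INFO string, or None."""
    for item in info.split(";"):
        name, sep, value = item.partition("=")
        if name == key and sep:
            return float(value)
    return None


def filter_by_fq(vcf_lines: Iterable[str],
                 max_fq: float = -100.0) -> Iterator[str]:
    """Yield header lines plus records whose FQ is at most max_fq."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        fq = info_value(fields[7], "FQ")   # INFO is the eighth column
        if fq is not None and fq <= max_fq:
            yield line
```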

Page 8: Com par 25jun14

Comparing 2 Plasmodium Genomes

Highly-Divergent Regions (HDRs)

[Figure, two panels: Number of Variants vs. Position on “Chromosome”]

Page 9: Com par 25jun14

Comparing 2 Plasmodium Genomes

Quality ≥ 30 Variants without Consensus Quality ≥ -100

Highly-Divergent Regions (HDRs)

[Figure, two panels: Number of Variants vs. Position on “Chromosome”]

Page 10: Com par 25jun14

Characteristics of Highly Divergent Regions

33X: 44.4%, By265: 55.6%, N67: 66.7%

histone acetyltransferase GCN5, putative (GCN5)

RNA-binding protein NOB1, putative

Percent Identity

DNA repair protein, putative

33X: 41.4%, By265: 79.3%, N67: 51.7%
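Percent identity figures like those on this slide are typically computed from a pairwise alignment. The sketch below shows one common convention, assumed here rather than taken from the deck: identical residues divided by the number of aligned columns in which neither sequence has a gap. The function name and the toy sequences in the usage line are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity of two aligned sequences of equal length.

    Columns containing a gap in either sequence are excluded from both
    the match count and the denominator (one of several conventions).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Toy aligned peptides, not data from the slides.
identity = percent_identity("MKT-LLV", "MRTALIV")
```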

Page 11: Com par 25jun14

Characteristics of Highly Divergent Regions