ComPar (25 Jun 2014)

Posted on 19-Feb-2017


• Xin-zhuan Su
• Sittiporn Pattaradilokrat
• Sethu Nair
• Yanwei Qi
• Gordon Bullen

NIH/NIAID – Malaria Functional Genomics Section

• Sebastian Gurevich

McGill University

Funding: National Institutes of Health; Canadian Institutes of Health Research

• Philip Awadalla, University of Montreal

https://github.com/parasite-genomics/Pipelines - 2.0 Coming in July 2014
zmartine@gmail.com

ComPar: Genome Assembly, Variant Mapping, and Validation Pipelines

Martine Zilversmit

http://www.slideshare.net/zmartine1/com-par-25jun14


• BASH-scripted pipelines

• Accurate variant prediction
  – SNPs
  – Small indels
  – Large indels (>17 bp)
  – Focused regions of extreme divergence (35-70% amino acid identity)

• In silico variant validation
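The variant classes listed above could be separated with a rule like the one in this minimal Python sketch. It assumes simple, biallelic VCF-style REF/ALT strings. It reads the ">17bp" boundary as a length change of 18 bp or more. The function name `classify_variant` is hypothetical and is not taken from the ComPar pipelines.

```python
"""Sketch: sort REF/ALT pairs into SNP / small indel / large indel.

Assumptions (not from the ComPar code): biallelic records only, and a
"large indel" is any length change of 18 bp or more (">17bp" on the slide).
"""

LARGE_INDEL_MIN = 18  # ">17bp" read as "18 bp or longer"


def classify_variant(ref: str, alt: str) -> str:
    """Return 'SNP', 'small_indel', 'large_indel', or 'other'."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    length_change = abs(len(ref) - len(alt))
    if length_change == 0:
        # Same-length multi-base substitutions fall outside the slide's classes.
        return "other"
    if length_change >= LARGE_INDEL_MIN:
        return "large_indel"
    return "small_indel"


if __name__ == "__main__":
    for ref, alt in [("A", "G"), ("A", "AT"), ("A", "A" + "T" * 20), ("AC", "GT")]:
        print(ref, alt, classify_variant(ref, alt))
```

Indel length here is simply the difference between the REF and ALT lengths. Left-normalised VCFs make that a reasonable proxy, but complex records would need a proper alignment.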

Parameters:
- Quality metric and cutoff
- Number of variants per cluster
- Maximum distance between variants within a cluster
- Maximum distance between smaller clusters to merge into an HDR

Finding Highly Divergent Regions – HDR Program

VCF File

False Positive Variants

True Positive Variants

HDR File:
- Size of HDR
- Position of HDR
- Variants Contained

Python - Stand-alone interactive or pipelined
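The clustering step that the HDR parameters describe might look roughly like the Python sketch below. This is a hypothetical reconstruction, not the HDR program's own code. It makes these assumptions:

- Variants have already passed the quality cutoff.
- They arrive as (position, identifier) pairs from a single chromosome.
- The names `find_hdrs`, `min_variants`, `max_gap` and `max_merge_gap`, and their default values, are invented for illustration.

Each returned region carries the three fields listed for the HDR file: size, position and the variants it contains.

```python
"""Hypothetical sketch of HDR-style variant clustering (one chromosome)."""
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Region:
    start: int                      # position of first variant in the region
    end: int                        # position of last variant in the region
    variants: List[str] = field(default_factory=list)

    @property
    def size(self) -> int:
        return self.end - self.start + 1


def find_hdrs(
    variants: List[Tuple[int, str]],
    min_variants: int = 3,      # number of variants per cluster
    max_gap: int = 50,          # max distance between variants within a cluster
    max_merge_gap: int = 200,   # max distance between clusters merged into one HDR
) -> List[Region]:
    ordered = sorted(variants)

    # Step 1: chain consecutive variants whose spacing is at most max_gap.
    clusters: List[Region] = []
    for pos, vid in ordered:
        if clusters and pos - clusters[-1].end <= max_gap:
            clusters[-1].end = pos
            clusters[-1].variants.append(vid)
        else:
            clusters.append(Region(pos, pos, [vid]))

    # Step 2: keep only clusters with enough variants.
    clusters = [c for c in clusters if len(c.variants) >= min_variants]

    # Step 3: merge neighbouring clusters separated by at most max_merge_gap.
    hdrs: List[Region] = []
    for c in clusters:
        if hdrs and c.start - hdrs[-1].end <= max_merge_gap:
            hdrs[-1].end = c.end
            hdrs[-1].variants.extend(c.variants)
        else:
            hdrs.append(Region(c.start, c.end, list(c.variants)))
    return hdrs
```

Consider made-up input `[(100, "v1"), (120, "v2"), (140, "v3"), (300, "v4"), (320, "v5"), (340, "v6"), (5000, "v7")]` run with the defaults. The two three-variant clusters at 100-140 and 300-340 merge into one 241 bp region, and the isolated variant at 5000 is dropped.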

[Figure: Number of Variants vs. Position on “Chromosome”]

Dye-Terminator Sequenced Variation – 50 basepair Sliding window

Comparing 2 Plasmodium Genomes

Predicted Variants – No filtering Based on Quality Metrics
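Plots like these come from counting variants in fixed-size windows along the sequence. Here is a minimal sketch of such counting. It assumes 1-based variant positions, as in the VCF POS column. It also assumes a step size, which the slide does not give. The default of 50 yields non-overlapping windows, and a smaller `step` gives overlapping, sliding windows. `sliding_window_counts` is a hypothetical helper.

```python
from bisect import bisect_left
from typing import Iterable, List, Tuple


def sliding_window_counts(
    positions: Iterable[int],
    chrom_length: int,
    window: int = 50,   # 50 bp window, as on the slide
    step: int = 50,     # assumed step; not stated on the slide
) -> List[Tuple[int, int]]:
    """Return (window_start, variant_count) for windows [start, start + window)."""
    pos = sorted(positions)
    counts: List[Tuple[int, int]] = []
    start = 1  # 1-based coordinates
    while start <= chrom_length:
        end = start + window  # exclusive end; the last window may overhang
        n = bisect_left(pos, end) - bisect_left(pos, start)
        counts.append((start, n))
        start += step
    return counts


if __name__ == "__main__":
    print(sliding_window_counts([5, 30, 49, 50, 51, 120], 150))
```

Binary search on the sorted positions keeps each window count logarithmic, so long chromosomes with small steps stay cheap.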

[Figure, two panels: Number of Variants vs. Position on “Chromosome”]

Comparing 2 Plasmodium Genomes

[Figure, two panels: Number of Variants vs. Position on “Chromosome”]

Comparing 2 Plasmodium Genomes

Predicted Variants - Filtering Based on Quality Score ≥ 30 Cutoff
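A quality cutoff of this kind can be applied to the VCF QUAL column, the sixth tab-separated field, as in the sketch below. It assumes plain-text VCF input. Records whose QUAL is missing (`.`) are dropped. `filter_by_qual` is a hypothetical name, not part of the ComPar pipelines.

```python
from typing import Iterable, Iterator, List


def filter_by_qual(vcf_lines: Iterable[str], min_qual: float = 30.0) -> Iterator[List[str]]:
    """Yield the split fields of VCF records with QUAL >= min_qual."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        qual = fields[5]  # QUAL column
        if qual != "." and float(qual) >= min_qual:
            yield fields
```

In use, `open("variants.vcf")` (a placeholder path) can be passed in directly, since a file object iterates line by line.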

[Figure, two panels: Number of Variants vs. Position on “Chromosome”]

Comparing 2 Plasmodium Genomes

Filtering Based on Consensus Quality (FQ) ≤ -100 Cutoff
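The FQ cutoff can be sketched the same way. This sketch assumes FQ is stored as an `FQ=<value>` entry in the VCF INFO column, the eighth field. Records without that entry are discarded. The helper names `info_value` and `filter_by_fq` are hypothetical.

```python
from typing import Iterable, Iterator, List, Optional


def info_value(info: str, key: str) -> Optional[str]:
    """Return the value of key in a VCF INFO string; flags and missing keys give None."""
    for entry in info.split(";"):
        k, _, v = entry.partition("=")
        if k == key:
            return v or None
    return None


def filter_by_fq(vcf_lines: Iterable[str], max_fq: float = -100.0) -> Iterator[List[str]]:
    """Yield the split fields of VCF records with INFO FQ <= max_fq."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        fq = info_value(fields[7], "FQ")  # INFO column
        if fq is not None and float(fq) <= max_fq:
            yield fields
```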

[Figure, two panels: Number of Variants vs. Position on “Chromosome”]

Comparing 2 Plasmodium Genomes

Highly-Divergent Regions (HDRs)

[Figure, two panels: Number of Variants vs. Position on “Chromosome”]

Comparing 2 Plasmodium Genomes

Quality ≥ 30 Variants without Consensus Quality ≥ -100

Highly-Divergent Regions (HDRs)

Characteristics of Highly Divergent Regions

33X 44.4% | By265 55.6% | N67 66.7%

histone acetyltransferase GCN5, putative (GCN5)

RNA-binding protein NOB1, putative

Percent Identity

DNA repair protein, putative

33X 41.4% | By265 79.3% | N67 51.7%
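Percent identity figures like these depend on how the two sequences are aligned and what the denominator is. The sketch below uses one common convention: identical residues divided by aligned columns, with gap columns excluded. Other conventions divide by the full alignment length or by the shorter sequence, and the deck does not say which was used. `percent_identity` is a hypothetical helper, and the input is assumed to be a pre-computed pairwise alignment with `-` as the gap character.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # gap column: excluded under this convention
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


if __name__ == "__main__":
    print(percent_identity("MKV-LA", "MKIQLA"))
```

The choice of denominator matters most in highly divergent regions, where alignments tend to be gappy.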

