CPTR title slide ReSeqTB data platform pipeline threshold values Jamie Posey, PhD CDC

Post on 14-May-2018

Page 1: CPTR title slide title slide ReSeqTB data ... (MBTC) above acceptable threshold? ... -Custom loci bed list & vcftools Version 0.1.126 • Initial annotation-SnpEff Version 4.1

CPTR title slide

ReSeqTB data platform

pipeline threshold values

Jamie Posey, PhD

CDC

Page 2:

Pipeline Scheme

Page 3:

Pipeline flowchart

Page 4:

Pipeline flowchart

Page 5:

Pipeline flowchart

Page 6:

Key steps in the pipeline

• Input data validation & QC

• Species specificity check

• Sequence reads mapping & refinement

• Variant calling

• Functional Annotation & Lineage Analysis

Page 7:

Input data validation & QC

Page 8:

Quality Scores

QUALITY SCORE    ACCURACY (%)
Q10              90
Q20              99
Q30              99.9
Q40              99.99
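
The accuracy values in this table follow from the standard Phred definition, Q = -10 * log10(P), where P is the probability that a base call is wrong. A minimal Python sketch of that conversion (the function name is illustrative and not part of the pipeline):

```python
def phred_to_accuracy(q: float) -> float:
    """Return base-call accuracy in percent for a Phred quality score q."""
    error_probability = 10 ** (-q / 10)  # P = 10^(-Q/10)
    return (1 - error_probability) * 100


for q in (10, 20, 30, 40):
    print(f"Q{q}: {phred_to_accuracy(q):.2f}% accurate")
```

Each step of 10 in Q reduces the error probability by a factor of ten, which is why Q20 corresponds to 1 error in 100 bases and Q30 to 1 in 1,000.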

Page 9:

Input data validation & QC

• Fastq format files

-From next-generation Sequencing platforms

-specifically Illumina sequencing

• FastQValidator Version 1.0.5

Are the sequence reads in FASTQ format?
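
To illustrate what a format check of this kind looks at, here is a simplified Python sketch. It is not FastQValidator and covers only a subset of what a real validator checks; the function name `is_fastq` is hypothetical. It assumes the common four-line record layout (header starting with `@`, sequence, separator starting with `+`, quality string of equal length).

```python
import io


def is_fastq(handle) -> bool:
    """Return True if every record follows the four-line FASTQ layout."""
    lines = [line.rstrip("\n") for line in handle]
    if not lines or len(lines) % 4 != 0:
        return False  # empty input or an incomplete record
    for i in range(0, len(lines), 4):
        header, sequence, separator, quality = lines[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            return False
        if not sequence or len(sequence) != len(quality):
            return False  # one quality character is needed per base
    return True


print(is_fastq(io.StringIO("@read1\nACGT\n+\nIIII\n")))  # a well-formed record
print(is_fastq(io.StringIO(">read1\nACGT\n")))           # FASTA, not FASTQ
```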

Page 10:

Input data validation & QC

Page 11:

Input data validation & QC

• Prinseq-lite.pl Version 1.0.5

- Trim reads based on quality Threshold

QC Threshold: Q20 Average Read Sequence Quality
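
The Q20 average-quality check can be sketched in Python as follows. This is an illustration of the threshold only, not the Prinseq implementation. It assumes Phred+33 quality encoding (used by current Illumina output) and treats a mean of exactly 20 as passing; both are assumptions, as are the function names.

```python
def mean_read_quality(quality_string: str, offset: int = 33) -> float:
    """Mean Phred score of one read, decoded from its FASTQ quality line."""
    scores = [ord(char) - offset for char in quality_string]
    return sum(scores) / len(scores)


def passes_q20(quality_string: str, threshold: float = 20.0) -> bool:
    """True if the read's average quality meets the assumed Q20 cutoff."""
    return mean_read_quality(quality_string) >= threshold


print(mean_read_quality("IIII"))  # 'I' encodes Q40 under Phred+33
print(passes_q20("####"))         # '#' encodes Q2, so this read fails
```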

Page 12:

Species Specificity check

Page 13:

Species Specificity check

• Kraken version 0.10.5

-Is the percentage of reads mapping to Mycobacterium tuberculosis complex (MTBC) above the acceptable threshold?

QC Threshold: Percent of reads mapping to MTBC > 90%
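
A hedged Python sketch of how this check could be applied to a Kraken report. It assumes the tab-separated layout written by `kraken-report` (percent of reads in the clade, clade read count, direct read count, rank code, NCBI taxonomy ID, name) and uses NCBI taxonomy ID 77643 for the M. tuberculosis complex. The function name and the sample report text are made up for illustration, and the pipeline's own script may work differently.

```python
MTBC_TAXID = "77643"  # NCBI taxonomy ID of the M. tuberculosis complex


def mtbc_percent(report_text: str) -> float:
    """Return the percent of reads assigned to the MTBC clade, or 0.0."""
    for line in report_text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 6 and fields[4].strip() == MTBC_TAXID:
            return float(fields[0])  # column 1: percent of reads in clade
    return 0.0


# Made-up report excerpt, for illustration only.
sample_report = (
    "  5.00\t50\t50\tU\t0\tunclassified\n"
    " 95.00\t950\t0\t-\t1\troot\n"
    " 93.50\t935\t10\t-\t77643\t      Mycobacterium tuberculosis complex\n"
)
percent = mtbc_percent(sample_report)
print(percent, "passes" if percent > 90 else "fails")
```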

Page 14:

Species Specificity check

Page 15:

Sequencing reads mapping & refinement

Page 16:

Sequencing reads mapping & refinement

• Reference Genome: H37Rv (NC_000962.3)

• BWA MEM: Version 0.7.12

- Mapping Tool

• QC: Qualimap Version 2.1

- Output: Quality report for assessing the mapping

Page 17:

Sequencing reads mapping & refinement

Page 18:

Sequencing reads mapping & refinement

• Removing duplicate reads

PICARD tools Version 1.134

• Cleaning Indels & recalibration

GATK Version 3.4.0

• Calculation of coverage statistics

Page 19:

Variant Calling

Page 20:

Variant Calling

• Samtools & Bcftools Version 1.2

-QC Threshold: Minimum base call quality Q20

-QC Threshold: Minimum mapping quality Q20

-QC Threshold: Minimum read depth ≥ 10X

-QC Threshold: SNP clusters, 3 SNPs in 10 nucleotide bases
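
The depth and SNP-cluster thresholds can be illustrated with a short Python sketch. This is one possible reading, not the pipeline's code: it assumes that a "cluster" means three or more SNPs whose positions span at most 10 bases, that all SNPs in a cluster are discarded, and that the depth filter is applied first. The function name and the `(position, depth)` input shape are hypothetical.

```python
def filter_snps(snps, min_depth=10, cluster_size=3, window=10):
    """Filter (position, depth) SNP calls by depth, then drop SNP clusters."""
    # Keep only calls supported by at least min_depth reads.
    kept = sorted((pos, depth) for pos, depth in snps if depth >= min_depth)
    positions = [pos for pos, _ in kept]

    # Flag every run of cluster_size SNPs that fits inside `window` bases.
    clustered = set()
    for i in range(len(positions) - cluster_size + 1):
        group = positions[i:i + cluster_size]
        if group[-1] - group[0] < window:
            clustered.update(group)

    return [(pos, depth) for pos, depth in kept if pos not in clustered]


calls = [(100, 30), (104, 25), (108, 40), (500, 50), (900, 5)]
print(filter_snps(calls))  # → [(500, 50)]
```

In this example the call at position 900 fails the depth threshold, and the calls at 100, 104 and 108 form a cluster within 10 bases, so only the call at position 500 remains.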

Page 21:

Variant Calling

Page 22:

Pipeline flowchart: Functional Annotation and Lineage Analysis

Input: VCF file (Raw)

Filter VCF File (Custom Script)

Filtered VCF file

Functional Annotation & Lineage Analysis (SnpEff Ver. 4.1 & Custom Script)

Output: Annotation Report & Lineage Report

Mapping to ReSeqTB Database

Page 23:

Functional Annotation & Lineage Analysis

• Filtering output VCF file

-Custom loci bed list & vcftools Version 0.1.126

• Initial annotation

-SnpEff Version 4.1

• Reformatting annotation and Lineage analysis

-Custom Script

Page 24:

Annotation Report

Page 25:

Lineage Report

Page 26:

Summary of UVP analysis

Total isolates analyzed: 3717

Number passed all checks: 3570

Total failed QC: 147

- Failed Kraken specificity: 67

- Flagged for multiple rrs/rrl mutations: 76

- Mixed infection: 4
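
As a quick consistency check on these counts, the following Python snippet confirms that the three failure categories add up to the reported total and derives the pass rate. The variable names are illustrative; the numbers are taken from this slide.

```python
total_isolates = 3717
passed = 3570
failures = {
    "Failed Kraken specificity": 67,
    "Flagged for multiple rrs/rrl mutations": 76,
    "Mixed infection": 4,
}

failed = sum(failures.values())
assert passed + failed == total_isolates  # the categories account for every isolate

pass_rate = 100 * passed / total_isolates
print(f"{failed} failed QC, {pass_rate:.1f}% passed")  # → 147 failed QC, 96.0% passed
```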

Page 27:

Distribution of MTBC major lineages in dataset

Page 28:

Phylogenetic representation of Isolates in dataset

Clade labels in figure: Bovis, East Asian, East African Indian, West African L5, Indo-Oceanic, West African L6, Euro American

Page 29:

Antibiotic resistance profile across major lineages

Page 30:

Summary

• The Unified Variant Pipeline (UVP) is comprehensive and includes additional genomic data analysis steps (species and lineage specificity, custom annotations).

• It applies current versions of bioinformatics tools and sets quality thresholds at every step of the pipeline to ensure confidence in variant calls.

• Validation of the annotation results against several other variant calling pipelines, including PhyResSE (Feuerriegel et al. 2015), shows agreement across most variant positions.

Page 31:

Thank You!