Omixon Workshops Considerations for Analyzing Targeted NGS Data - Introduction Tim Hague, CEO

Upload: baldric-johnston

Post on 18-Dec-2015

Targeted Data

Introduction

Many mapping, alignment and variant calling algorithms

Most of these have been developed for whole genome sequencing and to some extent population genetic studies

Premise

In contrast, NGS-based diagnostics deals with particular genes or mutations of an individual

Different diagnostic targets present specific challenges

Goal

Present analysis issues related to differences in:

Sequencing technologies

Targeting technologies

Target specifics

Pseudogenes and segmental duplication

NGS Sequencers

Illumina

Ion Torrent

Roche 454

(SOLiD)

Mind The Gap

Moore B, Hu H, Singleton M, De La Vega FM, Reese MG, Yandell M. Genet Med. 2011 Mar;13(3):210-7.

Sequencing Technology

Differences:

Homopolymer error rates

G/C content errors

Read length

Sequencing protocols (single vs paired reads)
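As a rough illustration of two of the properties listed on this slide, the sketch below locates homopolymer runs and computes G/C content for a sequence. The function names, the `min_len=4` cutoff and the example amplicon are assumptions made for illustration only; they are not taken from the talk.

```python
from itertools import groupby


def homopolymer_runs(seq, min_len=4):
    """Return (start, base, length) for every run of a single base
    that is at least min_len long. min_len=4 is an arbitrary choice."""
    runs = []
    pos = 0
    for base, group in groupby(seq.upper()):
        length = len(list(group))
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs


def gc_content(seq):
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    amplicon = "ACGTTTTTTGCGCGGGGGCAT"  # hypothetical example sequence
    for start, base, length in homopolymer_runs(amplicon):
        print(f"{base} x {length} starting at offset {start}")
    print(f"G/C content: {gc_content(amplicon):.3f}")
```

Regions flagged this way are the ones where platform-specific error profiles are most likely to matter when comparing calls across sequencers.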

Targeting Methods

PCR primers (e.g. amplicons)

Hybridization probes (e.g. exome kits)

Targeting Technology

Differences: Exact matching regions vs regions with SNPs

Results in: Need for mapping against whole chromosomes to avoid false positives
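A toy sketch of why the choice of reference matters: a read that really comes from a homologous region is forced onto the target when the target is the only reference, and its differences then look like variants. The sequences are hypothetical, and the ungapped mismatch-counting "mapper" is a stand-in for illustration, not how a production aligner works.

```python
def best_hit(read, references):
    """Return (name, offset, mismatches) for the ungapped placement of
    read with the fewest mismatches across all given references."""
    best = None
    for name, ref in references.items():
        for offset in range(len(ref) - len(read) + 1):
            window = ref[offset:offset + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if best is None or mismatches < best[2]:
                best = (name, offset, mismatches)
    return best


# Hypothetical toy sequences: the homolog differs from the gene at two bases.
GENE = "ACGTACGGATCCTTAGCAAGT"
PSEUDOGENE = "ACGTACGGTTCCTAAGCAAGT"

# A read that truly originates from the homolog.
READ = PSEUDOGENE[4:18]

if __name__ == "__main__":
    # Target-only reference: the read lands on the gene with mismatches,
    # which a variant caller could report as false positive SNPs.
    print(best_hit(READ, {"gene": GENE}))
    # Wider reference: the read finds its exact origin instead.
    print(best_hit(READ, {"gene": GENE, "pseudogene": PSEUDOGENE}))
```

With the wider reference the read is placed on its true origin with no mismatches, so it no longer contributes spurious evidence at the target.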

Analysis Targets

Differences:

Rate of polymorphism

Repetitive structures

Mutation profiles

G/C content

Single genes vs multi gene complexes

BRCA1/2: 1/2000

HLA: 1/29

CFTR: 1/2000

Distributions of insertions and deletions

Distribution of repeat elements

Segmental Duplications

Sometimes called Low Copy Repeats (LCRs)

Highly homologous, >95% sequence identity

Rare in most mammals

Comprise a large portion of the human genome (and other primate genomes)

Important for understanding HLA
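The identity threshold quoted above can be checked with a few lines of code. This is a minimal sketch that assumes the two sequences have already been aligned to the same length (gaps written as `-`) and that gap columns count as non-identical; both choices, and the function names, are illustrative assumptions.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns where both sequences carry the same
    base. Columns containing a gap ('-') count as non-identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    pairs = zip(aligned_a.upper(), aligned_b.upper())
    matches = sum(a == b and a != "-" for a, b in pairs)
    return 100.0 * matches / len(aligned_a)


def exceeds_identity_threshold(aligned_a, aligned_b, threshold=95.0):
    """True when identity is above the threshold (95% as on the slide)."""
    return percent_identity(aligned_a, aligned_b) > threshold


if __name__ == "__main__":
    copy_a = "A" * 100
    copy_b = "A" * 97 + "C" * 3  # hypothetical paralog with 3 differences
    print(percent_identity(copy_a, copy_b))
    print(exceeds_identity_threshold(copy_a, copy_b))
```

Two copies this similar differ at only a handful of positions per read length, which is why reads from one copy can be placed on the other.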

Segmental Duplications

Many LCRs are concentrated in "hotspots"

Recombinations in these regions are responsible for a wide range of disorders, including:

– Charcot-Marie-Tooth syndrome type 1A

– Hereditary neuropathy with liability to pressure palsies

– Smith-Magenis syndrome

– Potocki-Lupski syndrome

Data analysis shouldn’t be like this!

Data Analysis Tools

Differences:

Detection rates of complex variants (sensitivity)

False positive rates (accuracy)

Speed

Ease of use

“Depending upon which tool you use, you can see pretty big differences between even the same genome called with different tools—nearly as big as the two Life Tech/Illumina genomes.”

Mark Yandell in BioIT-World.com, June 8, 2011

Examples

Missing variants

SNPs, a DNP and deletions

Identify More Valid Variants

Find Homopolymer Indels

Examples

Coverage differences

Four Times Exon Coverage

[Figure: two tracks, data ranges 0-432 and 0-96]

Higher Exome Coverage

[Figure: two tracks, data ranges 0-24 and 0-10]

First Conclusion

Read accuracy is not the limiting factor in accurate variant analysis

Example - Dense Region of SNPs

Second Conclusion

As variant density increases, the performance of most tools goes down
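One way to find the regions this conclusion refers to is to count called variants in fixed-size windows and flag the crowded ones. The sketch below does that; the window size, the `min_variants` cutoff and the positions are arbitrary values chosen for illustration, not figures from the talk.

```python
def variants_per_window(positions, window=100):
    """Count variants in consecutive fixed-size windows.
    Returns a dict mapping window start coordinate to variant count."""
    counts = {}
    for pos in positions:
        start = (pos // window) * window
        counts[start] = counts.get(start, 0) + 1
    return counts


def dense_windows(positions, window=100, min_variants=5):
    """Window starts holding at least min_variants variants.
    Both parameters are illustrative, not recommended values."""
    counts = variants_per_window(positions, window)
    return sorted(start for start, n in counts.items() if n >= min_variants)


if __name__ == "__main__":
    called_positions = [3, 150, 152, 155, 160, 171, 420]  # hypothetical
    print(variants_per_window(called_positions))
    print(dense_windows(called_positions))
```

Windows flagged as dense are candidates for closer inspection, or for comparing the output of more than one tool.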

Variant Calling

There are a few popular variant callers: GATK, SAMtools mpileup, VarScan

The most comprehensive (GATK) has a whole pipeline, including a quality recalibration step and an indel realignment step

Running these recalibration and realignment steps before any variant calling is highly recommended

Deduplication and removing non-primary alignments may also be required
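The last filtering step can be sketched directly on SAM text using the FLAG bits defined in the SAM specification. This is a simplified stand-in for what dedicated tools do, assuming duplicates have already been marked upstream; the function names and the `drop_duplicates` switch are illustrative.

```python
# FLAG bits as defined in the SAM format specification.
UNMAPPED = 0x4
SECONDARY = 0x100      # not the primary alignment
DUPLICATE = 0x400      # marked as PCR or optical duplicate
SUPPLEMENTARY = 0x800  # supplementary alignment


def keep_for_calling(flag, drop_duplicates=True):
    """True if a record with this FLAG should be passed to the caller."""
    mask = UNMAPPED | SECONDARY | SUPPLEMENTARY
    if drop_duplicates:
        mask |= DUPLICATE
    return (flag & mask) == 0


def filter_sam_lines(lines, drop_duplicates=True):
    """Yield header lines unchanged and only the alignment records
    that pass keep_for_calling."""
    for line in lines:
        if line.startswith("@"):
            yield line
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if keep_for_calling(flag, drop_duplicates):
            yield line


if __name__ == "__main__":
    sam = [
        "@HD\tVN:1.6",
        "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "r1\t256\tchr1\t900\t0\t4M\t*\t0\t0\tACGT\tIIII",
        "r2\t1024\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    ]
    for record in filter_sam_lines(sam):
        print(record)
```

The `drop_duplicates` switch reflects the "may" in the bullet above: with amplicon data, reads legitimately share start positions, so duplicate removal is a decision to make per assay rather than a default.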

Indel Realigner Problem

Variants That Can be Hard to Find

DNPs

TNPs

Small indels next to SNPs

30+ bp indels

Homopolymer indels

Homopolymer indel and SNP together

Indels in palindromes

Dense regions of variants
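DNPs and TNPs are often reported as separate neighbouring SNPs. The sketch below shows one simple post-processing idea: merging substitutions at consecutive positions into a single multi-base call. It assumes all input SNPs are on the same chromosome and the same haplotype, which real data does not guarantee; the tuple layout and function names are hypothetical.

```python
def merge_adjacent_snps(snps):
    """Merge single-base substitutions at consecutive positions.
    snps: iterable of (pos, ref, alt) tuples, assumed to be in phase.
    Returns a position-sorted list of (pos, ref, alt) tuples."""
    merged = []
    for pos, ref, alt in sorted(snps):
        if merged and merged[-1][0] + len(merged[-1][1]) == pos:
            # This SNP directly follows the previous call: extend it.
            prev_pos, prev_ref, prev_alt = merged[-1]
            merged[-1] = (prev_pos, prev_ref + ref, prev_alt + alt)
        else:
            merged.append((pos, ref, alt))
    return merged


def classify(variant):
    """Label a merged substitution by its length."""
    length = len(variant[1])
    return {1: "SNP", 2: "DNP", 3: "TNP"}.get(length, "MNP")


if __name__ == "__main__":
    calls = [(10, "A", "G"), (11, "C", "T"), (20, "G", "A")]  # hypothetical
    for variant in merge_adjacent_snps(calls):
        print(classify(variant), variant)
```

Merging matters for interpretation because two changed bases inside one codon can give a different amino acid change than either SNP considered alone.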

Contact

Tim Hague, CEO

Omixon Biocomputing Solutions

[email protected]

+36 70 318 4878

Download our Omixon Target™ Evaluation Version Today

OMIXON.COM