
Page 1: Intro to Next Generation Sequencing

DM Church Last Updated: 7 May 2012

Intro to Next Generation Sequencing

Page 2

http://omicsmaps.com/ Nick Loman and James Hadfield

Page 3

Page 4

Koboldt et al., 2010 (Figure 3)

Page 5

Page 6

• Bench work to build libraries and sequence
• Clean up and QA reads
• Alignments to genome or transcriptome
• Analysis of alignments
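As a minimal sketch of the "Clean up and QA reads" step, the hypothetical function below trims low-quality bases from the 3' end of a read. The fixed Phred threshold of 20 and the simple scan-from-the-end rule are illustrative assumptions, not something the slide specifies; production pipelines typically use dedicated trimming tools with more careful rules.

```python
def trim_3prime(seq, quals, min_qual=20):
    """Drop bases from the 3' end while their Phred quality is below min_qual.

    seq is the read sequence; quals is a list of integer Phred scores.
    The default threshold of 20 is an illustrative assumption.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    end = len(seq)
    # Walk backwards from the last base until one meets the threshold.
    while end > 0 and quals[end - 1] < min_qual:
        end -= 1
    return seq[:end], quals[:end]


if __name__ == "__main__":
    read = "ACGTACGTAC"
    quals = [38, 37, 36, 35, 30, 28, 25, 12, 8, 2]
    print(trim_3prime(read, quals))
```

Trimming only from the 3' end reflects the common observation that quality tends to decay along a read; it does not remove low-quality bases in the middle of a read.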

Page 7

Koboldt et al., 2010

• Sample contamination
• Library chimeras
• Sample mix-ups
• Tumor-normal switches
• Run quality

Page 8

Koboldt et al. (Figure 4A)

Page 9

Page 10

Chor et al., 2009

Page 11

CLC bio

Page 12

GCTACGGCATTCAGGCATCAGGCATTAGCAGGGCATTCAGGGATCAGGCATTAGC->
<-CATGGCATTCAGGGATCAGGCATT
<-GCCATGGCATTCAGGGATCAGGC
CATTCAGGGATCAGGCATTAGCAG->
GGCATTCAGGGATCAGGCATTAGC->
CATTCAGGGATCAGGCATTAGCAG->
GGCATTCAGGGATCAGGCATT->
<-GGATCAGGCATTAGCAG
<-GATCAGGCATTAGCAG
<-GGATCAGGCATTAGCAG

Page 13

High Coverage: qualities may not be needed
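One standard way to put numbers on "high" versus "low" coverage, which the slide itself does not give, is the Lander-Waterman approximation: mean coverage is C = N * L / G for N reads of length L on a genome of size G, and if read start positions are roughly Poisson-distributed, the expected fraction of bases covered by no read is about e^(-C). The function names and the example run sizes below are hypothetical.

```python
import math


def expected_coverage(n_reads, read_len, genome_size):
    """Mean depth C = N * L / G (Lander-Waterman)."""
    return n_reads * read_len / genome_size


def fraction_uncovered(coverage):
    """Poisson approximation: probability that a base is covered by zero reads."""
    return math.exp(-coverage)


if __name__ == "__main__":
    # Hypothetical run: 1 million 100 bp reads on a 5 Mb genome.
    c = expected_coverage(1_000_000, 100, 5_000_000)
    print(f"coverage = {c:.1f}x, uncovered fraction ~ {fraction_uncovered(c):.2e}")
```

At 20x the uncovered fraction is negligible, but at 1x roughly a third of the genome is expected to have no reads at all, which is why each individual base call matters so much more at low depth.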

Page 14

Low Coverage: qualities are important
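The contrast between these two slides can be illustrated with a toy consensus caller for a single reference position. Each observation is a (base, Phred quality) pair from one read. Both functions and the example pile-up are hypothetical, and summing Phred scores is a simplification rather than a proper genotype-likelihood model. With many reads the two rules usually agree; with only three reads, two low-quality A calls outvote one high-quality G under a simple majority, but not once qualities are taken into account.

```python
from collections import Counter, defaultdict


def majority_call(observations):
    """Call the base seen most often, ignoring quality scores."""
    counts = Counter(base for base, _ in observations)
    return counts.most_common(1)[0][0]


def quality_weighted_call(observations):
    """Call the base with the largest summed Phred quality."""
    scores = defaultdict(int)
    for base, qual in observations:
        scores[base] += qual
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical low-coverage column: two low-quality A's versus one high-quality G.
    column = [("A", 5), ("A", 5), ("G", 40)]
    print(majority_call(column), quality_weighted_call(column))
```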

Page 15

Custodia-Lora et al., 2003

Page 16

FASTQ Example

FASTQ example from: Cock et al. (2009). Nucleic Acids Res 38:1767-1771.
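Since the FASTQ example on this slide was an image, here is a minimal reader for the common four-line layout: an @ header line, the sequence, a + separator line, and the quality string, with Sanger-style qualities decoded as ASCII code minus 33. It assumes records are not wrapped across lines; the function name and the sample read are made up for illustration. Reading in fixed blocks of four lines also avoids mistaking a quality string that happens to begin with @ for a new header.

```python
def parse_fastq(lines):
    """Yield (read_id, sequence, phred_scores) from 4-line FASTQ records.

    Assumes Sanger encoding (Phred + 33) and no line wrapping.
    """
    lines = [line.rstrip("\n") for line in lines if line.strip()]
    for i in range(0, len(lines), 4):
        record = lines[i:i + 4]
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        header, seq, plus, qual = record
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        # Sanger FASTQ: quality character = chr(Phred + 33)
        yield header[1:], seq, [ord(ch) - 33 for ch in qual]


if __name__ == "__main__":
    text = "@read1\nGATTACA\n+\nIIIII#!\n"
    for read_id, seq, quals in parse_fastq(text.splitlines()):
        print(read_id, seq, quals)
```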

For analysis, it may be necessary to convert to the Sanger form of FASTQ. For example, Illumina FASTQ stores quality scores ranging from 0 to 62, whereas Sanger quality scores range from 0 to 93.

Solexa quality scores use a different, odds-based scale, so they have to be converted to PHRED quality scores rather than simply re-offset.
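A minimal sketch of these conversions, assuming the FASTQ variants described by Cock et al.: Sanger FASTQ stores Phred + 33, Illumina 1.3+ FASTQ stores Phred + 64, and Solexa FASTQ stores Solexa scores + 64, where Q_phred = 10 * log10(10^(Q_solexa / 10) + 1). The function names are hypothetical. Rounding to the nearest integer loses a little information at low scores, where the two scales differ most; at high scores they are practically identical.

```python
import math


def solexa_to_phred(q_solexa):
    """Convert a Solexa score to the nearest integer Phred score."""
    return round(10 * math.log10(10 ** (q_solexa / 10) + 1))


def illumina13_to_sanger(qual):
    """Re-encode an Illumina 1.3+ quality string (Phred + 64) as Sanger (Phred + 33)."""
    return "".join(chr(ord(ch) - 64 + 33) for ch in qual)


def solexa_to_sanger(qual):
    """Re-encode a Solexa quality string (Solexa score + 64) as Sanger (Phred + 33)."""
    return "".join(chr(solexa_to_phred(ord(ch) - 64) + 33) for ch in qual)


if __name__ == "__main__":
    for q in (-5, 0, 10, 40):
        print(q, "->", solexa_to_phred(q))
    print(illumina13_to_sanger("h@"), solexa_to_sanger(";h"))
```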

Page 17

SAM (Sequence Alignment/Map)

• It may not be necessary to align reads from scratch; you can instead use existing alignments in SAM format
  – SAM is the output of aligners that map reads to a reference genome
  – Tab-delimited, with a header section and an alignment section
    • Header lines begin with @ (the header section is optional)
    • The alignment section has 11 mandatory fields
  – BAM is the binary format of SAM

http://samtools.sourceforge.net/
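The layout described above can be sketched as a small reader that separates @ header lines from tab-delimited alignment lines and maps the first 11 columns onto the mandatory fields. Field names follow the current SAM specification; the parser itself is a hypothetical illustration (it leaves optional TAG:TYPE:VALUE columns unparsed), and the example record is modeled on the one in the SAM specification.

```python
SAM_FIELDS = ("QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL")
INT_FIELDS = {"FLAG", "POS", "MAPQ", "PNEXT", "TLEN"}


def parse_sam(text):
    """Split SAM text into header lines and alignment records.

    Each record is a dict holding the 11 mandatory fields plus "TAGS",
    the list of optional TAG:TYPE:VALUE strings (left unparsed here).
    """
    header, records = [], []
    for line in text.splitlines():
        if not line:
            continue
        if line.startswith("@"):  # header lines all start with @
            header.append(line)
            continue
        cols = line.split("\t")
        if len(cols) < 11:
            raise ValueError(f"expected at least 11 fields, got {len(cols)}")
        rec = {name: (int(value) if name in INT_FIELDS else value)
               for name, value in zip(SAM_FIELDS, cols)}
        rec["TAGS"] = cols[11:]
        records.append(rec)
    return header, records


if __name__ == "__main__":
    sam_text = (
        "@SQ\tSN:ref\tLN:45\n"
        "r001\t99\tref\t7\t30\t8M2I4M1D3M\t=\t37\t39\tTTAGATAAAGGATACTG\t*\n"
    )
    header, records = parse_sam(sam_text)
    print(header)
    print(records[0]["RNAME"], records[0]["POS"], records[0]["CIGAR"])
```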

Page 18

http://samtools.sourceforge.net/SAM1.pdf

Mandatory Alignment Fields

Page 19

http://samtools.sourceforge.net/SAM1.pdf

Alignment Examples

Alignments in SAM format
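Two of the mandatory fields are packed: FLAG is a bit field, and CIGAR is a run-length list of alignment operations. The hypothetical helpers below decode both, using the bit meanings and operation letters listed in the current SAM specification. The CIGAR reference length gives the alignment's 1-based inclusive end as POS + reference length - 1.

```python
import re

FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")    # operations that advance along the reference
QUERY_CONSUMING = set("MIS=X")  # operations that advance along the read


def decode_flag(flag):
    """List the names of the bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def cigar_spans(cigar):
    """Return (reference_length, query_length) covered by a CIGAR string."""
    ref_len = query_len = 0
    for count, op in CIGAR_RE.findall(cigar):
        if op in REF_CONSUMING:
            ref_len += int(count)
        if op in QUERY_CONSUMING:
            query_len += int(count)
    return ref_len, query_len


if __name__ == "__main__":
    pos, flag, cigar = 7, 99, "8M2I4M1D3M"
    ref_len, query_len = cigar_spans(cigar)
    print(decode_flag(flag))
    print("end:", pos + ref_len - 1, "read length:", query_len)
```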

Page 20

chr1 86114265 86116346 nsv433165
chr2 1841774 1846089 nsv433166
chr16 2950446 2955264 nsv433167
chr17 14350387 14351933 nsv433168
chr17 32831694 32832761 nsv433169
chr17 32831694 32832761 nsv433170
chr18 61880550 61881930 nsv433171

chr1 16759829 16778548 chr1:21667704 270866 -
chr1 16763194 16784844 chr1:146691804 407277 +
chr1 16763194 16784844 chr1:144004664 408925 -
chr1 16763194 16779513 chr1:142857141 291416 -
chr1 16763194 16779513 chr1:143522082 293473 -
chr1 16763194 16778548 chr1:146844175 284555 -
chr1 16763194 16778548 chr1:147006260 284948 -
chr1 16763411 16784844 chr1:144747517 405362 +

Valid BED files
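To make the BED coordinate convention concrete, here is a minimal reader for lines like the ones above. It assumes the usual BED convention of 0-based starts and exclusive ends, so end minus start is the feature length. It splits on any whitespace because the slide shows spaces (real BED files are tab-delimited) and handles only the first six columns; the function names are hypothetical.

```python
def parse_bed_line(line):
    """Parse a BED line with 3 to 6 whitespace-separated columns.

    BED starts are 0-based and ends are exclusive, so end - start is the
    feature length in bases.
    """
    cols = line.split()
    if len(cols) < 3:
        raise ValueError("BED lines need at least chrom, start and end")
    rec = {"chrom": cols[0], "start": int(cols[1]), "end": int(cols[2])}
    for key, value in zip(("name", "score", "strand"), cols[3:6]):
        rec[key] = value  # optional columns kept as strings
    if rec["end"] < rec["start"]:
        raise ValueError("end must not precede start")
    return rec


def feature_length(rec):
    return rec["end"] - rec["start"]


if __name__ == "__main__":
    rec = parse_bed_line("chr1 86114265 86116346 nsv433165")
    print(rec["name"], feature_length(rec))
```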

Page 21

GTF

Page 22

##gff-version 3
##gvf-version 1.02
##species http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=10090
##genome-build NCBI MGSCv36
##assembly-name MGSCv36
##assembly-accession GCF_000001635.15
##file-date 2011-11-18
# Study_accession: Combined studies on MGSCv36
# Display_name: Combined studies on MGSCv36
# Study_description: Combined studies on MGSCv36
chr1 dbVar copy_number_variation 90044442 90114410 . . . ID=nsv433533;Name=nsv433533;Start_range=.,90044442;End_range=90114410,.
chr4 dbVar copy_number_variation 121483931 121646639 . . . ID=nsv433534;Name=nsv433534;Start_range=.,121483931;End_range=121646639,.
chr9 dbVar copy_number_variation 109128634 109146964 . . . ID=nsv433535;Name=nsv433535;Start_range=.,109128634;End_range=109146964,.
chr17 dbVar copy_number_variation 30240627 30614866 . . . ID=nsv433536;Name=nsv433536;Start_range=.,30240627;End_range=30614866,.
chr17 dbVar copy_number_variation 30983722 31036099 . . . ID=nsv433537;Name=nsv433537;Start_range=.,30983722;End_range=31036099,.
chr17 dbVar copy_number_variation 34907088 34962504 . . . ID=nsv433538;Name=nsv433538;Start_range=.,34907088;End_range=34962504,.

GVF format
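GVF is built on GFF3: nine columns (seqid, source, type, start, end, score, strand, phase, attributes), 1-based closed coordinates, and a final column of key=value attributes separated by semicolons. The sketch below parses a feature line like those above and converts it to a BED interval by subtracting 1 from the start. The function names are hypothetical, and the whitespace fallback exists only because the slide text shows spaces instead of tabs.

```python
from urllib.parse import unquote

GVF_COLUMNS = ("seqid", "source", "type", "start", "end",
               "score", "strand", "phase", "attributes")


def parse_gvf_line(line):
    """Parse one GVF/GFF3 feature line into a dict with an attribute map."""
    stripped = line.rstrip("\n")
    cols = stripped.split("\t")
    if len(cols) != 9:
        # Fallback for space-separated text; keeps column 9 intact.
        cols = stripped.split(None, 8)
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}")
    rec = dict(zip(GVF_COLUMNS, cols))
    rec["start"] = int(rec["start"])
    rec["end"] = int(rec["end"])
    attrs = {}
    for pair in rec["attributes"].split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attrs[key] = unquote(value)  # GFF3 percent-encodes special characters
    rec["attributes"] = attrs
    return rec


def gvf_to_bed(rec):
    """GFF3/GVF intervals are 1-based and closed; BED is 0-based and half-open."""
    name = rec["attributes"].get("ID", ".")
    return "\t".join([rec["seqid"], str(rec["start"] - 1), str(rec["end"]), name])


if __name__ == "__main__":
    line = ("chr1 dbVar copy_number_variation 90044442 90114410 . . . "
            "ID=nsv433533;Name=nsv433533;Start_range=.,90044442;End_range=90114410,.")
    rec = parse_gvf_line(line)
    print(rec["type"], rec["attributes"]["ID"])
    print(gvf_to_bed(rec))
```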

Page 23

http://www.ncbi.nlm.nih.gov/dbvar

http://www.ebi.ac.uk/dgva

http://www.ncbi.nlm.nih.gov/snp

Derived data

Page 24

Derived data

Page 25

Actual data

Page 26

[Figure: Trace and SRA Holdings, Oct 2000 to Sep 2011; log-scale y axis from 10^8 to 10^15; series: Trace Archive bases, SRA bases, SRA bytes]

Getting exponential growth under control

Page 27

Trace organization
• seq1: FASTA, quality, chromatogram, experimental info, sample
• seq2: FASTA, quality, chromatogram, experimental info, sample

SRA organization
• Experiments
• Samples
• Sequences and qualities

Page 28

[Figure: Bytes per base in SRA, Feb 2008 to Dec 2011; y axis from 0 to 10; series: cumulative, incremental, moving average; periods labeled "Era of NGS Explosion", "FASTQ Era", and "Bits/Base Era"]

As of April 10, 2012, the SRA contains fewer bytes than bases.
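Storing fewer bytes than bases means spending less than 8 bits per base. As an illustration only (this is not the SRA's actual encoding, which the slide does not describe), packing each A/C/G/T into 2 bits already reaches 0.25 bytes per base before any further compression. The helper names are hypothetical, and N or other IUPAC codes would need separate handling; here they raise a KeyError.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack_2bit(seq):
    """Pack an A/C/G/T string into 4 bases per byte (first base in the high bits)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return bytes(out)


def unpack_2bit(data, length):
    """Inverse of pack_2bit; the original length must be stored separately."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases[:length])


if __name__ == "__main__":
    seq = "GATTACAGATTACA"
    packed = pack_2bit(seq)
    print(len(seq), "bases ->", len(packed), "bytes")
    print(unpack_2bit(packed, len(seq)) == seq)
```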

Page 29

New Cycle

Decision Circle
• What data series to store
• Redundancy removal
• Normalization
• Lossy vs. lossless
• Compression tuning
• Practical application

BAM and similar formats containing both raw reads and alignments become the primary output of raw sequencing
• Increases the number of data series

Compression by reference reduces the sizes of other data series
• New sets of tradeoffs
• New compression algorithms
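Compression by reference stores an aligned read as its position plus only the bases that differ from the reference, so a read that matches perfectly costs little beyond its coordinates. The toy functions below show the idea for ungapped alignments with mismatches only. They are a hypothetical illustration, not the scheme used by the SRA or any particular format, which must also handle insertions, deletions, and quality scores; the reference string is made up.

```python
def encode_against_reference(read, ref, pos):
    """Return (pos, length, mismatches) for an ungapped read aligned at 0-based pos.

    mismatches is a list of (offset_in_read, read_base) pairs; everything
    else is recoverable from the reference.
    """
    window = ref[pos:pos + len(read)]
    if len(window) != len(read):
        raise ValueError("read runs past the end of the reference")
    mismatches = [(i, b) for i, (b, r) in enumerate(zip(read, window)) if b != r]
    return pos, len(read), mismatches


def decode_against_reference(encoded, ref):
    """Rebuild the read from the reference window plus stored mismatches."""
    pos, length, mismatches = encoded
    bases = list(ref[pos:pos + length])
    for offset, base in mismatches:
        bases[offset] = base
    return "".join(bases)


if __name__ == "__main__":
    ref = "GCTACGGCATTCAGGCATCAGG"  # made-up reference
    read = "CATTCAGGGATC"           # one mismatch relative to ref[7:19]
    enc = encode_against_reference(read, ref, 7)
    print(enc)
    print(decode_against_reference(enc, ref) == read)
```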

Page 30

Analyzing New Compression Method: Data from the 1000 Genomes Project

BAMs from the 1000 Genomes Project
• All available combinations of samples, platforms, and aligners
• 3114 files
• 27 Tb of disk space after compression

BAM treatment
• Names are dropped after restoring mates
• Only the sequencing quality score is saved
• None of the non-redundant optional tags are preserved

Correction of BAM inconsistencies
• Occasional alignments to stretches of Ns on the reference, and beyond the end of the reference, were converted to unaligned
• Different PCR duplicate flags for mates
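One of the inconsistencies listed above is mates carrying different PCR duplicate flags. A minimal check, assuming records have already been parsed into (QNAME, FLAG) pairs and using the 0x400 duplicate bit from the SAM specification, groups primary records of paired reads by name and reports pairs that disagree. The slide does not say how the flags were then reconciled, so this hypothetical sketch only detects the problem.

```python
from collections import defaultdict

PAIRED = 0x1
DUPLICATE = 0x400
SECONDARY_OR_SUPPLEMENTARY = 0x100 | 0x800


def inconsistent_duplicate_pairs(records):
    """Return sorted read names whose primary mates disagree on the 0x400 flag.

    records: iterable of (qname, flag) tuples.
    """
    dup_flags = defaultdict(set)
    for qname, flag in records:
        if not flag & PAIRED or flag & SECONDARY_OR_SUPPLEMENTARY:
            continue  # only compare primary records of paired reads
        dup_flags[qname].add(bool(flag & DUPLICATE))
    return sorted(name for name, flags in dup_flags.items() if len(flags) > 1)


if __name__ == "__main__":
    records = [("r1", 99), ("r1", 147), ("r2", 99 | DUPLICATE), ("r2", 147)]
    print(inconsistent_duplicate_pairs(records))
```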

Page 31

Changes To SRA Run Browser

Page 32

http://aws.amazon.com/datasets/4383

Page 33

https://main.g2.bx.psu.edu/

Page 34

http://www.genomespace.org/

Page 35

Science, 1 July 2011, Vol. 333, no. 6038, pp. 53-58. DOI: 10.1126/science.1207018

Page 36

Li et al., 2011 (Figure 1)

Page 37

Li et al., 2011 (Figure 2)

Page 38

Kleinman et al., 2012 (Figure 1)

Page 39

Kleinman et al., 2012 (Table 1)

Page 40

Lin et al., 2012 (Figure 1)

Page 41

Lin et al., 2012 (Figure 2)

Page 42

Pickrell et al., 2012 (Figure 1)

Page 43

Li et al., 2012 (Figure 1)

Page 44

Li et al., 2012 (Figure 2)

Page 45

Li et al., 2012 (Figure 3)

Page 46

Li et al., 2012 (Figure 4)