sequencing technologies: overview &...

Sequencing Technologies: Overview & Capabilities

Chris Daum Genomic Technologies Workshop March 22, 2016 [email protected]

Area Leads (3); Resource & NTI manager (1); R&D Project manager (1)

Sequencing Technologies Group

•  Streamlined Process from Sample In to Sequence Data Out •  Perform Process Optimization & Development:

•  New Preps, Applications, Sequencing Technologies •  Continuous Improvement & Lean Manufacturing Six-Sigma

Sample Management

Library Construction

Library QC, qPCR, pooling Sequencing

(5 FTEs) (3 FTEs) (11 FTEs) (5 FTEs)

- Define Project - Obtain Metadata - Product catalog

work order & flow

Project Management Office (PMO)

2

Sequencing Technology

($15M)

3

Sequencing Technologies: Sample to Data

Supported by ITS-GLS LIMS system

Sample QC/ aliqout

Library preparation

Sequencing: Illumina / Pacbio

07/2005 04/2013

454 early access

01/2007 04/2007 10/2007 07/2008 05/2009 12/2009 7/2010 12/2011

Illumina HiSeq 2500 Solexa early access

454 in Production

MegaBase Sanger offline

SOLiD early access

ABI 3730 Sanger reduced

ABI 3730 Sanger offline Solexa in Production

454 Titanium Illumina GAIIx

454 1K Illumina HiSeq 2000

Illumina MiSeq

PacBio RSII

PacBio early access

11/2010 3/2011

454 offline

Sanger Sequencing to Next-Gen Sequencing by Synthesis

4

Staying State of the Art

06/2014

Illumina HiSeq 1T

Oxford Nanopore early access

JGI Yearly Sequence Output

5 As of March 21, 2016

44,000 human genome equivalents!

0

20,000

40,000

60,000

80,000

100,000

120,000

140,000

160,000

FY2005FY2006FY2007FY2008FY2009FY2010FY2011FY2012FY2013FY2014FY2015FY2016

34 33 39 126 1,0046,041

29,903

56,115

70,863

100,608

143,177

58,172TotalB

ases(G

b)

FYTotalBases(Gb)Sequenced-AllPlaDorms

FYTotal

Illumina HiSeq 1T

Illumina HiSeq 2500

Illumina HiSeq 2000

Illumina NextSeq

500

Illumina MiSeq

PacBio RSII PacBio Sequel

Units 3 3 2 1 5 3 2

Reads (Single-Read

>1,500 Million per Flowcell

200 Million per Flowcell

>1,000 Million per Flowcell

400 Million per Flowcell

>10 Million per Flowcell

0.06 Million per SMRT Cell

0.4 Million per SMRT Cell

Readlength 2 X 150bp Max*

2 X 250bp Max

2 X 150bp Max*

2 X 150 Max 2 X 300bp Max

>12,500bp Avg; >40,000bp Max

12,000bp Avg; >40,000bp Max

Total Bases 500 Gb per Flowcell

130 Gb per Flowcell

350 Gb per Flowcell

>100 Gb per Flowcell

5-20 Gb per Flowcell

>0.4 Gb per SMRT Cell for 2hr runs;

>0.8Gb for 4hr runs

>2.7 Gb per SMRT Cell for 2hr runs; >5.0

Gb for 4hr runs

Run Time 7 Days for 2 X 150

4.5 Days for 2 X 250

16 Days for 2 X 150

1 Days for 2 X 150

2 Days for 2 X 300

0.08-0.12 Days (2-4 hours)

0.08-0.12 Days (2-4 hours)

Applications Primary Sequence

Generator at JGI

Rapid output HiSeq

Supplement / Backup Platform

Rapid mid-range output;

Single Cell

16S/18S iTags, Library

QC, R&D

Assembly improvement, de

novo, SynBio validation,

methylation/ epigenetics

Early Access Testing

JGI Sequencing Platforms Portfolio

6

Major Investment in Automation: Process

Sample QC & Aliquot

Library Construction

Library QC, qPCR, pooling

Sample Archive & Retrieval

Starlet x 2 Sciclone G3 x3 Star x 2

SAM Automated Decapper

Labcyte Echo 5ul PCR

BioMicroLab Volume Check

Maximize Consistency, Throughput and Reliability 7

BioTek Synergy H1 Plate Reader

Automated Library Creation

•  Plate based Automated library preps:

–  Implemented into Production in late 2011

–  3 PerkinElmer Sciclone NGS robots

–  12 production sample prep methods are supported •  >24,000 sample sequencing libraries prepared in last year

•  Supporting Equipment:

8

Sciclone Automated Preps

Preps in Development

•  Illumina Methyl-Seq (bisulfite conversion) prep

•  Illumina 3' RNAseq prep

Supported Preps

•  Illumina gDNA PCR-free WGS Fragment Library prep

•  Illumina TruSeq RNA-seq stranded library preps: •  PolyA selection of mRNA for eukaryotes •  rRNA depletion for microbes & metatranscriptomes

•  Illumina small RNA & miRNA for eukaryotes

•  Illumina iTags (16S proks; 18S euks; Fungal ITS)

•  Illumina Exome Capture (NimbleGen SeqCap) prep for targeted resequencing

•  Illumina NexteraXT prep

•  PacBio DNA 2kb libs, and >10kb libs with enzymatic shearing

9

Continued Growth in Number of Samples Handled

10

A Growing Portfolio of Library Capabilities

Genome: DNA

Transcriptome: RNA

Epigenome: DNA-protein

WGS

LMP

Genome assembly

Structural variation

Genotyping

Comparative genomics

WGS: Tight insert

Pacbio long reads

Stranded RNA-seq

Gene expression

Gene annotation

FFPE/LCM

smRNA

Poly A

rRNA depletion

Iso-seq

Bisulfite-seq

ChIP-seq

Methylation

Chromatin conformation

Protein binding

Gene regulation

ChIA-PET

FAIRE-seq

Chew Yee Ngan 11

Genome Fragments

Long Mate Pairs

Genome assembly

Structural variation

Genetic Diversity

Comparative genomics

complexity

100ng

1ng

Genome: DNA

Tight insert

•  270bp •  500bp •  Low input- Nextera

•  Tight insert 400bp •  Tight insert 800bp

•  2.5kb •  4kb •  8kb

•  Pacbio 2kb •  Pacbio 3kb •  Pacbio >10kb

•  Low input 10kb •  Pacbio 20kb •  Oxford Nanopore

A Growing Portfolio of Library Capabilities

Example: DNA Products

Genome Fragments

Genome Fragments

Long Reads

Long Mate Pairs

Chew Yee Ngan 12

RNA Sample Prep: Gene expression •  RNA-seq

•  Small RNA-seq

13

RNA amount

PCR amplification

10ng 100ng

Poly A selection rRNA depletion (probes)

8 15

1ug

10

RNA-seq: Coverage Reproducibility

Pearson correlation = 0.9992

Strandedness

Eukaryotes: 20-40bp Prokaryotes: 50-150bp

Novel miRNA

Chew Yee Ngan

Epigenomic: DNA modifications

14

5 Methyl C Bisulfite sequencing

Methyl A Pacbio sequencing

Unmodified DNA template

Time

Modified DNA template

Delay in base incorporation

opposite methyl adenine.

Time

Stephen Mondo’s Presentation

Epigenomic: Chromatin structure

15

Protein binding sites Histone modifications Regulatory elements/ open chromatin:

FAIRE-seq

ChIP-seq

Genome Res. Jun 2007; 17(6): 877–885 Brief Bioinform (2010) doi: 10.1093/bib/bbq068

•  3’ RNA-seq

•  ATAC-seq (Assay of Transposase Accessible Chromatin) –  Identification of genome wide regulatory elements

New Technologies

16

•  Reduced sample requirement •  Organism dependent

– requires customization

•  Gene expression study •  Reduced sequencing cost Total RNA

AAAAA 0.00

0.02

0.04

0 20 40 60 80 100

Coverage across transcripts

ATAC ChIP

1214 | VOL.10 NO.12 | DECEMBER 2013 | NATURE METHODS

ARTICLES

ratio similar to that of DNase-seq, whose data were generated from ~3–5 orders of magnitude more cells13. Peak intensities were highly reproducible between technical replicates (R = 0.98) and highly correlated between ATAC-seq and two different sources of DNase-seq data (R = 0.79 and R = 0.83; Supplementary Fig. 1), and the majority of reads within peaks came from intersections of DNase and ATAC-seq peaks (Supplementary Fig. 2). We compared our data to DNase-hypersensitive sites identified in ENCODE DNase-seq data: receiver operating characteristic curves demonstrated a similar sensitivity and specificity for ATAC-seq and DNase-seq (Supplementary Fig. 3). ATAC-seq peak intensi-ties also correlated well with markers of active chromatin and not with transposase sequence preference (Supplementary Figs. 4 and 5). Highly sensitive open-chromatin detection was main-tained even when using 5,000 or 500 human nuclei as starting material (Supplementary Figs. 3 and 6), although sensitivity was diminished for smaller numbers of input material.

ATAC-seq insert sizes disclose nucleosome positionsWe found that ATAC-seq paired-end reads produced detailed information about nucleosome packing and positioning. The insert size distribution of sequenced fragments from human chro-matin had clear periodicity of approximately 200 bp, suggesting many fragments are protected by integer multiples of nucleo-somes (Fig. 2a). This fragment size distribution also showed clear periodicity equal to the helical pitch of DNA11. By parti-tioning the insert size distribution according to functional classes of chromatin as defined by previous models17 and normalizing to the global insert distribution (Online Methods), we observed clear class-specific enrichments across this insert size distribution

(Fig. 2b). This result demonstrated that these functional states of chromatin have an accessibility fingerprint that can be read out with ATAC-seq. The differen-tial fragmentation patterns were consist-ent with the putative functional state of these classes: CCCTC-binding factor (CTCF)-bound regions were enriched for short fragments of DNA, whereas transcription start sites (TSSs) were dif-ferentially depleted for mono-, di- and

trinucleosome–associated fragments. Transcribed and promoter-flanking regions were enriched for longer multinucleosomal frag-ments, a result suggesting that such regions represent a form of chromatin that is less accessible than that of TSSs. Finally, prior studies have shown that certain DNA sequences are refractory to

Amplifyand

sequence

Closed chromatin

Open chromatin

Tn5 transposome

a b

c Chr19 (q13.12) 19p13.3 13.1219p13.2 19p13.11 19p12 19q12 q13.11 3.12q 19q13.2 q13.32 13.41 q13.42 q13.43q13.33

ATAC-seq(50,000 cellsper replicate)

DNase HS ENCODE/Duke

FAIRE-seq ENCODE/UNC

H3K27acH3K4me3

H3K4me1CTCF

ScaleChr19: 50 kb hg19

36,150,000 36,200,000 36,250,000

HAUS5RBM42

ETV2

COX6B1

UPK1AUPK1A-AS1

ZBTB32MLL4

IGFLR1

U2AF1L4U2AF1L4

PSENENLIN37

HSPB6C19orf55

ARHGAP33

ARHGAP33

1

01

01

0

Day 1 2 3

Starting material Preparation time

4

FAIRE-seq

DNase-seq

ATAC-seq

107105104 108106103No. of cells 102

1

0

ATAC-seq(500 cells)

1

Figure 1 | ATAC-seq probes open-chromatin state. (a) ATAC-seq reaction schematic. Transposase (green), loaded with sequencing adaptors (red and blue), inserts only in regions of open chromatin (between nucleosomes in gray) and generates sequencing-library fragments that can be PCR-amplified. (b) Approximate reported input material and sample preparation time requirements for genome-wide methods of open-chromatin analysis. (c) Comparison of ATAC-seq to other open-chromatin assays at a locus in GM12878 lymphoblastoid cells. The lower ATAC-seq track was generated from 500 FACS-sorted cells. Bottom, composite locations of CTCF and histone modifications associated with active enhancers and promoters. DNase-seq (Duke) and FAIRE-seq (University of North Carolina, UNC) data were from the indicated ENCODE production centers.

a

bCTCF

TSS

Weak enhancer

Promoter flanking

Transcribed

Repressed

50 750 –0.1

5 0

0.15

Fragment length (bp)

Chromatin state:

Enhancer

400225 575

Enrichment

Trimer

DNA pitch(~10.5 bp)

Nucleosome

Single nucleosomeDimer

TetramerPentamer

Hexamer

1,200800400Fragment length (bp)

6

4

2

01,0008006004002000

Fragment length (bp)

810–3

10–5

10–4

0

Nor

mal

ized

rea

d de

nsity

× 1

0–3

Nor

m. r

ead

dens

ity

Figure 2 | ATAC-seq provides genome-wide information on chromatin compaction. (a) ATAC-seq fragment sizes generated from GM12878 nuclei. Inset, log-transformed histogram shows clear periodicity persists to six nucleosomes. (b) Normalized read enrichments for seven classes of chromatin state previously defined17.

5’ transcript 3’

Nature Methods 10, 1213–1218 (2013)

Major User Request: Decrease Sample Amount Requirements

Input; cost Throughput; consistency

The need for cheaper nano-scale sample prep

Microfluidic platform

Fluidigm C1™

Illumina Neoprep

Closed Commercial Microfluidic Platforms

2014-2016 ETOP Blainey Lab MIT

GenCell CLiC

Echo labcyte

17

•  Oxford Nanopore Technology – find out more from Juna Lee next!

Emerging Sequencing Technologies

18

sequencing technologies: overview &...

Documents