Sequencing Technologies: Overview & Capabilities
Chris Daum, Genomic Technologies Workshop, March 22, 2016
Sequencing Technologies Group
Area Leads (3); Resource & NTI manager (1); R&D Project manager (1)
• Streamlined Process from Sample In to Sequence Data Out
• Perform Process Optimization & Development:
– New Preps, Applications, Sequencing Technologies
– Continuous Improvement & Lean Manufacturing Six-Sigma
Sample Management
Library Construction
Library QC, qPCR, pooling
Sequencing
(5 FTEs) (3 FTEs) (11 FTEs) (5 FTEs)
Project Management Office (PMO)
– Define Project
– Obtain Metadata
– Product catalog work order & flow
Sequencing Technology
($15M)
Sequencing Technologies: Sample to Data
Supported by ITS-GLS LIMS system
Sample QC / aliquot
Library preparation
Sequencing: Illumina / PacBio
Sanger Sequencing to Next-Gen Sequencing by Synthesis
Staying State of the Art
[Timeline figure, 07/2005 to 06/2014]
Dates marked: 07/2005; 01/2007; 04/2007; 10/2007; 07/2008; 05/2009; 12/2009; 7/2010; 11/2010; 3/2011; 12/2011; 04/2013; 06/2014
Milestones shown: 454 early access; Solexa early access; 454 in Production; MegaBase Sanger offline; SOLiD early access; ABI 3730 Sanger reduced; ABI 3730 Sanger offline; Solexa in Production; 454 Titanium; Illumina GAIIx; 454 1K; Illumina HiSeq 2000; Illumina MiSeq; PacBio early access; PacBio RSII; 454 offline; Illumina HiSeq 2500; Illumina HiSeq 1T; Oxford Nanopore early access
JGI Yearly Sequence Output
As of March 21, 2016
44,000 human genome equivalents!
FY Total Bases (Gb) Sequenced - All Platforms
FY2005         34
FY2006         33
FY2007         39
FY2008        126
FY2009      1,004
FY2010      6,041
FY2011     29,903
FY2012     56,115
FY2013     70,863
FY2014    100,608
FY2015    143,177
FY2016     58,172
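The "human genome equivalents" figure can be sanity-checked with a little arithmetic. The sketch below is illustrative only. It assumes the chart labels pair with fiscal years in the order shown, and it assumes a human genome size of about 3.2 Gb, which the slide does not state. Under those assumptions the FY2015 total comes out near 44,700 genome equivalents, roughly in line with the headline number, although the slide does not say which total the headline refers to.

```python
# Fiscal-year totals in gigabases (Gb), read off the chart labels in
# fiscal-year order. The pairing of label to year is an assumption.
FY_TOTAL_GB = {
    "FY2005": 34,
    "FY2006": 33,
    "FY2007": 39,
    "FY2008": 126,
    "FY2009": 1_004,
    "FY2010": 6_041,
    "FY2011": 29_903,
    "FY2012": 56_115,
    "FY2013": 70_863,
    "FY2014": 100_608,
    "FY2015": 143_177,
    "FY2016": 58_172,  # partial year, as of March 21, 2016
}

# Assumption: one human genome is about 3.2 Gb (not stated on the slide).
HUMAN_GENOME_GB = 3.2


def genome_equivalents(total_gb: float, genome_gb: float = HUMAN_GENOME_GB) -> float:
    """Convert a base count in Gb into human genome equivalents."""
    if genome_gb <= 0:
        raise ValueError("genome_gb must be positive")
    return total_gb / genome_gb


if __name__ == "__main__":
    fy2015 = genome_equivalents(FY_TOTAL_GB["FY2015"])
    cumulative_gb = sum(FY_TOTAL_GB.values())
    print(f"FY2015: {fy2015:,.0f} human genome equivalents")
    print(f"FY2005-FY2016 cumulative: {cumulative_gb:,} Gb")
```

Changing `HUMAN_GENOME_GB` shows how sensitive the headline number is to the genome size chosen.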
JGI Sequencing Platforms Portfolio

Illumina HiSeq 1T
– Units: 3
– Reads (Single-Read): >1,500 Million per Flowcell
– Readlength: 2 X 150bp Max*
– Total Bases: 500 Gb per Flowcell
– Run Time: 7 Days for 2 X 150
– Applications: Primary Sequence Generator at JGI

Illumina HiSeq 2500
– Units: 3
– Reads (Single-Read): 200 Million per Flowcell
– Readlength: 2 X 250bp Max
– Total Bases: 130 Gb per Flowcell
– Run Time: 4.5 Days for 2 X 250
– Applications: Rapid output HiSeq

Illumina HiSeq 2000
– Units: 2
– Reads (Single-Read): >1,000 Million per Flowcell
– Readlength: 2 X 150bp Max*
– Total Bases: 350 Gb per Flowcell
– Run Time: 16 Days for 2 X 150
– Applications: Supplement / Backup Platform

Illumina NextSeq 500
– Units: 1
– Reads (Single-Read): 400 Million per Flowcell
– Readlength: 2 X 150bp Max
– Total Bases: >100 Gb per Flowcell
– Run Time: 1 Day for 2 X 150
– Applications: Rapid mid-range output; Single Cell

Illumina MiSeq
– Units: 5
– Reads (Single-Read): >10 Million per Flowcell
– Readlength: 2 X 300bp Max
– Total Bases: 5-20 Gb per Flowcell
– Run Time: 2 Days for 2 X 300
– Applications: 16S/18S iTags, Library QC, R&D

PacBio RSII
– Units: 3
– Reads (Single-Read): 0.06 Million per SMRT Cell
– Readlength: >12,500bp Avg; >40,000bp Max
– Total Bases: >0.4 Gb per SMRT Cell for 2hr runs; >0.8 Gb for 4hr runs
– Run Time: 0.08-0.12 Days (2-4 hours)
– Applications: Assembly improvement, de novo, SynBio validation, methylation/epigenetics

PacBio Sequel
– Units: 2
– Reads (Single-Read): 0.4 Million per SMRT Cell
– Readlength: 12,000bp Avg; >40,000bp Max
– Total Bases: >2.7 Gb per SMRT Cell for 2hr runs; >5.0 Gb for 4hr runs
– Run Time: 0.08-0.12 Days (2-4 hours)
– Applications: Early Access Testing
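One way to compare the platforms in this portfolio is output per day of run time. The sketch below is a rough illustration, not a JGI metric. It assumes the column-by-column reading of the slide's table, takes stated lower bounds (">100 Gb") at face value, uses the upper end of the MiSeq range, and uses the 4 hr figures for the PacBio instruments. It ignores unit counts, multiple flowcells or SMRT Cells per run, and turnaround between runs. The `Platform` class and `gb_per_day` function are names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    name: str
    gb_per_run: float  # total bases per flowcell or SMRT Cell, in Gb
    run_days: float    # run time in days


# Values as read from the portfolio table (the pairing of values to
# platforms is an assumption based on column order).
PLATFORMS = [
    Platform("Illumina HiSeq 1T", 500, 7),
    Platform("Illumina HiSeq 2500", 130, 4.5),
    Platform("Illumina HiSeq 2000", 350, 16),
    Platform("Illumina NextSeq 500", 100, 1),  # slide says ">100 Gb"
    Platform("Illumina MiSeq", 20, 2),         # upper end of "5-20 Gb"
    Platform("PacBio RSII", 0.8, 4 / 24),      # 4 hr run, ">0.8 Gb"
    Platform("PacBio Sequel", 5.0, 4 / 24),    # 4 hr run, ">5.0 Gb"
]


def gb_per_day(platform: Platform) -> float:
    """Gb produced per day of run time for one flowcell or SMRT Cell."""
    if platform.run_days <= 0:
        raise ValueError("run_days must be positive")
    return platform.gb_per_run / platform.run_days


if __name__ == "__main__":
    for p in sorted(PLATFORMS, key=gb_per_day, reverse=True):
        print(f"{p.name:<22} {gb_per_day(p):7.1f} Gb/day")
```

Per-day output is only one axis. Read length and application fit, also listed in the table, usually decide which platform a project is routed to.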
Major Investment in Automation: Process
Sample QC & Aliquot
Library Construction
Library QC, qPCR, pooling
Sample Archive & Retrieval
Starlet x 2
Sciclone G3 x 3
Star x 2
SAM Automated Decapper
Labcyte Echo 5 µL PCR
BioMicroLab Volume Check
BioTek Synergy H1 Plate Reader
Maximize Consistency, Throughput and Reliability
Automated Library Creation
• Plate-based automated library preps:
– Implemented into Production in late 2011
– 3 PerkinElmer Sciclone NGS robots
– 12 production sample prep methods are supported
• >24,000 sample sequencing libraries prepared in the last year
• Supporting Equipment:
Sciclone Automated Preps
Preps in Development
• Illumina Methyl-Seq (bisulfite conversion) prep
• Illumina 3' RNAseq prep
Supported Preps
• Illumina gDNA PCR-free WGS Fragment Library prep
• Illumina TruSeq RNA-seq stranded library preps:
– PolyA selection of mRNA for eukaryotes
– rRNA depletion for microbes & metatranscriptomes
• Illumina small RNA & miRNA for eukaryotes
• Illumina iTags (16S proks; 18S euks; Fungal ITS)
• Illumina Exome Capture (NimbleGen SeqCap) prep for targeted resequencing
• Illumina NexteraXT prep
• PacBio DNA 2kb libs, and >10kb libs with enzymatic shearing
Continued Growth in Number of Samples Handled
A Growing Portfolio of Library Capabilities
Genome: DNA
• WGS
• LMP
• Genome assembly
• Structural variation
• Genotyping
• Comparative genomics
• WGS: Tight insert
• PacBio long reads
Transcriptome: RNA
• Stranded RNA-seq
• Gene expression
• Gene annotation
• FFPE/LCM
• smRNA
• Poly A
• rRNA depletion
• Iso-seq
Epigenome: DNA-protein
• Bisulfite-seq
• ChIP-seq
• Methylation
• Chromatin conformation
• Protein binding
• Gene regulation
• ChIA-PET
• FAIRE-seq
Chew Yee Ngan
A Growing Portfolio of Library Capabilities
Example: DNA Products
Genome: DNA
• Genome assembly
• Structural variation
• Genetic Diversity
• Comparative genomics
Genome Fragments: 270bp; 500bp; Low input (Nextera)
Genome Fragments, tight insert: 400bp; 800bp
Long Mate Pairs: 2.5kb; 4kb; 8kb
Long Reads: PacBio 2kb; PacBio 3kb; PacBio >10kb; Low input 10kb; PacBio 20kb; Oxford Nanopore
[Diagram labels: complexity; 100ng; 1ng]
Chew Yee Ngan
RNA Sample Prep: Gene expression
• RNA-seq
– Poly A selection; rRNA depletion (probes)
– RNA amount: 10ng; 100ng; 1µg
– PCR amplification: 8; 10; 15
– Coverage
– Reproducibility: Pearson correlation = 0.9992
– Strandedness
• Small RNA-seq
– Eukaryotes: 20-40bp
– Prokaryotes: 50-150bp
– Novel miRNA
Chew Yee Ngan
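The reproducibility figure on this slide is a Pearson correlation. As a minimal sketch of how such a number is obtained, the code below computes the coefficient between two replicate vectors of per-gene counts. The replicate values are made up for illustration and are not JGI data, and the slide does not say whether the counts were raw, normalized, or log-transformed before correlating.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if len(x) < 2:
        raise ValueError("need at least two observations")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical per-gene read counts from two technical replicates.
    replicate_1 = [12, 250, 0, 87, 1430, 33]
    replicate_2 = [10, 262, 1, 90, 1398, 35]
    print(f"Pearson r = {pearson(replicate_1, replicate_2):.4f}")
```

A few highly expressed genes can dominate a correlation on raw counts, which is one reason replicate comparisons are often made on log-transformed values.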
Epigenomic: DNA modifications
5-methyl C: bisulfite sequencing
Methyl A: PacBio sequencing
[Figure: base incorporation over time for an unmodified DNA template versus a modified DNA template, showing a delay in base incorporation opposite methyl adenine.]
Stephen Mondo's Presentation
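As background for the bisulfite route, the sketch below mimics the chemistry in silico: bisulfite treatment converts unmethylated cytosine so that it reads as T after amplification, while 5-methylcytosine is protected and still reads as C. This is a toy model on a single forward strand with perfect conversion. The function names and the example sequence are invented for illustration and do not describe any JGI pipeline.

```python
from typing import Set


def bisulfite_convert(seq: str, methylated_positions: Set[int]) -> str:
    """Return the sequence as it would read after bisulfite conversion.

    Unmethylated C reads as T; C at a methylated position stays C.
    Positions are 0-based indices into seq.
    """
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def call_methylation(reference: str, converted_read: str) -> Set[int]:
    """Positions where a reference C is still read as C (called methylated)."""
    if len(reference) != len(converted_read):
        raise ValueError("reference and read must be the same length")
    return {
        i
        for i, (ref_base, read_base) in enumerate(
            zip(reference.upper(), converted_read.upper())
        )
        if ref_base == "C" and read_base == "C"
    }


if __name__ == "__main__":
    reference = "ACGTCCGA"  # hypothetical sequence
    read = bisulfite_convert(reference, methylated_positions={1})
    print(read)
    print(sorted(call_methylation(reference, read)))
```

Real data also need handling for the reverse strand, incomplete conversion, and sequencing errors, none of which this sketch attempts.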
Epigenomic: Chromatin structure
ChIP-seq: protein binding sites; histone modifications
FAIRE-seq: regulatory elements / open chromatin
Genome Res. Jun 2007; 17(6): 877–885
Brief Bioinform (2010) doi: 10.1093/bib/bbq068
New Technologies
• 3' RNA-seq
– Gene expression study
– Reduced sequencing cost
– [Plot: coverage across transcripts (total RNA, poly-A tail)]
• ATAC-seq (Assay for Transposase-Accessible Chromatin)
– Identification of genome-wide regulatory elements
– Reduced sample requirement
– Organism dependent; requires customization
– [Panel labels: ATAC; ChIP]
Excerpt: Nature Methods, vol. 10, no. 12, December 2013, p. 1214
ratio similar to that of DNase-seq, whose data were generated from ~3–5 orders of magnitude more cells (ref. 13). Peak intensities were highly reproducible between technical replicates (R = 0.98) and highly correlated between ATAC-seq and two different sources of DNase-seq data (R = 0.79 and R = 0.83; Supplementary Fig. 1), and the majority of reads within peaks came from intersections of DNase and ATAC-seq peaks (Supplementary Fig. 2). We compared our data to DNase-hypersensitive sites identified in ENCODE DNase-seq data: receiver operating characteristic curves demonstrated a similar sensitivity and specificity for ATAC-seq and DNase-seq (Supplementary Fig. 3). ATAC-seq peak intensities also correlated well with markers of active chromatin and not with transposase sequence preference (Supplementary Figs. 4 and 5). Highly sensitive open-chromatin detection was maintained even when using 5,000 or 500 human nuclei as starting material (Supplementary Figs. 3 and 6), although sensitivity was diminished for smaller numbers of input material.

ATAC-seq insert sizes disclose nucleosome positions

We found that ATAC-seq paired-end reads produced detailed information about nucleosome packing and positioning. The insert size distribution of sequenced fragments from human chromatin had clear periodicity of approximately 200 bp, suggesting many fragments are protected by integer multiples of nucleosomes (Fig. 2a). This fragment size distribution also showed clear periodicity equal to the helical pitch of DNA (ref. 11). By partitioning the insert size distribution according to functional classes of chromatin as defined by previous models (ref. 17) and normalizing to the global insert distribution (Online Methods), we observed clear class-specific enrichments across this insert size distribution (Fig. 2b). This result demonstrated that these functional states of chromatin have an accessibility fingerprint that can be read out with ATAC-seq. The differential fragmentation patterns were consistent with the putative functional state of these classes: CCCTC-binding factor (CTCF)-bound regions were enriched for short fragments of DNA, whereas transcription start sites (TSSs) were differentially depleted for mono-, di- and trinucleosome–associated fragments. Transcribed and promoter-flanking regions were enriched for longer multinucleosomal fragments, a result suggesting that such regions represent a form of chromatin that is less accessible than that of TSSs. Finally, prior studies have shown that certain DNA sequences are refractory to
[Figure 1 panels. (a) Tn5 transposome acting on open versus closed chromatin, followed by amplification and sequencing. (b) Starting material (no. of cells, 10^2 to 10^8) and preparation time (days 1 to 4) for FAIRE-seq, DNase-seq and ATAC-seq. (c) Chr19 (q13.12), hg19, 36,150,000 to 36,250,000, 50 kb scale: tracks for ATAC-seq (50,000 cells per replicate), ATAC-seq (500 cells), DNase HS (ENCODE/Duke), FAIRE-seq (ENCODE/UNC), H3K27ac, H3K4me3, H3K4me1 and CTCF; genes shown include HAUS5, RBM42, ETV2, COX6B1, UPK1A, UPK1A-AS1, ZBTB32, MLL4, IGFLR1, U2AF1L4, PSENEN, LIN37, HSPB6, C19orf55 and ARHGAP33.]
Figure 1 | ATAC-seq probes open-chromatin state. (a) ATAC-seq reaction schematic. Transposase (green), loaded with sequencing adaptors (red and blue), inserts only in regions of open chromatin (between nucleosomes in gray) and generates sequencing-library fragments that can be PCR-amplified. (b) Approximate reported input material and sample preparation time requirements for genome-wide methods of open-chromatin analysis. (c) Comparison of ATAC-seq to other open-chromatin assays at a locus in GM12878 lymphoblastoid cells. The lower ATAC-seq track was generated from 500 FACS-sorted cells. Bottom, composite locations of CTCF and histone modifications associated with active enhancers and promoters. DNase-seq (Duke) and FAIRE-seq (University of North Carolina, UNC) data were from the indicated ENCODE production centers.
[Figure 2 panels. (a) Normalized read density versus fragment length (bp), with peaks labeled single nucleosome, dimer, trimer, tetramer, pentamer and hexamer, and a DNA pitch (~10.5 bp) periodicity. (b) Enrichment versus fragment length (bp) by chromatin state: CTCF, TSS, Enhancer, Weak enhancer, Promoter flanking, Transcribed, Repressed.]
Figure 2 | ATAC-seq provides genome-wide information on chromatin compaction. (a) ATAC-seq fragment sizes generated from GM12878 nuclei. Inset, log-transformed histogram shows clear periodicity persists to six nucleosomes. (b) Normalized read enrichments for seven classes of chromatin state previously defined (ref. 17).
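The roughly 200 bp periodicity described in the excerpt suggests a simple way to bin paired-end fragment lengths by how many nucleosomes they likely span. The sketch below is an illustration under stated assumptions, not the method used in the paper. It rounds each fragment length to the nearest multiple of a 200 bp period and treats anything that rounds to zero as sub-nucleosomal. The period, the labels and the example lengths are all assumptions chosen for this example.

```python
from collections import Counter
from typing import Iterable

# Approximate periodicity quoted in the excerpt; using it as a fixed
# bin width is an assumption made for this sketch.
NUCLEOSOME_PERIOD_BP = 200

_LABELS = {0: "sub-nucleosomal", 1: "mono-nucleosome", 2: "di-nucleosome", 3: "tri-nucleosome"}


def nucleosome_class(fragment_len_bp: int, period: int = NUCLEOSOME_PERIOD_BP) -> str:
    """Label a fragment by the nearest whole number of nucleosome periods."""
    if fragment_len_bp <= 0:
        raise ValueError("fragment length must be positive")
    # Integer round-half-up of fragment_len_bp / period.
    n = (2 * fragment_len_bp + period) // (2 * period)
    return _LABELS.get(n, f"{n}-nucleosome")


def summarize(fragment_lengths: Iterable[int]) -> Counter:
    """Count fragments per nucleosome class."""
    return Counter(nucleosome_class(length) for length in fragment_lengths)


if __name__ == "__main__":
    # Hypothetical insert sizes, for example absolute template lengths
    # taken from aligned paired-end reads.
    lengths = [48, 62, 75, 180, 195, 210, 390, 410, 600, 805]
    for label, count in summarize(lengths).most_common():
        print(f"{label:<18} {count}")
```

In practice the class boundaries would be fit to the observed insert-size distribution instead of being fixed at multiples of 200 bp.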
Nature Methods 10, 1213–1218 (2013)
Major User Request: Decrease Sample Amount Requirements
Input; cost Throughput; consistency
The need for cheaper nano-scale sample prep
Microfluidic platform
Fluidigm C1™
Illumina Neoprep
Closed Commercial Microfluidic Platforms
2014-2016 ETOP Blainey Lab MIT
GenCell CLiC
Labcyte Echo
Emerging Sequencing Technologies
• Oxford Nanopore Technology – find out more from Juna Lee next!