INTRODUCTION TO
NEXT GENERATION SEQUENCING
Claude Thermes
Genome Analysis, Centre de Génétique Moléculaire
Gif-sur-Yvette 18/11/2013
BIOINFORMATICS SCHOOL
INTRODUCTION TO THE PROCESSING OF GENOMIC DATA OBTAINED BY HIGH-THROUGHPUT SEQUENCING
14-18 JANUARY 2013 - STATION BIOLOGIQUE - ROSCOFF
Step 1: sample preparation
Step 2: sequencing (Illumina)
Step 3: data analysis
(with permission of ABIMS)
Step 1: sample preparation (situation in 2009)
Step 1: sample preparation (situation in 2013)
[Figure: input amounts: 1 µg total RNA; 0.1 µg with Ribo-Zero purification; 1 ng with the TotalScript protocol; 1-2 ng; 50 ng with the Nextera protocol]
Library types: Paired end, Moleculo/LRSeq, Mate pair, Rad-seq, CLIP-Seq, NET-seq, ...
DNA-Seq library
[Figure: Genomic DNA -> cleavage (sonication) -> fragmented DNA -> adaptor ligation -> PCR -> PCR product]
Paired end sequencing
[Figure: 1st read and 2nd read sequenced from the two ends of each fragment; single read density compared with paired end density for genome or transcript assembly]
Comparison of single read versus paired end sequencing
• improves genome assembly
• better identification of RNA 5' and 3' ends
• but requires good control of DNA fragmentation (purifying gels/columns)
• time consuming and requires large quantities (1-5 µg)
Paired end sequencing : Nextera “tagmentation”, a new methodology for constructing paired end libraries
Tagmentation
Dual barcode approach
up to 96 indexed samples
Tagment Enzyme fragments DNA and attaches junction adapters (blue and green) to both ends of the tagmented molecule
rapid (2 hours) and requires small quantities (50 ng)
Transposomes / Tagment Enzyme
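The dual barcode idea can be sketched in a few lines: two short index sets combine into many unique sample tags. The 12 x 8 split and the index names below are assumptions made for illustration; the slide only states that up to 96 samples can be indexed.

```python
from itertools import product

# Hypothetical index names; the 12 x 8 split is an assumption, not from the slide.
I7_INDEXES = [f"i7_{n:02d}" for n in range(1, 13)]
I5_INDEXES = [f"i5_{n:02d}" for n in range(1, 9)]


def dual_index_combinations(i7, i5):
    """Return every (i7, i5) pair; each pair can tag one sample."""
    return list(product(i7, i5))


pairs = dual_index_combinations(I7_INDEXES, I5_INDEXES)
print(len(pairs))  # 12 * 8 combinations
```

Only 20 distinct index sequences are needed to label 96 samples, which is the practical appeal of combining two barcodes rather than using one barcode per sample.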
A recent improvement of mate pair libraries :
Illumina “Moleculo/LRSeq” technology
Genomic DNA is:
- sheared into 6–8 kb fragments
- partitioned into several 96-well plates
- further fragmented to 600–800 bp
- barcoded and sequenced separately
Limiting the number of DNA molecules per well:
- makes it possible to study individual fragmented molecules
- almost eliminates the chance of having a repeated or duplicate sequence within a given partition
Since each well is over-sequenced, the error rate is reduced by the coverage.
Voskoboynik et al. eLife Sciences 2013
• assembly of complex, repeat-rich genomes
• identification of alternative transcripts
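Why limiting the number of molecules per well helps can be illustrated with a rough Poisson estimate of how often two molecules in the same well overlap the same locus. This is a back-of-the-envelope sketch, not the published method: the genome size and molecule counts are hypothetical, and only the 8 kb fragment length comes from the slide.

```python
import math


def prob_locus_covered_twice(n_molecules, fragment_len, genome_size):
    """Poisson estimate that at least two molecules in one well cover a given locus."""
    # Expected number of molecules overlapping any given position.
    lam = n_molecules * fragment_len / genome_size
    # P(X >= 2) = 1 - P(X = 0) - P(X = 1) for X ~ Poisson(lam).
    return 1.0 - math.exp(-lam) * (1.0 + lam)


GENOME_SIZE = 500_000_000  # hypothetical 500 Mb genome
FRAGMENT_LEN = 8_000       # upper bound of the 6-8 kb fragments

p_few = prob_locus_covered_twice(1_000, FRAGMENT_LEN, GENOME_SIZE)
p_many = prob_locus_covered_twice(100_000, FRAGMENT_LEN, GENOME_SIZE)
print(f"1,000 molecules per well:   {p_few:.6f}")
print(f"100,000 molecules per well: {p_many:.6f}")
```

Under these assumptions, a well with few molecules almost never contains two copies of the same region, so the reads carrying one barcode can be assembled as if they came from single molecules.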
Paired end fragments are too short,
in particular for assembling large genomes with many repeated elements
-> mate pair libraries
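A simple way to see why fragment length matters: a read pair can bridge a repeat only if both reads land in unique sequence on either side of it. The check below is a deliberately simplified sketch; the rule (insert size at least the repeat length plus two read lengths) and all numbers are illustrative assumptions.

```python
def pair_bridges_repeat(insert_size, repeat_len, read_len):
    """True if both reads of a pair can sit in unique sequence flanking the repeat."""
    # Simplifying assumption: each read must lie entirely outside the repeat.
    return insert_size >= repeat_len + 2 * read_len


REPEAT_LEN = 6_000  # hypothetical repeated element
READ_LEN = 100      # hypothetical read length

print(pair_bridges_repeat(500, REPEAT_LEN, READ_LEN))    # typical paired end insert
print(pair_bridges_repeat(8_000, REPEAT_LEN, READ_LEN))  # mate pair insert of several kilobases
```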
“Classical” Illumina mate pair library
Problems: low coverage, few fragments, over-amplified
several kilobases
Nextera Mate Pair : a new methodology for construction of mate pair fragments
Tagment Enzyme fragments DNA and attaches a biotinylated junction adapter (green) to both ends of the tagmented molecule
circularization
fragmentation, then enrichment via the biotin tag
adapter ligation at both ends
rapid (a few hours) and requires small quantities (50 ng)
Rad-seq: Restriction site Associated DNA sequencing
Genome sub-sampling that makes it possible to simultaneously discover and score large numbers of SNP markers in hundreds of individuals for minimal investment
widely applied to genetic mapping in a variety of organisms
Baird et al. (2008) PLoS ONE
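The sub-sampling principle can be sketched as an in silico digestion: only the sequence next to each restriction site is kept, so the same small set of loci is sampled in every individual. This is a toy illustration, not the published pipeline; the EcoRI site GAATTC, the tag length and the toy genome are choices made for the example.

```python
def find_sites(genome, site):
    """Return the 0-based start position of every occurrence of `site`."""
    positions = []
    start = genome.find(site)
    while start != -1:
        positions.append(start)
        start = genome.find(site, start + 1)
    return positions


def rad_tags(genome, site, tag_len):
    """Return (position, upstream tag, downstream tag) for each restriction site."""
    tags = []
    for pos in find_sites(genome, site):
        upstream = genome[max(0, pos - tag_len):pos]
        downstream = genome[pos + len(site):pos + len(site) + tag_len]
        tags.append((pos, upstream, downstream))
    return tags


toy_genome = "ACGTACGAATTCTTGACCAGTGAATTCAAGT"  # made-up sequence
for tag in rad_tags(toy_genome, "GAATTC", tag_len=4):
    print(tag)
```

SNPs are then searched only within these tags, which is what keeps the sequencing effort per individual small.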
[Figure: Rad-seq library amplification. Labels: amplification primer; no amplification primer; sequences AGAACAA / TCTTGTT; the adapter design prevents amplification of genomic fragments lacking a P1 adapter]
CLIP-Seq : cross-linking immunoprecipitation sequencing
Sequencing of RNA sequences that interact with a particular RNA-binding protein:
• UV-crosslinking between RNA and the protein
• immunoprecipitation with antibodies against the protein
• fragmentation
• sequencing
Sanford et al. Genome Research (2009)
NET-seq : Native Elongating Transcript sequencing
• sequencing of the 3' ends of nascent RNAs still associated with the elongating polymerase complexes
• detects the distribution of transcribing polymerases along the genome in a strand-specific manner
Churchman and Weissman, 2011
[Figure: Pol II complexes along the genome]
Cells in desired condition
-> RNA polymerase II immunoprecipitation
-> recovery of nascent transcripts associated with the polymerase
-> RNA-seq and mapping on the genome
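After mapping, the polymerase distribution is obtained by keeping one position per read on each strand. The sketch below follows the description of NET-seq by Churchman and Weissman, in which the 3' end of the nascent RNA marks the polymerase position. The read tuples are made up, and the code assumes each read is reported on the strand of the RNA as a half-open interval.

```python
from collections import Counter


def three_prime_end(start, end, strand):
    """0-based genomic position of the 3' end of a read spanning [start, end)."""
    return end - 1 if strand == "+" else start


def polymerase_density(reads):
    """Count reads per (strand, 3' end position)."""
    counts = Counter()
    for start, end, strand in reads:
        counts[(strand, three_prime_end(start, end, strand))] += 1
    return counts


# Hypothetical mapped reads: (start, end, strand)
reads = [(100, 130, "+"), (105, 130, "+"), (90, 130, "+"), (200, 240, "-")]
density = polymerase_density(reads)
for key, count in sorted(density.items()):
    print(key, count)
```

Keeping the strand in the key is what makes the resulting profile strand-specific.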
Some problems encountered when preparing libraries
DNA-Seq library
read coverage correlates with GC content
[Figure: GC content (%) and read coverage plotted against position (bp)]
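One way to quantify the relationship between read coverage and GC content is to compute both in fixed windows and take their Pearson correlation. This is a minimal sketch on made-up data; the window size, sequence and coverage values are hypothetical.

```python
import math


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    return sum(base in "GC" for base in seq.upper()) / len(seq)


def windows(values, size):
    """Split a string or list into consecutive non-overlapping windows."""
    return [values[i:i + size] for i in range(0, len(values) - size + 1, size)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def gc_coverage_correlation(genome, coverage, window):
    """Correlate per-window GC fraction with per-window mean coverage."""
    gc = [gc_fraction(w) for w in windows(genome, window)]
    cov = [sum(w) / len(w) for w in windows(coverage, window)]
    return pearson(gc, cov)


# Toy data: four 8 bp windows with increasing GC and matching coverage.
genome = "ATATATAT" + "GCGCGCGC" + "ATGCATGC" + "GGGGCCAT"
coverage = [5] * 8 + [20] * 8 + [12] * 8 + [16] * 8
r = gc_coverage_correlation(genome, coverage, window=8)
print(round(r, 3))
```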
Are these fluctuations reproducible between replicates?
Multiplexed replicates to avoid differences due to sequencing
Multiplexing
[Figure labels: different DNA samples; cleavage; fragmented DNA samples; tagged adaptors; ligation; PCR amplification (12-18 cycles); calibration]
Sequencing of the mixed libraries in the same lane
Multiplexing before PCR
[Figure labels: different DNA samples; cleavage; fragmented DNA samples; tagged adaptors; ligation; PCR amplification (12-18 cycles); calibration]
Sequencing of the mixed libraries in the same lane
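Once the mixed libraries are sequenced together, the reads have to be sorted back to their samples using the tags. A minimal sketch, assuming for simplicity that the tag is read as the first bases of each read and must match exactly; the barcode sequences and reads are hypothetical.

```python
def demultiplex(reads, barcodes):
    """Assign each read to a sample by its leading barcode (exact match only)."""
    # Assumes all barcodes have the same length.
    length = len(next(iter(barcodes)))
    bins = {sample: [] for sample in barcodes.values()}
    bins["undetermined"] = []
    for read in reads:
        sample = barcodes.get(read[:length], "undetermined")
        bins[sample].append(read[length:])  # store the read without its tag
    return bins


# Hypothetical tags and reads
barcodes = {"ACGT": "sample1", "TGCA": "sample2"}
reads = ["ACGTTTTGGG", "TGCAAACCC", "GGGGAAAA", "ACGTCCCC"]
bins = demultiplex(reads, barcodes)
for sample, seqs in bins.items():
    print(sample, seqs)
```

Real demultiplexing tools usually tolerate one mismatch in the tag; exact matching keeps the example short.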
[Figure: normalized read coverage versus position (bp) for sample1 to sample8; multiplex ligation before PCR]
Libraries prepared from very small amounts of DNA or RNA (<< 1 ng)
• ChIP-seq with very small amounts of immunoprecipitated material
• RNA from small amounts of tissue (laser dissection)
Typical problem: accumulation of dimers of the two adaptors
• adaptor dimers are amplified more rapidly than other fragments and “invade” the libraries
• they constitute the majority of sequenced reads
• rare fragments then tend to be non-homogeneously amplified
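The way adaptor dimers take over a library can be illustrated with simple exponential-growth arithmetic: a small difference in per-cycle amplification efficiency compounds over the PCR cycles. The efficiencies and starting amounts below are hypothetical; only the 18-cycle figure echoes the 12-18 cycles mentioned in these slides.

```python
def dimer_fraction(dimers, inserts, eff_dimer, eff_insert, cycles):
    """Fraction of molecules that are adaptor dimers after `cycles` PCR cycles."""
    # Each cycle multiplies a population by (1 + efficiency).
    d = dimers * (1 + eff_dimer) ** cycles
    i = inserts * (1 + eff_insert) ** cycles
    return d / (d + i)


# Hypothetical low-input library: as many dimers as genuine inserts,
# with dimers amplifying slightly more efficiently.
for cycles in (0, 6, 12, 18):
    frac = dimer_fraction(50, 50, eff_dimer=0.95, eff_insert=0.70, cycles=cycles)
    print(cycles, round(frac, 2))
```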
Sequencing of very small amounts of genome fragments (<< 1 ng)
[Figure: 13 kb and 43 kb regions, from small input DNA to increasing input DNA]
Comparison of two RNA-seq library protocols:
SOLiD™ Whole Transcriptome Analysis Kit (RNase III fragmentation)
versus
Illumina directional mRNA-Seq library (zinc fragmentation)
SOLiD™ Whole Transcriptome Analysis Kit: RNase III fragmentation
[Figure: RiboMinus RNA -> RNase III -> fragmented RNA, then hybridization with adapters (NNNNNN, 5' to 3') and ligation, reverse transcription, PCR amplification, size selection, sequencing on SOLiD]
[Figure: YBR078W intron; panels labelled SOLiD and Illumina; sequencing on Illumina]
Very heterogeneous pattern; not due to sequencing technology but to library preparation:
RNase III fragmentation not so random?
Illumina directional mRNA-Seq library: zinc fragmentation
[Figure: Total RNA -> depletion of ribosomal RNA -> ribo- RNA -> zinc -> fragmented RNA -> ligation -> RT -> PCR -> ds PCR product]
[Figure: YBR078W intron, RNase III versus zinc fragmentation, same number of reads]
[Figure: correlation between nucleotides as a function of the distance between nucleotides, RNase III versus zinc fragmentation]
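A correlation-versus-distance curve of this kind can be computed as the correlation between the coverage at position i and at position i + d, for increasing d. The sketch below is one plausible reading of the plot, not necessarily the authors' exact statistic; both coverage profiles are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def coverage_autocorrelation(coverage, distance):
    """Correlation between coverage at positions i and i + distance (distance >= 1)."""
    return pearson(coverage[:-distance], coverage[distance:])


# Hypothetical profiles: one smooth, one alternating sharply between positions.
smooth = [1, 2, 3, 4, 5, 6, 7, 8, 7, 6, 5, 4, 3, 2, 1]
jagged = [1, 9] * 8

for d in (1, 2):
    print(d, round(coverage_autocorrelation(smooth, d), 2),
          round(coverage_autocorrelation(jagged, d), 2))
```

A smooth profile keeps a high correlation between neighbouring nucleotides, whereas a heterogeneous one loses it at short distances.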
M. Wery, M. Descrimes, C. Thermes, D. Gautheret & A. Morillon (submitted)
Supported by: CNRS, ACI IMPBio, ANR