Jose Blanca, COMAV institute
bioinf.comav.upv.es
Sequencing technologies
Outline
Sequencing technologies:
● Sanger
● 2nd generation sequencing:
– 454
– Illumina
– SOLiD
– Ion Torrent
● 3rd generation sequencing:
– PacBio
– Nanopore
● General considerations
Reducing the complexity
$3,000M: public US project
$1,000
genome.gov/sequencingcosts
Sanger sequencing
Traditional DNA sequencing method
Ideal for small sequencing projects
Read length around 600-800 bp
Around $5-10 per reaction
384 reactions in parallel at most
Applied Biosystems is the main technological provider
Sequence and quality
Due to technical limitations, different technologies have different error patterns.
Phred score = -10 × log10(probability of error)
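The Phred formula can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the ASCII offset of 33 used to decode a FASTQ quality string is an assumption (it is the common Sanger-style encoding, which the slides do not mention).

```python
import math


def phred_from_error_prob(p_error):
    """Phred score for a given probability that the base call is wrong."""
    return -10 * math.log10(p_error)


def error_prob_from_phred(quality):
    """Inverse of the Phred formula: error probability for a given score."""
    return 10 ** (-quality / 10)


def decode_fastq_quality(quality_string, offset=33):
    """Turn a FASTQ quality string into Phred scores.

    Assumption: the qualities are stored as ASCII characters with an
    offset of 33 (Sanger-style encoding).
    """
    return [ord(char) - offset for char in quality_string]


# One error in 1,000 bases (99.9% accuracy) corresponds to about Q30,
# and Q20 corresponds to one error in 100 bases.
q30 = phred_from_error_prob(0.001)
p20 = error_prob_from_phred(20)
```

Because the scale is logarithmic, every 10 points of quality mean a tenfold drop in the expected error rate.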
Sanger sequencing
In Sanger reads, quality is worst at the beginning and at the end.
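Since the poor bases sit at both ends of the read, a simple way to clean a read is to trim from each end until a base of acceptable quality is found. The following is a naive sketch, not the algorithm of any particular trimming tool; the function name and the threshold of 20 are assumptions chosen for the example.

```python
def trim_low_quality_ends(sequence, qualities, min_quality=20):
    """Remove low-quality bases from both ends of a read.

    Bases are dropped from the start and from the end until a base with
    a Phred score of at least min_quality is reached. Low-quality bases
    in the middle of the read are left untouched.
    """
    if len(sequence) != len(qualities):
        raise ValueError("sequence and qualities must have the same length")

    start = 0
    end = len(sequence)
    # Advance past the poor bases at the beginning of the read.
    while start < end and qualities[start] < min_quality:
        start += 1
    # Step back over the poor bases at the end of the read.
    while end > start and qualities[end - 1] < min_quality:
        end -= 1
    return sequence[start:end], qualities[start:end]


read = "ACGTACGT"
quals = [5, 10, 30, 35, 40, 30, 12, 8]
trimmed_seq, trimmed_quals = trim_low_quality_ends(read, quals)
# trimmed_seq is "GTAC"
```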
Any evidence has error bars
Any conclusion has error bars
Other sources of error
Pre-sequencing:
● PCR mutation-like errors
● Polymerase slippage (low complexity regions)
● PCR primers (e.g. hexamers in random priming)
● Cloning artifacts, chimeric molecules
● Sample contamination
● Index/flag assignment errors
Post-sequencing:
● Assembly artifacts
● Alignment errors due to:
– Reference
– Alignment algorithms
● SNV calling software
2nd generation sequencing
Sanger vs NGS sequencing
| | Sanger | NGS |
|---|---|---|
| Num. sequences per reaction | 1 clone | Millions of molecules |
| Max. parallelization | 384 | Several millions |
| Sequence quality | High | Low |
| Sequence length | 600-800 bp | 35-20,000 bp (depends on the platform) |
| Throughput | Low | High |
Library preparation
Fragmentation
● Sonication
● Nebulization
● Shearing
Size selection
End repair
Sequencing adaptor ligation
Purification
454
First NGS platform (and first to be phased out)
Pyrosequencing-based chemistry
Long reads (400-700 bp)
Owned by Roche
>1 million reads
Obsolete
454 quality
The longer the homopolymer, the lower the quality.
It is very difficult to differentiate AAAAAA from AAAAA.
Quality diminishes with the sequence length.
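Because errors concentrate in homopolymers, it can be useful to locate the homopolymer runs in a read or reference and treat indels there with suspicion. A minimal sketch follows; the function name and the minimum run length of 3 are assumptions made for illustration.

```python
from itertools import groupby


def homopolymer_runs(sequence, min_length=3):
    """Return (start, base, length) for every run of identical bases
    that is at least min_length long. Positions are 0-based."""
    runs = []
    position = 0
    for base, group in groupby(sequence):
        length = sum(1 for _ in group)
        if length >= min_length:
            runs.append((position, base, length))
        position += length
    return runs


# The run of six A's is the kind of region where AAAAAA and AAAAA
# are hard to tell apart.
runs = homopolymer_runs("ACAAAAAAGTTTC")
# runs is [(2, "A", 6), (9, "T", 3)]
```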
Illumina
Previously known as Solexa
Sequencing technique based on reversible terminators
Short reads (50 or 250 bp depending on the version)
Lowest cost per base
Ideal for resequencing projects
Highest throughput
Runs are divided into 8 lanes
Up to 4000 million reads
Can sequence both ends of the molecules (paired ends)
Illumina instruments
Illumina
Quality diminishes with sequence length.
No homopolymer problem, mainly substitution errors.
SOLiD
Ligation-based sequencing chemistry
Short reads (35-75 bp depending on the version)
Only for resequencing projects
It used to produce color sequences, not nucleotides
Color sequences have poor quality, but nucleotide sequences have high quality
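The color sequences can be illustrated with the two-base encoding used by SOLiD, in which each color describes a pair of adjacent bases. The sketch below assumes the usual convention of coding A, C, G, T as 0, 1, 2, 3 and taking the XOR of neighbouring codes; the function names are made up for this example. It shows why a raw color read is fragile (one wrong color corrupts every base decoded after it) while a true SNP has a recognisable signature (two adjacent colors change), which is what allows isolated color errors to be spotted when reads are compared with a reference.

```python
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def encode_colors(bases):
    """Encode a nucleotide sequence as colors: one color per pair of
    adjacent bases, computed as the XOR of the two base codes."""
    codes = [BASE_TO_CODE[base] for base in bases]
    return [left ^ right for left, right in zip(codes, codes[1:])]


def decode_colors(first_base, colors):
    """Decode colors back to nucleotides. The first base must be known,
    and each decoded base depends on all the colors before it."""
    code = BASE_TO_CODE[first_base]
    decoded = [first_base]
    for color in colors:
        code ^= color
        decoded.append(CODE_TO_BASE[code])
    return "".join(decoded)


colors = encode_colors("ATGGCA")            # [3, 1, 0, 3, 1]

# A single wrong color changes every base decoded after it.
with_error = decode_colors("A", [3, 1, 1, 3, 1])   # "ATGTAC"

# A real SNP (ATGGCA -> ATGTCA) changes two adjacent colors instead.
snp_colors = encode_colors("ATGTCA")        # [3, 1, 1, 2, 1]
```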
115 or 320 million reads
SOLiD
Really?
Ion Torrent
Around 60-80 M reads.
200 bp read length.
Sequencing based on H+ production
Error rates higher than other 2nd generation platforms
Error pattern similar to 454, with homopolymer problem.
Belongs to Life Technologies (Applied Biosystems)
3rd generation sequencing
PacBio
3rd generation platform (single molecule)
Polymerase-based chemistry (SMRT)
Longest NGS reads (up to 20,000bp)
Very high error rate
Ideal for de novo sequencing projects
Not many reads
3rd generation, single molecule detection. No amplification step required.
Nucleotides are labeled on the phosphate, which is removed during polymerization.
Sequencing is based on the time the polymerase needs to incorporate a nucleotide: incorporation takes milliseconds, whereas stochastic diffusion takes microseconds.
PacBio length distribution
Taken from flxlexblog.wordpress.com
PacBio quality distribution
Taken from flxlexblog.wordpress.com
Distributions for standard sequencing.
For circular sequencing, the quality is at least 99.9%
Nanopore
USB powered
Senses differences in ion flow
Nanopore: first data
Not very reliable (yet)
Nanopore alignments
Nanopore accuracy
Nanopore-Illumina hybrid error correction
Nanopore RNA sequencing
Full length mRNAs
Full length RNA viruses
Bioinformatic challenges
Handling huge data files.
Beefy computers required.
Software still being developed or missing.
Ad-hoc software required during the analysis.
Existing software tailored to experienced bioinformaticians.
Dollar for dollar rule proposed
EDSAC, by Computer Laboratory, Cambridge
Bioinformatic challenges
Amount of data managed on a small transcriptome assembly
genome.gov/sequencingcosts
Reducing the complexity
Genome
Pros:
● Finest resolution
● Reproducible
Cons:
● Expensive ($600 per sample)
● Lots of information will be lost if no reference is available, especially in the repetitive regions.
RNASeq
Pros:
● Cheaper than whole genome sequencing ($300)
● Well proven methodologies
● Reproducible
● Follows gene density
Cons:
● RNA handling
● For many samples, it is pricier than GBS
● Follows gene density
Exome
Pros:
● More complete representation than RNASeq
● More reproducible than RNASeq
Cons:
● Exome capture platforms only available in model species
● Pricier than RNASeq
Sequence capture
Targeted sequence capture as a powerful tool for evolutionary analysis
Am. J. Bot doi: 10.3732/ajb.1100323
Hybridization against designed probes
From several targeted loci to over a million target regions
It is expensive to design and create the probe set
Costs per sample will depend on the number of probes
Genotyping by Sequencing (GBS)
doi: 10.1534/genetics.112.147710
GBS review: Nature Reviews Genetics 12, 499-510 (July 2011) | doi:10.1038/nrg3012
Pros:
● Cheap ($50 per sample)
● Lots of variation
Cons:
● Prone to artifacts (e.g. false SNPs due to repetitive DNA) if no reference genome is available.
● Degree of coverage along the genome depends on the Restriction Enzyme chosen
● How reproducible is it?
● Patent trolls
GBS-based SNPs in soy, doi: 10.1534/genetics.112.147710
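The dependence of GBS coverage on the restriction enzyme can be explored with an in silico digest: count where the recognition site occurs and look at the resulting fragment sizes. The sketch below is a simplification under stated assumptions: the function name is hypothetical, the cut is placed at the start of the recognition site rather than at the enzyme's real cut position, only one strand and exact (non-degenerate) sites are considered, and the sequence is a toy example rather than real data.

```python
def fragment_lengths(sequence, site):
    """Lengths of the fragments produced by cutting sequence at every
    occurrence of the recognition site.

    Simplification: the cut is placed at the first base of the site.
    """
    cut_positions = []
    start = sequence.find(site)
    while start != -1:
        cut_positions.append(start)
        start = sequence.find(site, start + 1)

    boundaries = [0] + cut_positions + [len(sequence)]
    # Skip empty fragments (a site located at position 0).
    return [end - begin for begin, end in zip(boundaries, boundaries[1:])
            if end > begin]


toy_genome = "AAGAATTCCCCCTGCAGTTTGAATTCAA"

# Two enzymes with different recognition sites fragment the same
# sequence differently, so they sample different parts of the genome.
ecori_like = fragment_lengths(toy_genome, "GAATTC")   # [2, 18, 8]
psti_like = fragment_lengths(toy_genome, "CTGCAG")    # [11, 17]
```

On a real genome, the number of fragments that fall in the size range kept during library preparation gives a rough idea of how many loci an enzyme would sample.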
Amplicons
Pros:
● Cheap for few genes
● Amplicon sets can be ordered, but the design is expensive
Cons:
● Not scalable for lots of genes
● Previous sequence information is required
This work is licensed under the Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.