MCB3895-004 Lecture #9 Sept 23/14: Illumina library preparation, de novo genome assembly
![Page 1: MCB3895-004 Lecture #9 Sept 23/14 Illumina library preparation, de novo genome assembly](https://reader030.vdocuments.site/reader030/viewer/2022012913/56649cfd5503460f949cde95/html5/thumbnails/1.jpg)
MCB3895-004 Lecture #9 Sept 23/14
Illumina library preparation, de novo genome assembly
![Page 2: MCB3895-004 Lecture #9 Sept 23/14 Illumina library preparation, de novo genome assembly](https://reader030.vdocuments.site/reader030/viewer/2022012913/56649cfd5503460f949cde95/html5/thumbnails/2.jpg)
Illumina sequencing
• https://www.youtube.com/watch?v=womKfikWlxM
http://openwetware.org/images/7/76/BMC_IlluminaFlowcell.png
![Page 3: MCB3895-004 Lecture #9 Sept 23/14 Illumina library preparation, de novo genome assembly](https://reader030.vdocuments.site/reader030/viewer/2022012913/56649cfd5503460f949cde95/html5/thumbnails/3.jpg)
Illumina sequencing - summary
1. Template consists of DNA fragments amplified by bridge clustering
2. "Sequencing by synthesis" used to generate DNA sequences
3. DNA sequence read as unique fluorescent signatures following base incorporation
![Page 4: MCB3895-004 Lecture #9 Sept 23/14 Illumina library preparation, de novo genome assembly](https://reader030.vdocuments.site/reader030/viewer/2022012913/56649cfd5503460f949cde95/html5/thumbnails/4.jpg)
Illumina sequencing - summary
4. Adapters at each end of the template molecule bind the flowcell adaptors and facilitate bridge amplification
5. "Dual indexing" allows multiple samples to be sequenced on the same flowcell, each having a unique set of indices
6. Paired-end sequencing extends the regular sequencing protocol to read each template molecule in both directions
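To make the dual-indexing idea in point 5 concrete, here is a minimal sketch of how reads might be assigned back to samples after sequencing. The sample sheet, index sequences, and the function name `assign_sample` are all hypothetical and chosen only for illustration; real demultiplexing software also tolerates index mismatches.

```python
# Hypothetical sample sheet: (first index, second index) -> sample name.
# These index sequences are made up for illustration.
SAMPLE_SHEET = {
    ("ATCACG", "TTAGGC"): "sample_A",
    ("ATCACG", "ACTTGA"): "sample_B",
    ("CGATGT", "TTAGGC"): "sample_C",
}

def assign_sample(index1, index2, sheet=SAMPLE_SHEET):
    """Return the sample for an index pair, or None if the pair is unknown."""
    return sheet.get((index1, index2))

print(assign_sample("ATCACG", "ACTTGA"))  # sample_B
print(assign_sample("CGATGT", "ACTTGA"))  # None: this combination was not used
```

Because a sample is identified by the pair of indices, a small number of index sequences can label many samples.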
![Page 5: MCB3895-004 Lecture #9 Sept 23/14 Illumina library preparation, de novo genome assembly](https://reader030.vdocuments.site/reader030/viewer/2022012913/56649cfd5503460f949cde95/html5/thumbnails/5.jpg)
Paired-end sequencing
• Objective: allows repetitive regions to be sequenced more precisely
http://technology.illumina.com/technology/next-generation-sequencing/paired-end-sequencing_assay.html
![Page 6: MCB3895-004 Lecture #9 Sept 23/14 Illumina library preparation, de novo genome assembly](https://reader030.vdocuments.site/reader030/viewer/2022012913/56649cfd5503460f949cde95/html5/thumbnails/6.jpg)
Paired-end sequencing
• Be careful to distinguish terms!
• Do not confuse adapters with the read or template fragment
http://thegenomefactory.blogspot.com/2013/08/paired-end-read-confusion-library.html
![Page 7: MCB3895-004 Lecture #9 Sept 23/14 Illumina library preparation, de novo genome assembly](https://reader030.vdocuments.site/reader030/viewer/2022012913/56649cfd5503460f949cde95/html5/thumbnails/7.jpg)
Paired-end sequencing
• "Insert" is even more confusing
• Refers to entire fragment, including both the reads and the unsequenced "inner mate" region between them
• Term stems from long-dead plasmid sequencing approaches
http://thegenomefactory.blogspot.com/2013/08/paired-end-read-confusion-library.html
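The relationship between insert size, read length, and the unsequenced inner region can be written as simple arithmetic. This is a small sketch using the definitions on this slide; the function name `inner_distance` and the example sizes are assumptions for illustration.

```python
def inner_distance(insert_size, read_length):
    """Unsequenced bases between the two reads of a pair.

    insert_size is the full template fragment length (adapters excluded).
    A negative result means the two reads overlap by that many bases.
    """
    return insert_size - 2 * read_length

print(inner_distance(500, 150))  # 200
print(inner_distance(250, 150))  # -50
```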
![Page 8: MCB3895-004 Lecture #9 Sept 23/14 Illumina library preparation, de novo genome assembly](https://reader030.vdocuments.site/reader030/viewer/2022012913/56649cfd5503460f949cde95/html5/thumbnails/8.jpg)
Paired-end sequencing
• It is possible to have paired-end reads that overlap each other
• These can be merged to create long, highly accurate contiguous reads
http://thegenomefactory.blogspot.com/2013/08/paired-end-read-confusion-library.html
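A rough sketch of merging an overlapping read pair, assuming error-free reads and exact matching: read 2 comes from the opposite strand, so it is reverse-complemented first, then joined to read 1 at the longest shared overlap. The function names, the `min_overlap` default, and the 20 bp template are hypothetical; real merging tools also handle mismatches and base qualities.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def merge_pair(read1, read2, min_overlap=5):
    """Merge a read pair whose ends overlap exactly; return None otherwise."""
    mate = reverse_complement(read2)  # put read 2 on the same strand as read 1
    for n in range(min(len(read1), len(mate)), min_overlap - 1, -1):
        if read1[-n:] == mate[:n]:
            return read1 + mate[n:]
    return None

fragment = "ACGTTGCATGGCAATTCGAT"                # hypothetical 20 bp template
read1 = fragment[:13]                             # forward read, 13 bp
read2 = reverse_complement(fragment[-13:])        # reverse read, 13 bp
print(merge_pair(read1, read2) == fragment)       # True
```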
![Page 9: MCB3895-004 Lecture #9 Sept 23/14 Illumina library preparation, de novo genome assembly](https://reader030.vdocuments.site/reader030/viewer/2022012913/56649cfd5503460f949cde95/html5/thumbnails/9.jpg)
Paired-end sequencing
• If the template fragment is too short, it is possible to read past the end of the fragment
• Results in adapter region being included in read
• Needs to be removed computationally.
http://thegenomefactory.blogspot.com/2013/08/paired-end-read-confusion-library.html
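A minimal sketch of what "removed computationally" can mean, assuming exact matches only: find the adapter (or a prefix of it) at the 3' end of the read and cut it off. The function name `trim_adapter`, the `min_match` threshold, and the adapter sequence used here are assumptions for illustration; dedicated trimming tools also allow mismatches and use quality scores.

```python
ADAPTER = "AGATCGGAAGAGC"  # assumed adapter start, used here only as an example

def trim_adapter(read, adapter=ADAPTER, min_match=3):
    """Remove the adapter and everything after it from the 3' end of a read."""
    # Case 1: the whole adapter appears somewhere inside the read.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends partway through the adapter.
    for n in range(min(len(read), len(adapter)), min_match - 1, -1):
        if read.endswith(adapter[:n]):
            return read[:-n]
    return read  # no adapter detected

print(trim_adapter("ACGTACGTAGATCGGAAGAGCTT"))  # ACGTACGT
print(trim_adapter("ACGTACGTAGATC"))            # ACGTACGT
print(trim_adapter("ACGTACGT"))                 # ACGTACGT
```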
![Page 10: MCB3895-004 Lecture #9 Sept 23/14 Illumina library preparation, de novo genome assembly](https://reader030.vdocuments.site/reader030/viewer/2022012913/56649cfd5503460f949cde95/html5/thumbnails/10.jpg)
Library preparation
• How exactly are template fragments generated?
• Lots of methods; I present only two: TruSeq and Nextera
• Most common Illumina methods (specific kits available from Illumina)
• Think about: where might biases arise?
![Page 11: MCB3895-004 Lecture #9 Sept 23/14 Illumina library preparation, de novo genome assembly](https://reader030.vdocuments.site/reader030/viewer/2022012913/56649cfd5503460f949cde95/html5/thumbnails/11.jpg)
TruSeq library preparation
• Step #1: Fragment DNA
• Typically via shearing
• Produces uniformly sized fragments
http://res.illumina.com/documents/products%5Cdatasheets%5Cdatasheet_truseq_dna_pcr_free_sample_prep.pdf
![Page 12: MCB3895-004 Lecture #9 Sept 23/14 Illumina library preparation, de novo genome assembly](https://reader030.vdocuments.site/reader030/viewer/2022012913/56649cfd5503460f949cde95/html5/thumbnails/12.jpg)
TruSeq library preparation
• Step #2: Create blunt ends using a polymerase to remove 3' overhangs and fill in 5' overhangs
• Use bead purification to remove smallest fragments, blunt ending reagents
http://res.illumina.com/documents/products%5Cdatasheets%5Cdatasheet_truseq_dna_pcr_free_sample_prep.pdf
![Page 13: MCB3895-004 Lecture #9 Sept 23/14 Illumina library preparation, de novo genome assembly](https://reader030.vdocuments.site/reader030/viewer/2022012913/56649cfd5503460f949cde95/html5/thumbnails/13.jpg)
TruSeq library preparation
• Step #3: Adenylate 3' ends to prevent self-ligation while adding adapters
http://res.illumina.com/documents/products%5Cdatasheets%5Cdatasheet_truseq_dna_pcr_free_sample_prep.pdf
![Page 14: MCB3895-004 Lecture #9 Sept 23/14 Illumina library preparation, de novo genome assembly](https://reader030.vdocuments.site/reader030/viewer/2022012913/56649cfd5503460f949cde95/html5/thumbnails/14.jpg)
TruSeq library preparation
• Step #4: Ligate adapters containing sequencing primer, indices, flowcell capture site
http://res.illumina.com/documents/products%5Cdatasheets%5Cdatasheet_truseq_dna_pcr_free_sample_prep.pdf
![Page 15: MCB3895-004 Lecture #9 Sept 23/14 Illumina library preparation, de novo genome assembly](https://reader030.vdocuments.site/reader030/viewer/2022012913/56649cfd5503460f949cde95/html5/thumbnails/15.jpg)
Nextera library preparation
• Nextera uses engineered transposases to fragment genomic DNA and add sequencing adapters at the same time
• Low DNA input requirement
• "Transposome" = transposase + adapter DNA for attachment
http://support.illumina.com/content/dam/illumina-support/documents/myillumina/2a3297c5-8a34-4fc5-a148-3e16666fd65e/nextera_dna_sample_prep_guide_15027987_b.pdf
![Page 16: MCB3895-004 Lecture #9 Sept 23/14 Illumina library preparation, de novo genome assembly](https://reader030.vdocuments.site/reader030/viewer/2022012913/56649cfd5503460f949cde95/html5/thumbnails/16.jpg)
Nextera library preparation
• Step #1: Use "tagmentation" to simultaneously fragment template DNA and add sequencing adapters
• 300 bp insert size reflects minimum needed by transposases to cut and add adapters
http://support.illumina.com/content/dam/illumina-support/documents/myillumina/2a3297c5-8a34-4fc5-a148-3e16666fd65e/nextera_dna_sample_prep_guide_15027987_b.pdf
![Page 17: MCB3895-004 Lecture #9 Sept 23/14 Illumina library preparation, de novo genome assembly](https://reader030.vdocuments.site/reader030/viewer/2022012913/56649cfd5503460f949cde95/html5/thumbnails/17.jpg)
Nextera library preparation
• Step #2: Purify fragments from transposome (part of Nextera kit)
• Result: fragment contains both 5' and 3' sequencing adapters
http://support.illumina.com/content/dam/illumina-support/documents/myillumina/2a3297c5-8a34-4fc5-a148-3e16666fd65e/nextera_dna_sample_prep_guide_15027987_b.pdf
![Page 18: MCB3895-004 Lecture #9 Sept 23/14 Illumina library preparation, de novo genome assembly](https://reader030.vdocuments.site/reader030/viewer/2022012913/56649cfd5503460f949cde95/html5/thumbnails/18.jpg)
Nextera library preparation
• Step #3: Use PCR to add indices and flowcell capture sites to the fragment
• Non-template fragments excluded during bead clean-up following this step
http://support.illumina.com/content/dam/illumina-support/documents/myillumina/2a3297c5-8a34-4fc5-a148-3e16666fd65e/nextera_dna_sample_prep_guide_15027987_b.pdf
![Page 19: MCB3895-004 Lecture #9 Sept 23/14 Illumina library preparation, de novo genome assembly](https://reader030.vdocuments.site/reader030/viewer/2022012913/56649cfd5503460f949cde95/html5/thumbnails/19.jpg)
Nextera library preparation
• Final result:
• Template fragment
• Sequencing adapters
• Dual indices
• Flowcell capture sites
• (same structure as TruSeq)
http://support.illumina.com/content/dam/illumina-support/documents/myillumina/2a3297c5-8a34-4fc5-a148-3e16666fd65e/nextera_dna_sample_prep_guide_15027987_b.pdf
![Page 20: MCB3895-004 Lecture #9 Sept 23/14 Illumina library preparation, de novo genome assembly](https://reader030.vdocuments.site/reader030/viewer/2022012913/56649cfd5503460f949cde95/html5/thumbnails/20.jpg)
Library prep is not error-free
http://res.illumina.com/documents/products/technotes/technote_truseq_comparison.pdf
![Page 21: MCB3895-004 Lecture #9 Sept 23/14 Illumina library preparation, de novo genome assembly](https://reader030.vdocuments.site/reader030/viewer/2022012913/56649cfd5503460f949cde95/html5/thumbnails/21.jpg)
Library prep is not error-free
http://res.illumina.com/documents/products/technotes/technote_truseq_comparison.pdf
![Page 22: MCB3895-004 Lecture #9 Sept 23/14 Illumina library preparation, de novo genome assembly](https://reader030.vdocuments.site/reader030/viewer/2022012913/56649cfd5503460f949cde95/html5/thumbnails/22.jpg)
Library prep is not error-free
• Regions with lower coverage are GC-rich
• No method is perfect
• Also note: Nextera uses low-cycle PCR, which has potential for bias
http://res.illumina.com/documents/products/technotes/technote_truseq_comparison.pdf
![Page 23: MCB3895-004 Lecture #9 Sept 23/14 Illumina library preparation, de novo genome assembly](https://reader030.vdocuments.site/reader030/viewer/2022012913/56649cfd5503460f949cde95/html5/thumbnails/23.jpg)
Mate pairs
• Paired-end sequencing actually binds each fragment to the flowcell and sequences from each end
• Size limitations: large fragments are too floppy to sequence well
• Mate pairs: same philosophy of using inserts of known size, but allowing larger insert sizes
![Page 24: MCB3895-004 Lecture #9 Sept 23/14 Illumina library preparation, de novo genome assembly](https://reader030.vdocuments.site/reader030/viewer/2022012913/56649cfd5503460f949cde95/html5/thumbnails/24.jpg)
Nextera mate pair library preparation
• Step #1: Use Nextera tagmentation to fragment template and add adapters
• Adapters are biotinylated for later steps
http://res.illumina.com/documents/products/datasheets/datasheet_nextera_mate_pair.pdf
![Page 25: MCB3895-004 Lecture #9 Sept 23/14 Illumina library preparation, de novo genome assembly](https://reader030.vdocuments.site/reader030/viewer/2022012913/56649cfd5503460f949cde95/html5/thumbnails/25.jpg)
Nextera mate pair library preparation
• Step #2: Fragment is circularized using a "biotin junction adapter"
http://res.illumina.com/documents/products/datasheets/datasheet_nextera_mate_pair.pdf
![Page 26: MCB3895-004 Lecture #9 Sept 23/14 Illumina library preparation, de novo genome assembly](https://reader030.vdocuments.site/reader030/viewer/2022012913/56649cfd5503460f949cde95/html5/thumbnails/26.jpg)
Nextera mate pair library preparation
• Step #3: Circular molecules fragmented, biotin tags used to enrich fragments having junction
• Recall: junction contains original fragment ends
http://res.illumina.com/documents/products/datasheets/datasheet_nextera_mate_pair.pdf
![Page 27: MCB3895-004 Lecture #9 Sept 23/14 Illumina library preparation, de novo genome assembly](https://reader030.vdocuments.site/reader030/viewer/2022012913/56649cfd5503460f949cde95/html5/thumbnails/27.jpg)
Nextera mate pair library preparation
• Step #4: Use TruSeq protocol to end repair, A-tail, and ligate flowcell capture sequences and barcodes
• Final product has all the normal parts of an Illumina template library but also junction region mid-fragment
http://res.illumina.com/documents/products/datasheets/datasheet_nextera_mate_pair.pdf
![Page 28: MCB3895-004 Lecture #9 Sept 23/14 Illumina library preparation, de novo genome assembly](https://reader030.vdocuments.site/reader030/viewer/2022012913/56649cfd5503460f949cde95/html5/thumbnails/28.jpg)
Questions?
![Page 29: MCB3895-004 Lecture #9 Sept 23/14 Illumina library preparation, de novo genome assembly](https://reader030.vdocuments.site/reader030/viewer/2022012913/56649cfd5503460f949cde95/html5/thumbnails/29.jpg)
Digging deeper into the guts of de novo genome assembly
• Important to know to be able to tune assembly software appropriately!
• Two paradigms:
1. Overlap/layout/consensus
2. De Bruijn graphs
• Both find overlaps between sequences, create a network representation, and find the best path through that network to represent the final assembly
![Page 30: MCB3895-004 Lecture #9 Sept 23/14 Illumina library preparation, de novo genome assembly](https://reader030.vdocuments.site/reader030/viewer/2022012913/56649cfd5503460f949cde95/html5/thumbnails/30.jpg)
Overlap/layout/consensus genome assembly
• Step #1: Compare all reads to each other to find those that overlap
• Let's do it together! Reads (5'->3'):
TGGCA
CAATT
ATTTGAC
GCATTGCAA
TGCAAT
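The all-vs-all comparison for these five reads can be sketched as follows. This assumes error-free reads, exact suffix-to-prefix matching, and a minimum overlap of 3 bases; the threshold and the function names are choices made for this illustration, not part of the slide.

```python
from itertools import permutations

READS = ["TGGCA", "CAATT", "ATTTGAC", "GCATTGCAA", "TGCAAT"]

def overlap(a, b, min_length=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 when the longest match is shorter than min_length.
    """
    for n in range(min(len(a), len(b)), min_length - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0

def all_overlaps(reads, min_length=3):
    """Compare every ordered pair of reads and keep those that overlap."""
    found = {}
    for a, b in permutations(reads, 2):
        n = overlap(a, b, min_length)
        if n:
            found[(a, b)] = n
    return found

for (a, b), n in sorted(all_overlaps(READS).items(), key=lambda kv: -kv[1]):
    print(f"{a} -> {b}: {n}")
```

Every ordered pair is tested, which is why the cost grows roughly with the square of the number of reads.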
![Page 31: MCB3895-004 Lecture #9 Sept 23/14 Illumina library preparation, de novo genome assembly](https://reader030.vdocuments.site/reader030/viewer/2022012913/56649cfd5503460f949cde95/html5/thumbnails/31.jpg)
Overlap/layout/consensus genome assembly
• Step #2: Create overlap graph arranging reads according to their overlaps
• Step #3: Find unique path through the graph
• Step #4: Assemble overlapping reads by aligning the reads and deriving consensus
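Steps #2 to #4 can be approximated for the toy reads with a greedy merge: repeatedly join the two sequences with the longest overlap. This is a simplification swapped in for a real layout and consensus stage; it assumes error-free reads and a minimum overlap of 3, and the function names are hypothetical.

```python
READS = ["TGGCA", "CAATT", "ATTTGAC", "GCATTGCAA", "TGCAAT"]

def overlap(a, b, min_length=3):
    """Longest suffix of a matching a prefix of b (0 if below min_length)."""
    for n in range(min(len(a), len(b)), min_length - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0

def greedy_assemble(reads, min_length=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    seqs = list(reads)
    while len(seqs) > 1:
        best_n, best_pair = 0, None
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    n = overlap(a, b, min_length)
                    if n > best_n:
                        best_n, best_pair = n, (i, j)
        if best_pair is None:
            break  # nothing overlaps: the remaining sequences are separate contigs
        i, j = best_pair
        merged = seqs[i] + seqs[j][best_n:]
        seqs = [s for k, s in enumerate(seqs) if k not in (i, j)] + [merged]
    return seqs

print(greedy_assemble(READS))  # ['TGGCATTGCAATTTGAC']
```

With perfect reads the consensus is trivial, because overlapping bases always agree. With real data the merge step would need to align the reads and vote on each position.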
![Page 32: MCB3895-004 Lecture #9 Sept 23/14 Illumina library preparation, de novo genome assembly](https://reader030.vdocuments.site/reader030/viewer/2022012913/56649cfd5503460f949cde95/html5/thumbnails/32.jpg)
Overlap/layout/consensus genome assembly
• Requires all-vs-all comparison of reads
• Becomes computationally intensive as the number of reads increases
• Developed and applied for Sanger and 454 sequencing
• Not dead yet! Has reemerged for PacBio and other long-read techniques
![Page 33: MCB3895-004 Lecture #9 Sept 23/14 Illumina library preparation, de novo genome assembly](https://reader030.vdocuments.site/reader030/viewer/2022012913/56649cfd5503460f949cde95/html5/thumbnails/33.jpg)
But consider errors
• Our network was for perfectly accurate reads
• What happens when you have both the correct TGGCA read and a TGCCA read containing a substitution sequencing error?
![Page 34: MCB3895-004 Lecture #9 Sept 23/14 Illumina library preparation, de novo genome assembly](https://reader030.vdocuments.site/reader030/viewer/2022012913/56649cfd5503460f949cde95/html5/thumbnails/34.jpg)
De Bruijn graph assembly
• Instead of comparing all reads with each other, split reads up into kmers
• i.e., subsets of each read of a given length
• Much more computationally efficient than all-vs-all comparison in overlap/layout/consensus
![Page 35: MCB3895-004 Lecture #9 Sept 23/14 Illumina library preparation, de novo genome assembly](https://reader030.vdocuments.site/reader030/viewer/2022012913/56649cfd5503460f949cde95/html5/thumbnails/35.jpg)
De Bruijn graph assembly
• Step #1: Tally kmers
• Let's find all kmers where k=4 for our set of reads from before
TGGCA
CAATT
ATTTGAC
GCATTGCAA
TGCAAT
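The k=4 tally for these reads can be sketched in a few lines. The function names are chosen for this illustration only.

```python
from collections import Counter

READS = ["TGGCA", "CAATT", "ATTTGAC", "GCATTGCAA", "TGCAAT"]

def kmers(read, k):
    """All substrings of length k, in the order they occur in the read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

def tally_kmers(reads, k):
    """Count how many times each kmer occurs across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts

counts = tally_kmers(READS, 4)
for kmer, n in sorted(counts.items()):
    print(kmer, n)
```

For this read set there are 17 kmers in total and 14 distinct ones; TGCA, GCAA, and CAAT each occur twice because two reads share them.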
![Page 36: MCB3895-004 Lecture #9 Sept 23/14 Illumina library preparation, de novo genome assembly](https://reader030.vdocuments.site/reader030/viewer/2022012913/56649cfd5503460f949cde95/html5/thumbnails/36.jpg)
De Bruijn graph assembly
• Step #2: Create graph of kmer overlap, where kmers are nodes and overlaps between them are edges
• More complex than overlap graph
• Step #3: Find unique path through the graph
• Can leverage kmers adjacent to each other in reads to reduce complexity
• Step #4: Synthesize path into a consensus sequence
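Steps #2 to #4 can be sketched for the toy reads with k=4. This builds the graph with kmers as nodes, enumerates by brute force every path that visits each kmer once, and keeps only paths that agree with the kmer order inside each read. The brute-force search and all function names are assumptions made for this small example; real assemblers use far more efficient graph traversal.

```python
READS = ["TGGCA", "CAATT", "ATTTGAC", "GCATTGCAA", "TGCAAT"]

def kmers(read, k):
    return [read[i:i + k] for i in range(len(read) - k + 1)]

def build_graph(reads, k):
    """Nodes are distinct kmers; an edge x -> y exists when x[1:] == y[:-1]."""
    nodes = sorted({km for read in reads for km in kmers(read, k)})
    return {x: [y for y in nodes if x[1:] == y[:-1]] for x in nodes}

def spell(path):
    """Collapse a path of overlapping kmers into one sequence."""
    return path[0] + "".join(km[-1] for km in path[1:])

def all_paths(graph):
    """Every path visiting each kmer exactly once (toy sizes only)."""
    found = []

    def extend(path, seen):
        if len(path) == len(graph):
            found.append(path)
            return
        for nxt in graph[path[-1]]:
            if nxt not in seen:
                extend(path + [nxt], seen | {nxt})

    for start in graph:
        extend([start], {start})
    return found

graph = build_graph(READS, 4)
candidates = [spell(p) for p in all_paths(graph)]
# Use kmer adjacency within reads: a valid assembly must contain every read.
consistent = [s for s in candidates if all(r in s for r in READS)]
print(len(candidates))  # 4
print(consistent)       # ['TGGCATTGCAATTTGAC']
```

The kmer graph alone allows four different sequences, because several kmers have two possible successors. Requiring each read to appear intact leaves a single path.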
![Page 37: MCB3895-004 Lecture #9 Sept 23/14 Illumina library preparation, de novo genome assembly](https://reader030.vdocuments.site/reader030/viewer/2022012913/56649cfd5503460f949cde95/html5/thumbnails/37.jpg)
De Bruijn graph assembly
• Doesn’t need all-vs-all comparison so is much faster
• Can handle large numbers of reads, e.g., as generated by Illumina technology
• Graph is much more complicated, RAM intensive
• More sensitive to errors
![Page 38: MCB3895-004 Lecture #9 Sept 23/14 Illumina library preparation, de novo genome assembly](https://reader030.vdocuments.site/reader030/viewer/2022012913/56649cfd5503460f949cde95/html5/thumbnails/38.jpg)
De Bruijn graph assembly
• Consider errors: these make the graph even more complicated, adding bubbles and dead ends
• Consider repeats: parts of the graph with no unique path through them
• Graph broken on each side, forming contigs
![Page 39: MCB3895-004 Lecture #9 Sept 23/14 Illumina library preparation, de novo genome assembly](https://reader030.vdocuments.site/reader030/viewer/2022012913/56649cfd5503460f949cde95/html5/thumbnails/39.jpg)
Next class
• Quality control of Illumina data
• Adapter trimming
• Error correction
• Next week: de novo genome assembly