Learning to love de Bruijn graphs
![Page 1: Learning to Love De Bruijn Graphs](https://reader036.vdocuments.site/reader036/viewer/2022062522/587d07b41a28ab1e7e8b7a0f/html5/thumbnails/1.jpg)
Learning to love de Bruijn graphs
Ben Woodcroft,
Australian Centre for Ecogenomics (ACE)
Winter School in Bioinformatics, 2015
![Page 2: Learning to Love De Bruijn Graphs](https://reader036.vdocuments.site/reader036/viewer/2022062522/587d07b41a28ab1e7e8b7a0f/html5/thumbnails/2.jpg)
A slide from Torsten Seemann
![Page 3: Learning to Love De Bruijn Graphs](https://reader036.vdocuments.site/reader036/viewer/2022062522/587d07b41a28ab1e7e8b7a0f/html5/thumbnails/3.jpg)
K-mers and assembly
• For next-generation sequencing, comparison of each read with each other read is impossible.
  – E.g. 10 million reads -> 10^7 x 10^7 read-read comparisons. Slowww..
• K-mers and de Bruijn graphs help make things tractable
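To make the idea concrete, here is a minimal sketch of how reads can be broken into k-mers and linked into a de Bruijn graph. The function names (`kmers`, `build_de_bruijn`) and the toy reads are made up for illustration, and the sketch assumes error-free reads and ignores reverse complements, which a real assembler cannot do.

```python
from collections import defaultdict

def kmers(read, k):
    """Yield every k-mer (substring of length k) in a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]

def build_de_bruijn(reads, k):
    """Map each (k-1)-mer node to the set of (k-1)-mer nodes that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            # Each k-mer is an edge from its prefix to its suffix
            graph[kmer[:-1]].add(kmer[1:])
    return graph

# Hypothetical overlapping reads, for illustration only
reads = ["ATGGCGT", "GGCGTGC", "CGTGCAA"]
graph = build_de_bruijn(reads, k=4)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

Each read is visited once, so the work grows with the number of reads rather than with the number of read pairs.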
![Page 4: Learning to Love De Bruijn Graphs](https://reader036.vdocuments.site/reader036/viewer/2022062522/587d07b41a28ab1e7e8b7a0f/html5/thumbnails/4.jpg)
K-mers and assembly
![Page 5: Learning to Love De Bruijn Graphs](https://reader036.vdocuments.site/reader036/viewer/2022062522/587d07b41a28ab1e7e8b7a0f/html5/thumbnails/5.jpg)
Forks
![Page 6: Learning to Love De Bruijn Graphs](https://reader036.vdocuments.site/reader036/viewer/2022062522/587d07b41a28ab1e7e8b7a0f/html5/thumbnails/6.jpg)
K-mer too small
![Page 7: Learning to Love De Bruijn Graphs](https://reader036.vdocuments.site/reader036/viewer/2022062522/587d07b41a28ab1e7e8b7a0f/html5/thumbnails/7.jpg)
K-mer too large
![Page 8: Learning to Love De Bruijn Graphs](https://reader036.vdocuments.site/reader036/viewer/2022062522/587d07b41a28ab1e7e8b7a0f/html5/thumbnails/8.jpg)
My favourite k-mer size
![Page 9: Learning to Love De Bruijn Graphs](https://reader036.vdocuments.site/reader036/viewer/2022062522/587d07b41a28ab1e7e8b7a0f/html5/thumbnails/9.jpg)
My favourite k-mer size
With a 100bp read, this can never happen with a k-mer size of 51
![Page 10: Learning to Love De Bruijn Graphs](https://reader036.vdocuments.site/reader036/viewer/2022062522/587d07b41a28ab1e7e8b7a0f/html5/thumbnails/10.jpg)
Fewer tips, more bubbles
As read lengths get longer, assemblers must move from handling dead ends in the graph to handling bubbles.
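A rough sketch of what "dead ends" and "bubbles" look like in a graph stored as a dict of node -> set of successors. The definitions here are deliberately simplistic assumptions: a dead end is any node with no outgoing edge, and a bubble is two unbranched paths leaving the same node and meeting again. Real assemblers also use coverage and length thresholds, which are not shown.

```python
def find_dead_ends(graph):
    """Return nodes that have no outgoing edges."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    return sorted(n for n in nodes if not graph.get(n))

def find_bubbles(graph, max_len=10):
    """Return (start, end) pairs where two branches out of start reconverge at end."""
    bubbles = []
    for start, targets in graph.items():
        if len(targets) < 2:
            continue
        seen = {}  # node -> the branch that first reached it
        for first in sorted(targets):
            node, steps = first, 0
            while steps < max_len:
                if node in seen and seen[node] != first:
                    bubbles.append((start, node))  # another branch got here first
                    break
                seen.setdefault(node, first)
                nxt = graph.get(node, ())
                if len(nxt) != 1:
                    break  # stop at dead ends and at further forks
                node = next(iter(nxt))
                steps += 1
    return bubbles

# Toy graph: A forks to B and C, which rejoin at D; E forks to a dead end F
toy = {"A": {"B", "C"}, "B": {"D"}, "C": {"D"}, "D": {"E"}, "E": {"F", "G"}, "G": {"H"}}
print(find_dead_ends(toy))
print(find_bubbles(toy))
```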
![Page 11: Learning to Love De Bruijn Graphs](https://reader036.vdocuments.site/reader036/viewer/2022062522/587d07b41a28ab1e7e8b7a0f/html5/thumbnails/11.jpg)
Tips and bubbles
![Page 12: Learning to Love De Bruijn Graphs](https://reader036.vdocuments.site/reader036/viewer/2022062522/587d07b41a28ab1e7e8b7a0f/html5/thumbnails/12.jpg)
Metagenome assembly
Me: “I know, why don’t I just assemble all my data together?”
Run assembly
Wait 4 days
Out of memory allocating 18.4 million terabytes of RAM.
![Page 13: Learning to Love De Bruijn Graphs](https://reader036.vdocuments.site/reader036/viewer/2022062522/587d07b41a28ab1e7e8b7a0f/html5/thumbnails/13.jpg)
Solutions to RAM issues
• Quality trimming
• Hard trimming
• Throwing away a proportion of reads randomly
• Sequencing something else
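Two of these options are simple enough to sketch directly. The helper names (`hard_trim`, `subsample`), the trim length and the kept fraction below are illustrative assumptions, not settings recommended in the talk.

```python
import random

def hard_trim(read, length):
    """Keep only the first `length` bases of a read."""
    return read[:length]

def subsample(reads, fraction, seed=0):
    """Keep each read independently with probability `fraction`."""
    rng = random.Random(seed)  # fixed seed so the same subset can be reproduced
    return [r for r in reads if rng.random() < fraction]

# Hypothetical reads, for illustration only
reads = ["ACGTACGTAC", "TTGACCAGTA", "GGCATTACGA", "CATGCATGCA"]
trimmed = [hard_trim(r, 8) for r in reads]
kept = subsample(trimmed, fraction=0.5)
print(len(kept), "of", len(reads), "reads kept")
```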
![Page 14: Learning to Love De Bruijn Graphs](https://reader036.vdocuments.site/reader036/viewer/2022062522/587d07b41a28ab1e7e8b7a0f/html5/thumbnails/14.jpg)
Lossy de Bruijn graphs
The number of k-mers observed is vanishingly small relative to the total number of possible k-mers
The human genome: ~3 Gbp = ~3×10^9 k-mers
Total possible 51-mers: 4^51 = ~10^30
0.00000000000000000002%
When making a list of k-mers, counting extra ones probably has little effect on assembly.
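The order of magnitude is easy to check. This sketch assumes roughly one k-mer per genome position and four possible bases per position.

```python
genome_kmers = 3 * 10**9    # ~3 Gbp genome, roughly one k-mer per position
possible_51mers = 4 ** 51   # four bases at each of 51 positions

fraction = genome_kmers / possible_51mers
print(f"possible 51-mers: {possible_51mers:.2e}")
print(f"fraction observed: {fraction:.1e}")
```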
![Page 15: Learning to Love De Bruijn Graphs](https://reader036.vdocuments.site/reader036/viewer/2022062522/587d07b41a28ab1e7e8b7a0f/html5/thumbnails/15.jpg)
Bloom filters
A low memory k-mer “store”
![Page 16: Learning to Love De Bruijn Graphs](https://reader036.vdocuments.site/reader036/viewer/2022062522/587d07b41a28ab1e7e8b7a0f/html5/thumbnails/16.jpg)
Is my k-mer in these reads?
From a Bloom filter, the answer is either “no” or “probably”.
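A minimal Bloom filter sketch shows where the "no" or "probably" comes from. The class, the bit-array size and the use of SHA-256 as the hash are illustrative assumptions; k-mer tools use much faster hashes and carefully sized arrays.

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits=10_000, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, kmer):
        # Derive several bit positions per k-mer by salting one hash function
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{kmer}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, kmer):
        for pos in self._positions(kmer):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, kmer):
        # If any bit is unset the k-mer was never added ("no");
        # if all are set it was added or collided with others ("probably")
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(kmer))

bf = BloomFilter()
for kmer in ["ATGG", "TGGC", "GGCG"]:
    bf.add(kmer)
print("ATGG" in bf)  # an added k-mer is always reported as present
print("CCCC" in bf)  # an unseen k-mer is usually, but not always, reported absent
```

The memory used is fixed by `num_bits` no matter how many k-mers are added; the cost is a false-positive rate that rises as the array fills.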
![Page 17: Learning to Love De Bruijn Graphs](https://reader036.vdocuments.site/reader036/viewer/2022062522/587d07b41a28ab1e7e8b7a0f/html5/thumbnails/17.jpg)
A finishing approach to assembly
A central assumption of this method is that the genome is “mostly” complete
![Page 18: Learning to Love De Bruijn Graphs](https://reader036.vdocuments.site/reader036/viewer/2022062522/587d07b41a28ab1e7e8b7a0f/html5/thumbnails/18.jpg)
Scaffolding without mate pair data
![Page 19: Learning to Love De Bruijn Graphs](https://reader036.vdocuments.site/reader036/viewer/2022062522/587d07b41a28ab1e7e8b7a0f/html5/thumbnails/19.jpg)
Gap filling vs. assembly
• Regular assembly ain’t easy
• Re-assembly is more straightforward because you are trying to get to somewhere
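Having "somewhere to get to" turns gap filling into a path search between two known nodes. The sketch below is a plain breadth-first search over a graph stored as a dict of (k-1)-mer -> set of successors. It is an illustration under that assumption, not the algorithm finishm actually uses, and `find_path` and `path_to_sequence` are made-up names.

```python
from collections import deque

def find_path(graph, start, target, max_steps=1000):
    """Breadth-first search for a path of nodes from start to target."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        if len(path) > max_steps:
            continue  # give up on paths that wander too far
        for nxt in sorted(graph.get(node, ())):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None

def path_to_sequence(path):
    """Spell out the sequence for a path of overlapping (k-1)-mers."""
    return path[0] + "".join(node[-1] for node in path[1:])

# Toy graph with a side branch (TGG -> GGA -> GAT) that does not reach the target
toy = {"ATG": {"TGG"}, "TGG": {"GGC", "GGA"}, "GGC": {"GCG"},
       "GGA": {"GAT"}, "GCG": {"CGT"}}
path = find_path(toy, "ATG", "CGT")
print(path_to_sequence(path))
```

Knowing the target lets the search ignore branches that never arrive there, which is not possible when assembling from scratch.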
![Page 20: Learning to Love De Bruijn Graphs](https://reader036.vdocuments.site/reader036/viewer/2022062522/587d07b41a28ab1e7e8b7a0f/html5/thumbnails/20.jpg)
Gap filling can correct assembly errors
• Contigs often contain errors right at their ends
• Starting the search a little way back from the end of the contig (e.g. 200 bp) lets the gap filler overcome these errors
![Page 21: Learning to Love De Bruijn Graphs](https://reader036.vdocuments.site/reader036/viewer/2022062522/587d07b41a28ab1e7e8b7a0f/html5/thumbnails/21.jpg)
Gap-filling can account for strain variation
github.com/wwood/finishm
![Page 22: Learning to Love De Bruijn Graphs](https://reader036.vdocuments.site/reader036/viewer/2022062522/587d07b41a28ab1e7e8b7a0f/html5/thumbnails/22.jpg)
Thanks!
• Slideshare.com/benjwoodcroft
• Github.com/wwood
• Ecogenomic.org