assembly of metagenomes
DESCRIPTION
A talk I gave for the 2011 metagenomics course at the Biological Dept., Univ. of Oslo, April 2011
TRANSCRIPT
Assembly of metagenomes
Lex Nederbragt
Norwegian Sequencing Center &
Centre for Ecological and Evolutionary Synthesis
University of Oslo
What is assembly
• From reads to genome
Why assembly?
Wooley JC et al, PLoS Comput Biol. 2010 Feb 26;6(2):e1000667
How
Find overlap between reads
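A minimal sketch of the overlap step, assuming exact (error-free) suffix-prefix matches and a hypothetical helper name `suffix_prefix_overlap`. Real assemblers tolerate sequencing errors and use indexing instead of comparing all pairs of reads.

```python
def suffix_prefix_overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no exact overlap of at least `min_len` bases exists.
    """
    max_len = min(len(a), len(b))
    # Try the longest possible overlap first, then shrink.
    for length in range(max_len, min_len - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0


# Toy reads (made up for illustration).
reads = ["ACGTTGCA", "TGCAGGAT", "GGATCCTA"]
for i, a in enumerate(reads):
    for j, b in enumerate(reads):
        if i != j:
            n = suffix_prefix_overlap(a, b)
            if n:
                print(f"read {i} -> read {j}: {n} bases")
```

The `min_len` threshold is the kind of stringency setting that matters later for metagenomes: a higher value means fewer chance overlaps between reads from different genomes.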
How
Build consensus sequence
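A minimal sketch of consensus building, assuming the reads have already been placed at known offsets on the contig and using a simple per-column majority vote. The function name `consensus` and the toy reads are made up for illustration; real assemblers also weigh base qualities and handle gaps.

```python
from collections import Counter


def consensus(placed_reads):
    """Majority-vote consensus from (offset, sequence) pairs.

    Each read is assumed to be already placed at `offset` on the contig.
    Ties go to the base that was seen first in that column.
    """
    length = max(offset + len(seq) for offset, seq in placed_reads)
    columns = [Counter() for _ in range(length)]
    for offset, seq in placed_reads:
        for i, base in enumerate(seq):
            columns[offset + i][base] += 1
    # Columns without any read coverage are reported as N.
    return "".join(col.most_common(1)[0][0] if col else "N" for col in columns)


# The second read carries one sequencing error (T instead of A);
# the two other reads covering that column outvote it.
placed = [
    (0, "ACGTTGCA"),
    (2, "GTTGCTGG"),
    (4, "TGCAGGAT"),
    (8, "GGATCCTA"),
]
print(consensus(placed))
```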
Challenges
Collapsed contig
Shotgun reads
DNA
Shotgun reads
Contigs
Repetitive element
Results
Lots of pieces
Mate pairs
Assembly with mate pairs
Paired reads
Gaps
Scaffold
Contigs
Mate pairs
Scaffold NNNNN NNNNN
Contig Contig Contig
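A minimal sketch of how a scaffold string can be written out, assuming the contigs are already ordered and oriented and that a gap size has been estimated for each join. Both function names and the simple gap arithmetic are illustrative assumptions, not the method of any particular scaffolder.

```python
def estimate_gap(insert_size, bases_in_left_contig, bases_in_right_contig):
    """Gap estimate from one mate pair that spans two contigs.

    `bases_in_left_contig`: from the start of read 1 to the end of the left contig.
    `bases_in_right_contig`: from the start of the right contig to the end of read 2.
    Whatever part of the insert is not accounted for is assumed to be gap.
    """
    return insert_size - bases_in_left_contig - bases_in_right_contig


def build_scaffold(contigs, gap_sizes):
    """Join ordered, oriented contigs with runs of N for the gaps."""
    if len(gap_sizes) != len(contigs) - 1:
        raise ValueError("need exactly one gap size between each pair of contigs")
    parts = [contigs[0]]
    for contig, gap in zip(contigs[1:], gap_sizes):
        # Estimates can come out as zero or negative; as a simplification,
        # keep a single N so the join stays visible.
        parts.append("N" * max(gap, 1))
        parts.append(contig)
    return "".join(parts)


gap = estimate_gap(insert_size=3000, bases_in_left_contig=1800,
                   bases_in_right_contig=1195)
print(gap)
print(build_scaffold(["ACGT", "GGCC", "TTAA"], [gap, gap]))
```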
Mate pairs?
150–600 bases
454/Illumina
Illumina
Mate pairs!
Longer jumps:
Mate pairs
• Little used for metagenomics...
Why is assembly hard for metagenomes?
• Heterogeneous samples
  – many different genomes
  – overlap between genomes
    • e.g. 16S
• Non-species-specific contigs
http://rna.ucsc.edu/
When could it work
• One or a few dominating species
  – contigs might be species-specific
Specialized software
• Genovo
Specialized software
• Genovo
  – Uses a 'generative probabilistic model' of read generation
  – Assembler discovers 'likely sequence reconstructions under the model'
Use your favorite assembler
• Newbler (454)
• Velvet
• Euler
• SOAPdenovo
• ...
• Tweak parameters
  e.g. higher stringency for determining overlaps
Check contigs for
• Read depth
• GC frequency
• Tetranucleotide frequency
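A minimal sketch of two of these checks, computed per contig. The function names are hypothetical, and this simplified version counts tetranucleotides on one strand only; tools built for this usually also combine each 4-mer with its reverse complement.

```python
from collections import Counter


def gc_fraction(seq):
    """Fraction of G and C among the unambiguous (A, C, G, T) bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def tetranucleotide_freqs(seq):
    """Relative frequency of each 4-mer, skipping windows with ambiguous bases."""
    seq = seq.upper()
    counts = Counter(
        seq[i:i + 4]
        for i in range(len(seq) - 3)
        if set(seq[i:i + 4]) <= set("ACGT")
    )
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


contig = "ACGTACGTGGCC"  # toy sequence for illustration
print(gc_fraction(contig))
print(tetranucleotide_freqs(contig))
```

Contigs whose GC fraction or tetranucleotide profile differs strongly from the rest are candidates for coming from a different organism.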
Example
Read depth
Challenges
Collapsed contig
Shotgun reads
DNA
Shotgun reads
Contigs
Repetitive element
Results
Lots of pieces
Higher read depth
DNA
Repetitive element
Example
One contig
Log scale!
Example
Example
Bacteroides
Proteobacteria
Cyanobacteria
Caulobacteraceae
Solution
• Split contigs on
  – read depth
  – GC%
• Use BLAST
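A minimal sketch of splitting a contig on read depth, assuming per-base depth values are already available (for example from a read mapping). The function name, window size and ratio threshold are all illustrative assumptions; the same idea can be applied to windowed GC%.

```python
def split_on_depth(depth, window=4, max_ratio=2.0):
    """Return (start, end) segments of a contig, cut where depth jumps.

    `depth` is a list of per-base read depths. The contig is cut between two
    adjacent windows when their mean depths differ by more than `max_ratio`.
    """
    def mean(values):
        return sum(values) / len(values)

    cuts = [0]
    for pos in range(window, len(depth) - window + 1, window):
        left = mean(depth[pos - window:pos])
        right = mean(depth[pos:pos + window])
        low, high = sorted((left, right))
        # Guard against zero depth when forming the ratio.
        if high > max_ratio * max(low, 1e-9):
            cuts.append(pos)
    cuts.append(len(depth))
    return list(zip(cuts[:-1], cuts[1:]))


# Toy contig: the first half is covered 10x, the second half 100x.
depth = [10] * 8 + [100] * 8
print(split_on_depth(depth))
```

Each resulting segment can then be checked separately, for instance with BLAST, to see which organism it most likely belongs to.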
Metagenomic ORFome Assembly
Ye Y, Tang H. 2009. J Bioinform Comput Biol 7: 455-471
Gene/protein-directed assembly
Iterative read mapping and assembly
Align reads to a single reference genome
'Update' the reference based on alignment
Align remaining reads again
Dutilh BE, Huynen MA, Strous M. 2009. Bioinformatics 25: 2878-2881.
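The three steps above can be sketched as a toy loop. This is not the published implementation: it assumes ungapped placement by mismatch count instead of a real aligner, reads no longer than the reference, and made-up names (`best_hit`, `iterative_mapping`) and thresholds.

```python
from collections import Counter


def best_hit(read, ref):
    """Best ungapped placement of `read` on `ref` as (mismatches, offset)."""
    hits = [
        (sum(a != b for a, b in zip(read, ref[i:i + len(read)])), i)
        for i in range(len(ref) - len(read) + 1)
    ]
    return min(hits) if hits else None


def iterative_mapping(reference, reads, max_mismatches=1, max_rounds=10):
    """Map reads, update the reference by majority vote, map the rest again."""
    ref = reference
    remaining = list(reads)
    for _ in range(max_rounds):
        if not remaining:
            break
        columns = [Counter() for _ in ref]
        unmapped = []
        for read in remaining:
            hit = best_hit(read, ref)
            if hit is not None and hit[0] <= max_mismatches:
                for i, base in enumerate(read):
                    columns[hit[1] + i][base] += 1
            else:
                unmapped.append(read)
        if len(unmapped) == len(remaining):
            break  # no new read mapped in this round
        # 'Update' the reference: keep the old base where no read mapped.
        ref = "".join(
            col.most_common(1)[0][0] if col else old
            for col, old in zip(columns, ref)
        )
        remaining = unmapped
    return ref, remaining


# Toy data: the sample differs from the reference at two nearby positions.
# The middle read has two mismatches and only maps after the first update.
reference = "ACGTTGCAGGATCCTA"
reads = ["ACGTTGCT", "TGCTGCAT", "GCATCCTA"]
print(iterative_mapping(reference, reads))
```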
Reverse metagenomics
• Leptospirillum group III never cultured
• Shotgun metagenomics
  – nitrogen fixation gene
  – GC content and read depth: Leptospirillum group III
• Culturable for the first time