Reconstructing paleoenvironments using metagenomics
DESCRIPTION
Lecture on metagenomics research at the Naturalis Biodiversity Center, based on the use case of sequencing the metagenome of mammoth stomach contents.
TRANSCRIPT
Reconstructing paleoenvironments using metagenomics
Rutger Vos
Outline
• About Naturalis Biodiversity Center
• NBC's facilities and expertise
Ancient DNA lab
Barcode lab
Informatics focus group
• Metagenomics and paleoecology
• Use case: The mammoth's last meal
• NGS@Naturalis
• Pipeline development
6 February 2013 - Metagenomics approaches and data analysis
Naturalis Biodiversity Center
• With 37 million specimens, NBC holds one of the largest natural history collections in the world
• More than just a museum, NBC is an expert center specializing in:
Species identification
Trait harvesting
Impact modeling
Ecological intensification
Ancient DNA lab
• The ancient-DNA facility is equipped for recovering DNA from plant and animal material from museum collections and fossils.
• It permits research that would otherwise not be possible, such as the study of ancient populations and museum material.
• The ancient-DNA lab provides an environment where the risk of contamination with contemporary DNA is minimal.
• The facility, a collaboration of IBL, the faculty of archeology and NBC, is unique in the Netherlands
Barcode lab
Informatics focus group
• Exploitation of HPC resources
• Dissemination of best practices
• In-house development of research-supporting tools:
NGS data processing
Clustering, BLASTing
Custom pipelines
Visualization
Image analysis
Niche modeling
HPC infrastructure
• Dell T7500 and T7600 workstations
• Intel® Xeon® Processor (QuadCore, 2.40GHz) x 2
• 128 GB RAM
• TESLA/NVIDIA GPU
• RedHat/Ubuntu Linux
• Always looking for extra number-crunching power, e.g. from NBIC Galaxy, CIPRES, BioPortal, etc.
Paleoenvironments
• Reconstructing the paleoenvironment is useful for:
Understanding the dynamics of ecosystem change
Reconstructing pre-industrialization ecosystems
• Many public policy decision-makers have pointed to the importance of using palaeoecological studies as a basis for choices made in conservation ecology
Metagenomics
• Taxonomic identification is one of the main challenges surrounding metagenomics, and one of NBC’s core strengths
• Conversely, a better understanding of the metagenome feeds back into our other research interests and areas of expertise
• Consequently, there is a lot of research activity and ongoing capacity building
Use case: the woolly mammoth's dietary metagenome
The research programme
• To test hypotheses about the structure of the ancient environment of the woolly mammoth, i.e. productive, continuous grassland steppe or sparsely covered herb tundra
• Finding frozen mammoths with forensically identifiable food, parasites, and microorganisms in their gastrointestinal tracts or feces has the potential to add data to the extinction debate
• To integrate the findings from ancient DNA with those obtained from macro- and micro-fossils
Permafrost-preserved mammoth remains
[Map labels: Lyuba, Cape Blossom, Yukagir]
"Lyuba"
• Discovered in May 2007
• One-month-old mammoth calf
• Age: 41,910 ± 550 YBP
• Very well-nourished, milk-fed
The Yukagir mammoth
• Male woolly mammoth
• Discovered in 2002
• Very well preserved in the permafrost
• Age: 18,560 ± 50 YBP
• Head, front legs, parts of stomach and intestinal tract
• Last meal still preserved
The Cape Blossom mammoth dung
• Produced during the cold season
• Found among a partial skeleton
• Exact site unknown
• Age: 12,300 YBP
DNA extraction and sequencing
• In all studies, macro-fossils (stems, leaves, seeds), micro-fossils (pollen) and ancient DNA were compared
• DNA was extracted in the ancient DNA facility using multiple extraction protocols
• Several commonly-used markers were amplified (trnL, rbcL, nrITS1)
• Sanger sequencing was done on an ABI 3730xl
Data analysis
• Sequences were assembled using Sequencher
• Taxa were assigned using a combination of GenBank BLAST searches and phylogenetic inference
• BLAST hits were only accepted if they covered the full query sequence and differed by at most 1 nucleotide
• Phylogenetic placement was determined on the basis of bootstrap support (1000 replicates using PAUP*)
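The BLAST acceptance rule above (full query coverage, at most 1 differing nucleotide) can be sketched as a small filter. This is a minimal illustration, not the code used in the study: it assumes hits were exported as tab-separated text with the columns qseqid, sseqid, qlen, qstart, qend, mismatch and gaps, and it assumes that "differed by at most 1 nucleotide" counts mismatches plus gap positions. The function names are hypothetical.

```python
# Assumed column order of the tab-separated BLAST output (an assumption,
# not taken from the talk).
COLUMNS = ("qseqid", "sseqid", "qlen", "qstart", "qend", "mismatch", "gaps")


def parse_hits(lines):
    """Yield one dict per hit line, skipping blank lines and comments."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        hit = dict(zip(COLUMNS, line.split("\t")))
        for key in COLUMNS[2:]:  # numeric columns
            hit[key] = int(hit[key])
        yield hit


def is_accepted(hit, max_differences=1):
    """Accept a hit only if it spans the whole query and differs by at
    most max_differences positions (mismatches plus gaps, by assumption)."""
    covers_query = hit["qstart"] == 1 and hit["qend"] == hit["qlen"]
    differences = hit["mismatch"] + hit["gaps"]
    return covers_query and differences <= max_differences


if __name__ == "__main__":
    example = [
        "q1\tsA\t100\t1\t100\t1\t0",  # full coverage, 1 difference
        "q1\tsB\t100\t1\t100\t1\t1",  # full coverage, 2 differences
        "q2\tsC\t100\t5\t100\t0\t0",  # identical, but partial coverage
    ]
    for hit in parse_hits(example):
        print(hit["qseqid"], hit["sseqid"], is_accepted(hit))
```

Hits that fail the filter would then be left to the phylogenetic placement step rather than discarded outright; that hand-over is not shown here.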
Findings
• Ancient DNA analysis identified 7 ("Lyuba"), 12 ("dung") and 8 ("Yukagir") plant families, with several determinations down to genus level
• Molecules complemented and confirmed fossils
• Identified vegetation composition is generally supportive of a productive "mammoth steppe"
• Micro-fossils of specific dung fungi suggest that mammoths, unlike elephants, were habitually coprophagous
Next generation applications
• The results of the mammoth research so far have been obtained using Sanger sequencing
• Similar, as yet unpublished, research is being undertaken with the newly acquired IonTorrent "sequencing by synthesis" platform
Marcel Eurlings at Naturalis
IonTorrent chip generations
IonTorrent data pre-processing workflow
Filter out short reads
Splice out low Phred scores
Split by primer sequence
Split by adapter sequence
FASTA for downstream analysis
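The workflow steps above can be sketched in plain Python. This is a rough illustration under stated assumptions, not the actual Naturalis code: reads arrive as (id, sequence, Phred scores) tuples, "splicing out" low Phred scores is interpreted as truncating a read at its first low-quality base, and the adapter is assumed to sit directly upstream of the primer. All function names, thresholds and the toy sequences are hypothetical.

```python
def trim_low_quality(seq, quals, min_phred=20):
    """Truncate the read at the first base scoring below min_phred."""
    for i, q in enumerate(quals):
        if q < min_phred:
            return seq[:i]
    return seq


def find_label(seq, motifs):
    """Return (label, start, end) for the first motif found in seq, else None."""
    for label, motif in motifs.items():
        start = seq.find(motif)
        if start != -1:
            return label, start, start + len(motif)
    return None


def preprocess(reads, primers, adapters, min_length=50, min_phred=20):
    """Bin reads by (primer, adapter); return {(primer, adapter): [(id, insert)]}."""
    bins = {}
    for read_id, seq, quals in reads:
        if len(seq) < min_length:              # 1. filter out short reads
            continue
        seq = trim_low_quality(seq, quals, min_phred)  # 2. drop low-quality tail
        primer_hit = find_label(seq, primers)  # 3. split by primer sequence
        if primer_hit is None:
            continue
        primer, start, end = primer_hit
        upstream, insert = seq[:start], seq[end:]
        adapter = next(                        # 4. split by adapter sequence
            (label for label, a in adapters.items() if upstream.endswith(a)),
            None,
        )
        if adapter is None or not insert:
            continue
        bins.setdefault((primer, adapter), []).append((read_id, insert))
    return bins


def to_fasta(records):
    """5. Serialize (id, sequence) pairs as FASTA text."""
    return "".join(">{}\n{}\n".format(rid, seq) for rid, seq in records)


if __name__ == "__main__":
    reads = [("r1", "ACGGTTACGTACGTAC", [30] * 16)]
    bins = preprocess(reads, {"trnL": "GGTT"}, {"sampleA": "AC"}, min_length=10)
    print(to_fasta(bins[("trnL", "sampleA")]))
```

Each (primer, adapter) bin would be written to its own FASTA file, so that markers and samples can be analysed separately downstream.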
Taxonomic identification pipeline
• Taxonomic identification of the contents of samples is a generic problem for which we have developed a re-usable pipeline
• It replicates some of the functionality of QIIME but integrates more conveniently in our HPC configuration
• Requirements:
Python 2.7 or 3.2
Biopython 1.58
NCBI-Blast-2.2.25+
Clustering programs, e.g. TGICL, Usearch, Octupus, cd-hit
Pipeline steps
Optional: tag FASTA for provenance retracing across files
Cluster sequences into OCTUs of at least 10 reads
Pick exemplar sequence (random, consensus or hybrid)
BLAST exemplar sequences (local or remote)
Optional: retrace provenance
Report
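The middle of the pipeline (provenance tagging, the 10-read cluster threshold, exemplar picking) can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: it assumes clustering has already been done by an external program and that each cluster is a list of (read id, sequence) pairs, the consensus is a simple per-column majority over reads assumed to be aligned, and the "hybrid" strategy is left out because the slides do not define it. All function names are hypothetical.

```python
import random
from collections import Counter


def tag_reads(records, tag):
    """Prefix each read id with a file tag so its origin can be retraced."""
    return [("{}|{}".format(tag, rid), seq) for rid, seq in records]


def retrace(read_id):
    """Recover (file tag, original read id) from a tagged id."""
    tag, _, original = read_id.partition("|")
    return tag, original


def filter_clusters(clusters, min_reads=10):
    """Keep only clusters with at least min_reads members."""
    return {cid: m for cid, m in clusters.items() if len(m) >= min_reads}


def random_exemplar(members, rng=random):
    """Pick the sequence of one randomly chosen member."""
    return rng.choice(members)[1]


def consensus_exemplar(members):
    """Majority base per column, up to the shortest member's length
    (assumes the reads in a cluster are already aligned)."""
    seqs = [seq for _, seq in members]
    length = min(len(s) for s in seqs)
    return "".join(
        Counter(s[i] for s in seqs).most_common(1)[0][0] for i in range(length)
    )


if __name__ == "__main__":
    reads = tag_reads([("r%d" % i, "ACGT") for i in range(10)], "lyuba.fasta")
    kept = filter_clusters({"c1": reads, "c2": reads[:3]})
    for cid, members in kept.items():
        print(cid, consensus_exemplar(members), retrace(members[0][0]))
```

The exemplar sequences would then be passed to BLAST, and the tags on the member reads allow each reported taxon to be traced back to its source file.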
Pipeline extensions
• NBC frequently deals with samples that may contain materials from endangered species, for example:
Putative FSC wood
Traditional Chinese medicine
Incense
• We are therefore extending the taxonomic identification pipeline to check automatically whether any taxa from the sample are listed in CITES appendices
• This, however, poses additional challenges of taxonomic name reconciliation
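To illustrate why name reconciliation matters for the CITES check, here is a minimal sketch. It is not the extension being developed: the listing and synonym tables below are tiny illustrative stand-ins for the real CITES checklist and a curated synonym source, and the function names are hypothetical. Names are reduced to lowercase "genus species" before lookup, so differences in spelling conventions or outdated synonyms do not hide a listed taxon.

```python
def normalize(name):
    """Reduce a scientific name to lowercase 'genus species',
    dropping underscores and anything after the species epithet."""
    parts = name.replace("_", " ").split()
    return " ".join(parts[:2]).lower()


def check_cites(taxa, cites_listing, synonyms=None):
    """Return {reported name: (accepted name, appendix)} for listed taxa.

    cites_listing maps normalized accepted names to an appendix label;
    synonyms maps normalized synonyms to normalized accepted names.
    """
    synonyms = synonyms or {}
    flagged = {}
    for taxon in taxa:
        key = normalize(taxon)
        accepted = synonyms.get(key, key)  # reconcile before lookup
        if accepted in cites_listing:
            flagged[taxon] = (accepted, cites_listing[accepted])
    return flagged


if __name__ == "__main__":
    # Illustrative entries only; a real run would load the current
    # CITES checklist and a maintained synonym table.
    listing = {"dalbergia nigra": "I", "aquilaria malaccensis": "II"}
    synonyms = {"aquilaria agallocha": "aquilaria malaccensis"}
    sample = ["Dalbergia nigra", "Aquilaria_agallocha", "Pinus sylvestris"]
    for name, (accepted, appendix) in check_cites(sample, listing, synonyms).items():
        print(name, "->", accepted, "Appendix", appendix)
```

Exact string matching after normalization is the simplest possible design; misspellings, subspecies, and listings at genus or family level would need additional handling.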
Other metagenomics work
• Phylogenies from metagenomic sequence data can grow to immense sizes
• For example, the GreenGenes 16S rRNA tree has ~400k tips
• We are developing novel algorithms for pruning these trees using (Google’s) MapReduce programming model
Acknowledgements
• I am grateful to:
• Dr. Barbara Gravendeel for her input in developing this talk
• Youri Lammers for his great work in developing a well-documented taxonomic identification pipeline
• And to NBIC for giving me the opportunity to present this story
Thank you for listening