Reconstructing paleoenvironments using metagenomics Rutger Vos


Post on 20-Jun-2015


DESCRIPTION

Lecture on metagenomics research at the Naturalis Biodiversity Center, based on the use case of sequencing the metagenome of mammoth stomach contents.

TRANSCRIPT

Page 1: Reconstructing paleoenvironments using metagenomics

Reconstructing paleoenvironments using metagenomics

Rutger Vos

Page 2: Reconstructing paleoenvironments using metagenomics

Outline

• About Naturalis Biodiversity Center

• NBC's facilities and expertise

Ancient DNA lab

Barcode lab

Informatics focus group

• Metagenomics and paleoecology

• Use case: The mammoth's last meal

• NGS@Naturalis

• Pipeline development

6 February 2013 · Metagenomics approaches and data analysis

Page 3: Reconstructing paleoenvironments using metagenomics

Naturalis Biodiversity Center

• With 37 million specimens, NBC holds one of the largest natural history collections in the world

• More than just a museum, NBC is an expert center specializing in:

Species identification

Trait harvesting

Impact modeling

Ecological intensification

Page 4: Reconstructing paleoenvironments using metagenomics

Ancient DNA lab

• The ancient-DNA facility is equipped for recovering DNA from plant and animal material from museum collections and fossils.

• It permits research that would otherwise not be possible, such as the study of ancient populations and museum material.

• The ancient-DNA lab provides an environment where the risk of contamination with contemporary DNA is minimal.

• The facility, a collaboration of IBL, the faculty of archeology and NBC, is unique in the Netherlands

Page 5: Reconstructing paleoenvironments using metagenomics

Barcode lab

Page 6: Reconstructing paleoenvironments using metagenomics

Informatics focus group

• Exploitation of HPC resources

• Dissemination of best practices

• In-house development of research-supporting tools:

NGS data processing

Clustering, BLASTing

Custom pipelines

Visualization

Image analysis

Niche modeling

Page 7: Reconstructing paleoenvironments using metagenomics

HPC infrastructure

• Dell T7500 and T7600 workstations

• Intel® Xeon® Processor (QuadCore, 2.40GHz) x 2

• 128 GB RAM

• TESLA/NVIDIA GPU

• RedHat/Ubuntu Linux

• Always looking for extra number-crunching power, e.g. from NBIC Galaxy, CIPRES, BioPortal, etc.

Page 8: Reconstructing paleoenvironments using metagenomics

Paleoenvironments

• Reconstructing the paleoenvironment is useful for:

Understanding the dynamics of ecosystem change

Reconstructing pre-industrialization ecosystems

• Many public policy decision-makers have pointed to the importance of using palaeoecological studies as a basis for choices made in conservation ecology

Page 9: Reconstructing paleoenvironments using metagenomics

Metagenomics

• Taxonomic identification is one of the main challenges surrounding metagenomics, and one of NBC’s core strengths

• Conversely, a better understanding of the metagenome feeds back into our other research interests and areas of expertise

• Consequently, there is a lot of research activity and ongoing capacity building

Page 10: Reconstructing paleoenvironments using metagenomics

Use case: the woolly mammoth's dietary metagenome

Page 11: Reconstructing paleoenvironments using metagenomics

The research programme

• To test hypotheses about the structure of the ancient environment of the woolly mammoth, i.e. productive, continuous grassland steppe or sparsely covered herb tundra

• Finding frozen mammoths with forensically identifiable food, parasites, and microorganisms in their gastrointestinal tracts or feces has the potential of adding data to the extinction debate

• To integrate the findings from ancient DNA with those obtained from macro- and micro-fossils

Page 12: Reconstructing paleoenvironments using metagenomics

Permafrost-preserved mammoth remains: Lyuba, Cape Blossom, Yukagir

Page 13: Reconstructing paleoenvironments using metagenomics

"Lyuba"

• Discovered in May 2007

• One-month-old mammoth calf

• Age: 41,910 ± 550 YBP

• Very well-nourished, milk-fed

Page 14: Reconstructing paleoenvironments using metagenomics

The Yukagir mammoth

• Male woolly mammoth

• Discovered in 2002

• Very well preserved in the permafrost

• Age: 18,560 ± 50 YBP

• Head, front legs, parts of stomach and intestinal tract

• Last meal still preserved

Page 15: Reconstructing paleoenvironments using metagenomics

The Cape Blossom mammoth dung

• Produced during the cold season

• Found among a partial skeleton

• Exact site unknown

• Age: 12,300 YBP

Page 16: Reconstructing paleoenvironments using metagenomics

DNA extraction and sequencing

• In all studies, macro-fossils (stems, leaves, seeds), micro-fossils (pollen) and ancient DNA were compared

• DNA was extracted in the ancient DNA facility using multiple extraction protocols

• Several commonly-used markers were amplified (trnL, rbcL, nrITS1)

• Sanger sequencing was done on an ABI 3730xl

Page 17: Reconstructing paleoenvironments using metagenomics

Data analysis

• Sequences were assembled using Sequencher

• Taxa were assigned using a combination of GenBank BLAST searches and phylogenetic inference

• BLAST hits were only accepted if they covered the full query sequence and differed by at most 1 nucleotide

• Phylogenetic placement was determined on the basis of bootstrap support (1000 replicates using PAUP*)
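
The BLAST acceptance rule above can be sketched in a few lines of Python. This is an illustration only, not the code used in the studies: the `Hit` record and the function names are hypothetical, the fields are modelled on the kind of values found in tabular BLAST output (query length, query start and end, mismatches, gaps), and counting gap positions as differences is an assumption.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """Hypothetical record for one BLAST hit (not a real library type)."""
    query_id: str
    subject_id: str
    query_length: int
    query_start: int  # 1-based, inclusive
    query_end: int    # 1-based, inclusive
    mismatches: int
    gaps: int         # total gap positions in the alignment


def accept_hit(hit: Hit, max_differences: int = 1) -> bool:
    """Accept a hit only if it spans the whole query and differs by at most
    max_differences positions (assumption: mismatches plus gap positions)."""
    covers_query = hit.query_start == 1 and hit.query_end == hit.query_length
    differences = hit.mismatches + hit.gaps
    return covers_query and differences <= max_differences


def filter_hits(hits: List[Hit], max_differences: int = 1) -> List[Hit]:
    """Keep only the hits that pass accept_hit, preserving input order."""
    return [hit for hit in hits if accept_hit(hit, max_differences)]
```

A strict filter like this trades sensitivity for confidence: partial or more divergent matches are discarded rather than assigned to a taxon.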

Page 18: Reconstructing paleoenvironments using metagenomics

Findings

• Ancient DNA could assign 7 ("Lyuba"), 12 ("dung") and 8 ("Yukagir") plant families, with several determinations down to genus level

• Molecules complemented and confirmed fossils

• Identified vegetation composition is generally supportive of a productive "mammoth steppe"

• Micro-fossils of specific dung fungi showed that mammoths appear, unlike elephants, to be habitually coprophagous

Page 19: Reconstructing paleoenvironments using metagenomics

Next generation applications

• The results of the mammoth research so far have been obtained using Sanger sequencing

• Similar, as yet unpublished, research is being undertaken with the newly acquired IonTorrent "sequencing by synthesis" platform

Marcel Eurlings at Naturalis

Page 20: Reconstructing paleoenvironments using metagenomics

IonTorrent chip generations

Page 21: Reconstructing paleoenvironments using metagenomics

IonTorrent data pre-processing workflow

Filter out short reads

Splice out low Phred scores

Split by primer sequence

Split by adapter sequence

FASTA for downstream analysis
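
A minimal sketch of such a pre-processing workflow, assuming reads are already parsed into (identifier, sequence, Phred scores) tuples. All names and thresholds here are hypothetical. "Splicing out" low-quality bases is interpreted as truncating the read at the first base below the threshold, which is an assumption, and only the primer split is shown; splitting by adapter could use the same prefix matching.

```python
from typing import Dict, List, Tuple

Read = Tuple[str, str, List[int]]  # (identifier, sequence, Phred scores)


def trim_low_quality(seq: str, quals: List[int], min_phred: int) -> str:
    """Truncate the read at the first base whose Phred score is too low."""
    for i, q in enumerate(quals):
        if q < min_phred:
            return seq[:i]
    return seq


def preprocess(reads: List[Read], primers: Dict[str, str],
               min_length: int = 50,
               min_phred: int = 20) -> Dict[str, List[Tuple[str, str]]]:
    """Filter, trim and bin reads by the primer found at their 5' end."""
    bins: Dict[str, List[Tuple[str, str]]] = {name: [] for name in primers}
    for read_id, seq, quals in reads:
        if len(seq) < min_length:      # step 1: filter out short reads
            continue
        seq = trim_low_quality(seq, quals, min_phred)  # step 2: quality trim
        if len(seq) < min_length:      # assumption: re-check after trimming
            continue
        for name, primer in primers.items():  # step 3: split by primer
            if seq.startswith(primer):
                bins[name].append((read_id, seq[len(primer):]))
                break                  # reads without a known primer are dropped
    return bins


def to_fasta(records: List[Tuple[str, str]]) -> str:
    """Serialise (identifier, sequence) pairs as FASTA text."""
    return "".join(f">{read_id}\n{seq}\n" for read_id, seq in records)
```

Exact prefix matching is the simplest possible primer test; a real run would probably need to tolerate sequencing errors in the primer region.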

Page 22: Reconstructing paleoenvironments using metagenomics

Taxonomic identification pipeline

• Taxonomic identification of the contents of samples is a generic problem for which we have developed a re-usable pipeline

• It replicates some of the functionality of QIIME but integrates more conveniently in our HPC configuration

• Requirements:

Python 2.7 or 3.2

Biopython 1.58

NCBI-Blast-2.2.25+

Clustering programs, e.g. TGICL, Usearch, Octupus, cd-hit

Page 23: Reconstructing paleoenvironments using metagenomics

Pipeline steps

Optional: tag FASTA for provenance retracing across files

Cluster sequences into OCTUs of at least 10 reads

Pick exemplar sequence (random, consensus or hybrid)

BLAST exemplar sequences (local or remote)

Optional: retrace provenance

Report
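
The cluster-size cut-off and the exemplar choice can be illustrated as follows. This is a hedged sketch rather than the pipeline's actual code: the function names are hypothetical, clusters are assumed to arrive as a mapping from cluster ID to member sequences, the consensus assumes equal-length (already aligned) sequences, and the "hybrid" option is left out because the slides do not define it.

```python
import random
from collections import Counter
from typing import Dict, List, Optional


def keep_large_clusters(clusters: Dict[str, List[str]],
                        min_reads: int = 10) -> Dict[str, List[str]]:
    """Discard clusters with fewer than min_reads member sequences."""
    return {cid: seqs for cid, seqs in clusters.items()
            if len(seqs) >= min_reads}


def consensus(seqs: List[str]) -> str:
    """Majority-rule consensus; assumes equal-length, aligned sequences."""
    return "".join(Counter(column).most_common(1)[0][0]
                   for column in zip(*seqs))


def pick_exemplar(seqs: List[str], method: str = "consensus",
                  rng: Optional[random.Random] = None) -> str:
    """Choose one representative sequence for a cluster."""
    if method == "random":
        return (rng or random).choice(seqs)
    if method == "consensus":
        return consensus(seqs)
    raise ValueError(f"unsupported exemplar method: {method}")
```

Only the exemplars are then passed to BLAST, which keeps the number of searches proportional to the number of clusters rather than the number of reads.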

Page 24: Reconstructing paleoenvironments using metagenomics

Pipeline extensions

• NBC frequently deals with samples that may contain materials from endangered species, for example:

Putative FSC wood

Traditional Chinese medicine

Incense

• We are therefore extending the taxonomic identification pipeline to check automatically whether any taxa from the sample are listed in CITES appendices

• This, however, poses additional challenges of taxonomic name reconciliation
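
To show why name reconciliation matters for such a check, here is a small sketch under stated assumptions. The function names are hypothetical, the listed names and the synonym table are supplied by the caller (nothing here queries CITES), and names are reconciled only by reducing them to a lowercase binomial and resolving known synonyms.

```python
from typing import Dict, Iterable, List


def normalise(name: str) -> str:
    """Reduce a name to a lowercase 'genus species' pair, dropping
    underscores and any trailing authorship."""
    parts = name.replace("_", " ").split()
    return " ".join(parts[:2]).lower()


def flag_listed_taxa(sample_names: Iterable[str],
                     listed_names: Iterable[str],
                     synonyms: Dict[str, str]) -> List[str]:
    """Return the sample names that match a listed taxon, either directly
    or through a synonym mapped to its accepted name."""
    listed = {normalise(name) for name in listed_names}
    synonym_map = {normalise(old): normalise(accepted)
                   for old, accepted in synonyms.items()}
    flagged = []
    for name in sample_names:
        key = normalise(name)
        accepted = synonym_map.get(key, key)  # resolve synonym if known
        if accepted in listed:
            flagged.append(name)
    return flagged
```

With made-up names, a sample containing "Oldgenus protecta" would be flagged if the synonym table maps it to a listed "Examplea protecta". Without the synonym step the match would be missed, which is the reconciliation problem in miniature.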

Page 25: Reconstructing paleoenvironments using metagenomics

Other metagenomics work

• Phylogenies from metagenomic sequence data can grow to immense sizes

• For example, the GreenGenes 16S rRNA tree has ~400k tips

• We are developing novel algorithms for pruning these trees using (Google’s) MapReduce programming model

Page 26: Reconstructing paleoenvironments using metagenomics

Acknowledgements

• I am grateful to:

• Dr. Barbara Gravendeel for her input in developing this talk

• Youri Lammers for his great work in developing a well-documented taxonomic identification pipeline

• And to NBIC for giving me the opportunity to present this story

Page 27: Reconstructing paleoenvironments using metagenomics

Thank you for listening