

Harnessing the Power of Metagenomics: Applications, Methods, and Real World Examples

[0:00:07] Slide 1 Sean Sanders: Hello and a very warm welcome to everyone joining us online for

this Science/AAAS webinar. My name is Sean Sanders and I'm the editor for custom publishing at Science.

Today, we're going to be hearing about the very exciting and timely

topic of metagenomics. The term metagenomics has been around for a number of years now, but recently the field has shown some rapid progress, particularly as next‐generation DNA sequencing technologies have been advancing so quickly. The power of metagenomics has been clearly demonstrated in its application to many diverse areas, including characterizing the microbiomes of the human gut, seawater samples, and soil samples, impacting human health, ecology, and agriculture. All microbes in a particular specimen can now be catalogued—and useful biomarkers identified—without the need for culturing of these organisms. This type of analysis also provides researchers with a truer representation of the in situ conditions present when sampling took place.

In today's webinar, we'll introduce you to this important field of

metagenomics and hopefully provide you with some insight into how it can be integrated into your research.

I'm very pleased to have with me in the studio today three very

knowledgeable speakers making up our panel. Just to my left is Dr. Jack Gilbert from the University of Chicago in Illinois. Next to him, we have Dr. Karen Nelson from the J. Craig Venter Institute in Rockville in Maryland and finally, we have Dr. Jun Wang from BGI all the way from Shenzhen in China. Thanks to all of you for being with us. It's great to have you in the studio.

Dr. Jack Gilbert: Thank you, Sean.

Dr. Karen Nelson: Thanks for having us.

Sean Sanders: Before we get started, I just have some important information for

our audience. Please note that you can resize or hide any of the


windows in your viewing console. The widgets at the bottom of the console control what you see. Just click on these to see the speaker bios, additional information about technologies related to today's discussion, or even download a PDF of the slides.

Each of our speakers will be talking briefly about their work, after

which we will have a Q&A session during which our panel will address questions submitted by you, our live online viewers. So if you're watching us live, start thinking about some questions now and submit them at any time by typing them into the box on the bottom left of your viewing console and clicking the submit button. If you can't see this box, click the red Q&A icon at the bottom of the screen and this will bring the Q&A box up on your screen. As always, please keep your questions short and to the point as this will give them the best chance of being put to our panel.

Just a note, you can also log on to your Facebook, Twitter, or

LinkedIn accounts during the webinar and post updates or send tweets about the event; just click on the widgets at the bottom of the screen. For tweets, you can add the hashtag #sciencewebinar.

Finally, thank you to BGI for their sponsorship of today's webinar.

Slide 2 Now, I'd like to introduce our first speaker for this event and that is

Dr. Jack Gilbert. Dr. Gilbert received his Ph.D. from Nottingham University in the UK and received his postdoctoral training in Canada at Queen's University. He returned to the UK in 2005 for a senior scientist position at Plymouth Marine Laboratory before moving to Argonne National Laboratory and the University of Chicago in 2010. Dr. Gilbert is currently a senior environmental microbiologist at Argonne National Laboratory and affiliated with Argonne's Department of Mathematical and Computational Sciences. He is also an adjunct professor in the Department of Ecology and Evolution at the University of Chicago, and a fellow of the Institute for Genomics & Systems Biology. His research interests are in microbial community ecology where he is currently applying next‐gen sequencing technologies to microbial metagenomics and metatranscriptomics. Dr. Gilbert is also an editor for PLoS ONE and the ISME Journal, and is co‐leading the Earth Microbiome Project. Welcome, Dr. Gilbert.

Slide 3 Dr. Jack Gilbert: Thanks, Sean. So I'd like to give everybody a bit of an overview of

what metagenomics actually is and some of the historical concepts in


future proceedings for pushing this forward. So here, we have a slide, which encapsulates microbial ecology. Microbes are everywhere. They're the most abundant cellular life form on the planet. There are 1 x 10^30 microbes on planet Earth; that's a nonillion for those interested in the etymology of language.

In the bottom left‐hand corner of this slide, we can see a Petri dish

and this is traditionally where we've managed to culture and examine microbial biochemistry and microbial life. So traditionally, you would plate out the microbes on an agar plate, grow up the cells, and then extract the DNA and recently we've been exploring the genomic concepts by sequencing the genome using technologies.

This idea gives you a huge reference database of genomes of

cultured organisms, which can link genomic information to the biochemistry or phenotype of the microbes in which you're interested. The key thing to take into consideration when you're examining this is we can only really isolate using these methodologies at the current time about 5% of microbes in a cultured form. They do provide us with a very valuable reference database and the Genomic Encyclopedia of Bacteria, Microbial Earth and the Human Microbiome Project and MetaHIT are continually generating reference datasets for genomes from these organisms, and they can definitely help us to understand the links between biochemistry and phenotype when we move forward into microbial ecology.

[0:05:22] Now, Julian Davies has a great quote here, "Once the diversity of the

microbial world is catalogued, it will make astronomy look like a pitiful science." I prefer not to think of astronomical figures when I think of very big numbers now. There are only 1 x 10^24 stars in the known universe. There are 1 x 10^30 cells on this planet. So the new big numbers are biological numbers and we can do away with astronomy.

Slide 4 This is a slide basically pictorially representing how we achieve

metagenomics. We take DNA extracted directly from the microbial community; in this case, a marine system, but that can be any microbial community, from the microbes in my body to the microbes on the surface of this desk to the air to the soil and every other environment. We then sequence and in the last seven years we've seen a revolution in sequencing technologies with the advent of next‐gen or next‐generation and third‐generation sequencing tech.


These come in the form of Roche's 454 Pyrosequencing Technology or the Illumina HiSeq and MiSeq platforms, and third generation such as the Pacific Biosciences platforms and the single‐molecule platforms where, as with Oxford Nanopore, we will take a single molecule and sequence that directly. This gives you information about the taxonomic, who is in the environment, and the functional, or what they are doing in the environment, for the microbial community composition.

Slide 5 I'd like to introduce the Earth Microbiome Project. This is an ongoing

global initiative to provide a direct and systematic characterization of microbial life on earth. We are currently in our pilot phase. We have a website here and you can follow myself on Twitter. I'm an obsessive tweeter. We see here the results from the first 5000 samples processed using a technique called 16S rRNA Amplicon pyrosequencing and Illumina sequencing. Here, we see the stream water and soil are among the most diverse communities on the planet and the human mouth and the human gut are actually surprisingly undiverse.

Slide 6 The Earth Microbiome Project is leveraging a new initiative set up by

Dawn Field from the Centre for Ecology and Hydrology in the UK and Neil Davies from the University of California, Berkeley called Genomic Observatories. This is an idea to explore the ability to do metagenomic and genomic sequencing on microbial communities and eukaryotic communities from different ecosystems around the world on a long‐term basis.

Both the Earth Microbiome Project and Genomic Observatories are

employing multi‐omic technologies and we've put omic at the end of everything nowadays. But that's genomics to look at single organisms, metagenomics to look at the functional capability of communities, metatranscriptomics to look at the expressed function of a community, metaproteomics to look at the proteins generated after expression, and metametabolomics to look at the metabolite products coming from the system. I'll explain a bit later how we can take all of this information and predict between those different multi‐omic levels.

Slide 7 This is a diagram pictorially representing Northwest Europe and here

we see France and England and we have a site in the English Channel


we call L4, which represents one of the most characterized single locations on the planet for microbial community diversity.

Slide 8 In 2007, we did a 12‐month study and this is a very good example of

how you should preconceive the types of ecological questions you have before you start a study. We looked at microbial community diversity using pyrosequencing technology of the 16S gene to understand the taxonomic fluctuations in the microbial abundances over 12 months in 2007. And we found that essentially the microbial community was driven by temperature and nutrients.

Slide 9 When we expanded that to look at six years, so we took this from

2003 to 2008, we used the same technology just expanding the temporal range of our study, we found that all of a sudden we saw different trends appearing in the system. So this is a plot of richness you can see in the diagram, so the number of species in the system. We found that the richness peaked in winter and it was lowest in summer and this was very robust and cyclical. And we also found that nutrients and temperature were no longer so important. In fact, the microbial community seemed to be responding wonderfully to day length or the amount of sunlight that occurred and how that differed between winter and summer.
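As a rough illustration of the richness measure described here (the number of species observed in a sample), the following Python sketch counts the taxa with at least one read in each sample. The taxon names and read counts are invented for illustration and are not data from the L4 study; real comparisons would normally also equalize sequencing depth between samples first.

```python
def richness(counts):
    """Return the number of taxa with at least one read in a sample."""
    return sum(1 for n in counts.values() if n > 0)


# Hypothetical read counts per taxon for two sampling months (not real L4 data).
samples = {
    "january": {"taxon_a": 120, "taxon_b": 15, "taxon_c": 3, "taxon_d": 1},
    "july": {"taxon_a": 300, "taxon_b": 0, "taxon_c": 0, "taxon_d": 2},
}

# Richness for every month, and the month with the highest richness.
richness_by_month = {month: richness(c) for month, c in samples.items()}
richest_month = max(richness_by_month, key=richness_by_month.get)
```

In this toy table the winter sample has the higher richness, which mirrors the direction of the seasonal pattern described above but says nothing about its size.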

[0:10:11] Slide 10 We can understand this by putting it in a phylogenetic context and

here we have a phylogenetic tree against which we've mapped the abundances and the persistence of microbial communities in each system. And this enables us to track and understand the taxonomic fluctuation in community abundance through time and space. The takeaway message from this figure is the more abundant an organism, the more persistent it is through time.

Slide 11 To measure if that concept was actually playing out as a change in

the membership of a community or a change in the relative abundances of members in the community, we took a single time point from that 72‐time point series and we sequenced 10 million reads, 10 million 16S fragments from that one time point.

Slide 12 What we found was that nearly 100% of the species we'd identified in that 72‐time point series were found in that one time point. However, more interestingly, those million fragments from the 72‐time point series only comprised less than 5% of the total diversity in that one very deeply sequenced time point. And this expounds the capability of sequencing technology where you can now explore the full depth of the total taxonomic community composition in a system and really start to understand how many species there are and how they fluctuate through space and time.
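The two percentages in this comparison measure different things, and a small Python sketch may help keep them apart. It assumes each dataset has already been reduced to a set of taxon identifiers; the identifiers and set sizes below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def overlap_fractions(series_taxa, deep_taxa):
    """Compare a shallow time series with one deeply sequenced sample.

    Returns (recovered, coverage):
      recovered - fraction of the time-series taxa also seen in the deep sample
      coverage  - fraction of the deep sample's taxa the time series had seen
    """
    shared = series_taxa & deep_taxa
    recovered = len(shared) / len(series_taxa)
    coverage = len(shared) / len(deep_taxa)
    return recovered, coverage


# Hypothetical taxon identifiers (not real data).
series_taxa = {"t1", "t2", "t3", "t4"}
deep_taxa = {f"t{i}" for i in range(1, 101)}  # t1 .. t100

recovered, coverage = overlap_fractions(series_taxa, deep_taxa)
```

Here every taxon from the shallow series is recovered in the deep sample, yet the series accounts for only a small share of what the deep sample contains.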

Slide 13 This led us to understand that we could potentially model this

system and here we see microbial assemblage prediction, a new bioclimatic model we'd implemented for the English Channel. Here, we see on the top the observed change in abundance of certain species and in the middle plot, we see our ability to predict that community composition, and on the bottom, we see the correlations between observed and predicted. Obviously, in the middle plot we've expanded that, we've extrapolated that to 2000 to 2009 so we now have a decade long predicted time series based on six months of observation.
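The bottom panel described here compares observed and predicted abundances. As a minimal sketch of that kind of comparison, and not of the prediction model itself, the following Python computes a Pearson correlation between two series. The abundance values are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical relative abundances of one taxon at four time points.
observed = [0.10, 0.20, 0.40, 0.30]
predicted = [0.12, 0.18, 0.35, 0.33]

r = pearson(observed, predicted)
```

A value of r close to 1 means the predicted series rises and falls with the observed one; it does not by itself show that the absolute abundances match.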

Slide 14 We also want to look at the community composition in terms of its

functional capability and here we see a new technique called predicted relative metabolic turnover, it's a mouthful, called PRMT. We call this the hairball. Essentially, the nodes are metabolites: ammonia, carbon dioxide, oxygen, water, nitrate. The edges are the relative abundances of annotated enzymes and their potential activity to transfer those metabolites between other metabolites. So, this is a metabolite ball, a metabolite hairball. Essentially, this enables us to predict the community structure based on the metabolite levels.
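To make the node-and-edge description concrete, here is a deliberately simplified Python sketch of a turnover-style score. It is not the published PRMT algorithm: the reaction list, the enzyme names, the abundance values, and the sign convention (negative meaning relatively more consumption in the first sample) are all assumptions made for illustration.

```python
import math
from collections import defaultdict

# Hypothetical enzyme-mediated reactions: (enzyme, substrate, product).
reactions = [
    ("enzyme_1", "carbon_dioxide", "organic_carbon"),
    ("enzyme_2", "organic_carbon", "carbon_dioxide"),
    ("enzyme_3", "ammonia", "nitrate"),
]


def turnover_scores(reactions, abundance_a, abundance_b):
    """Score each metabolite by the log2 change in enzyme abundance (a vs b).

    An enzyme that is relatively more abundant in sample a lowers the score
    of its substrate (more consumption) and raises the score of its product.
    """
    scores = defaultdict(float)
    for enzyme, substrate, product in reactions:
        change = math.log2(abundance_a[enzyme] / abundance_b[enzyme])
        scores[substrate] -= change
        scores[product] += change
    return dict(scores)


# Hypothetical relative enzyme abundances (percent of annotated reads).
day = {"enzyme_1": 40.0, "enzyme_2": 10.0, "enzyme_3": 20.0}
night = {"enzyme_1": 10.0, "enzyme_2": 40.0, "enzyme_3": 20.0}

scores = turnover_scores(reactions, day, night)
```

With these invented numbers, carbon dioxide gets a negative score for day relative to night, meaning relatively more consumption during the day under the sign convention assumed here.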

Slide 15 So here, we see the changing relative consumption or production of

metabolites in a network over 24 hours in the English Channel. Here, we see the ability to ‐‐ if you focus on carbon dioxide, the big circle in the middle, we see that during the daytime the system starts to consume carbon dioxide, which is when the circle goes dark, and we see that happening there. And when the CO2 circle becomes light, we see CO2 being generated by heterotrophic activity during the nighttime. We can actually correlate this directly to observed fluxes in carbon dioxide coming from the water system and we see an incredibly


good correlation of about 90%. This suggests that using metatranscriptomic data and metagenomic data we can actually start to predict the ability of the community to influence its environment.

Slide 16 Here, we see taking this to the next level. We take satellite data from

the English Channel and the western continental shelf. We use that in concert with microbial assemblage prediction of the taxonomic structure from 16S rRNA to predict over space and time (in this picture, we see 5000 locations turning over for 12 time points in 2008) the microbial community composition at any given location. Then we take that predicted relative metabolic turnover, the PRMT algorithm, and we predict for one of over 1000 metabolites possible the increase in carbon dioxide consumption or the increase in carbon dioxide generation for those 5000 locations over 12 time points.

This enables you to look at hot spots to where systems become

carbon sources or carbon sinks. We can now modulate this model by changing the temperature manually or changing the pH of the system in response to for example climate change and understand how the system will respond to differing climate change strategies with regards to the ability of the ecosystem to provide ecosystem services from the microbial community.

[0:15:02] Slide 17 I'd like to acknowledge everybody in my group at the University of

Chicago and Argonne National Laboratory, Plymouth Marine Laboratory where a lot of this work took place, and a number of other collaborators from around the US. Thank you very much.

Sean Sanders: Great. Thank you so much, Dr. Gilbert. A very interesting and

engaging introduction for us.

Slide 18 We're going to move right on to our second speaker and that today

is Dr. Karen Nelson. Dr. Nelson received her undergraduate education from the University of the West Indies, and her Ph.D. in microbiology from Cornell University. She worked previously at the Institute for Genomic Research and is currently the director of the Rockville, Maryland campus of the J. Craig Venter Institute. Dr. Nelson has extensive experience in microbial ecology, genomics, and metagenomics as well as in microbial physiology. She has also been involved in the analysis of the microbiota of the human stomach and


gastrointestinal tract and was part of a national team of researchers who completed the first comprehensive metagenomic survey of the human gastrointestinal system. Dr. Nelson is editor‐in‐chief of the journals Microbial Ecology and Advances in Microbial Ecology and is a key investigator in the multicenter NIH Human Microbiome Project, which focuses on understanding the microbes that live in and on the human body and their contribution to human health and disease. A warm welcome to you, Dr. Nelson.

Slide 19 Dr. Karen Nelson: Thanks, Sean. Thanks for having us. There you go. So welcome

everybody. I know you're joining in from different locations all over the world so thanks for being up late in the night. I just wanted to reach back to Jack's presentation. I thought he gave a great overview of metagenomics in general and where the technologies are going and the value of doing metagenomic sequencing in different environments in terms of the information you can gain from that environment that you're studying.

My presentation today is going to focus primarily on what we're

learning as we start to look at the human microbiome. So just remember that in terms of metagenomics, you can study soils, air, water, animal systems, but a lot of focus in the past four to five years has been turned towards studying the human body.

Slide 20 Most of us do not appreciate how many microbes live on and in us.

The estimates are currently that the number of microbial cells on the human body and in the human body outnumber our own cells by a factor of 10. So that said, these microbes play a significant role and we don’t quite understand what the role is. We know that they are associated with various diseases, diarrheas, tooth decay, bacterial vaginosis, but the full complement of these microbes, their role, how we can deal with them, how we can take advantage of them and work better in this relationship that we have is yet to be completely elucidated.

So if you look at the overview I have here and you're going to see

this NIH woman a couple of times today because she's a very friendly, although not live, source of information, but basically, as I said, the collective number of microbiome cells exceeds the number of human cells by an order of magnitude. We hardly know about these organisms that live on and in us. We know they play a significant role especially in development of the


gastrointestinal tract lining, in immunity and resistance to pathogenic infections, and we know that they can actually be used as biomarkers or sentinels, if you want to call it, for onsets of certain diseases.

Slide 21 So there are a number of huge studies underway currently

throughout the world and I'm going to mention the NIH Human Microbiome Initiative, which is one of the largest studies focused on healthy individuals. I think Jun might mention the MetaHIT work that’s underway and I won't discuss that in detail. But just to give you an understanding of what's going on in terms of funding from the National Institutes of Health, there's a large study focused on 300 healthy individuals where we're collecting metagenomic samples from 15 to 17 body sites sequencing these samples and putting all this information in a public domain. The samples that are being sequenced are listed towards the left of your slide.

Now, in parallel to that, NIH has funded a number of studies that are

focused on different diseases. They have funded the Data Analysis and Coordination Center called the DACC, which is led by Owen White at the University of Maryland. And in parallel to that, they are sequencing about 3000 reference genomes and Jack mentioned this in his presentation.

The value of having the genomes from organisms that can be

cultured is multifold. We can do studies in the lab on these species but they also act as scaffolds for the metagenomics data that's coming out of these metagenomic studies that we're doing. For additional information on the National Institutes of Health studies, I would encourage you to go to their website and look at the data and the datasets and more information on clinical processes, etc.

[0:20:19] Slide 22 Now, I'm going to turn focus from there to talk about studies that

are underway specifically at the J. Craig Venter Institute and we've been working in this space since 2004‐2005. We did some very early studies with David Relman and his group at Stanford and since then we have gone on to conduct a number of additional studies. By virtue of being primarily an omic center or a sequencing center, we work with other collaborative sites. So we work with clinical centers and hospitals to get all our samples. So every example I have listed here is actually done in collaboration with a university or a hospital for example. Well, you can see the magnitude and the


different types of studies that are underway and this is just at one institution and, you know, worldwide other kinds of studies are being conducted currently.

I don’t have enough time to go through any one of these studies in

detail on its own. But for example we've been looking at the progression of esophageal cancer in collaboration with Pei at NYU and from these initial studies that we've been doing, you can see clear patterns in terms of change of microbiota in the cohort with time suggesting that you can come up with new microbial predictors for onset and progression of this disease.

We've been looking very closely also with bacterial vaginosis and

pre‐term babies. This is a billion‐dollar industry right now and bacterial vaginosis in general and what we've been learning of late is that a lot of women actually are asymptomatic and carry infections and they don’t know they're infected and this can lead to disorders over time in these women if it's not treated early.

Other studies underway relate to for example type 1 diabetes where

we're working with NIDDK to look and see if we can develop non‐invasive approaches to diagnose children who are in early stages of developing type 1 diabetes. And for anybody who has been familiar with this disorder, you know, it's a three‐year‐old developing type 1 diabetes and not knowing what's happening, it's a very unsettling and unhappy experience for the parents and the child at the same time.

Slide 23 Just going on to some additional studies we have underway. We

have been looking at mouse models of alcoholism with David Brenner and his team at UCSD, looking at urinary tract infections in children and adults using metagenomic approaches, and looking at different animal models. For example, you can use a ferret or a mouse to look at infectious agents and how they impact the normal microbiota of the animals and try and extrapolate to what happens in humans. And more recently, we have started to go overseas in projects funded by NIAID where we're looking at the impact of the host microbiota for individuals who are infected with malaria.

Slide 24 So Jack gave a great presentation about different kinds of omics

technologies and I just wanted to mention that metagenomics is wonderful and fabulous, but it's like a starting point. You really need


to delve into the environment and look at what's being expressed. You know, we're finding out that very minor players in a population can have a significant impact on the total operation of that system that we're looking at. And we're learning also by looking at humans that the host genetics also impacts the relationship with the microbiota so in different populations, microbial populations, in different human populations the microbiota is going to be different and the microbes might behave differently and we're just starting to understand what those differences might look like.

I think ultimately what's going to be fabulous is that we're going to

be looking back at traditional approaches to human health. We're going to be identifying new probiotics, and the microbes are going to become real sentinels for us to understand our own health and how we can improve our health.

Slide 25 Moving beyond the microbiome. As I mentioned before, the

sequencing approaches, wonderful as they are, are just the beginning. We need to delve in and see what's being expressed, what the proteins look like, the metabolome, the proteome, and essentially a systems biology approach is needed more to integrate the information that we're finding that's coming out of these metagenomics studies.

Slide 26 And in closing, I would like to point out as I mentioned before that as

a sequencing center, we have to work in collaboration with a number of groups. I'd like to acknowledge all the collaborators we have. I just have a subset listed here. Also as a not‐for‐profit organization, all our funding comes from the government and we get a lot of money from the National Institutes of Health for our Human Microbiome research. And I'd also like to acknowledge all of my lab members that work very closely to push these studies forward.

[0:25:21] Slide 27 to Slide 28 Sean Sanders: Great. Thank you so much, Dr. Nelson. We're going to move right on

to our final speaker for this webinar and that's Dr. Jun Wang. Dr. Wang is the executive director of the BGI (previously known as the Beijing Genomics Institute) and was instrumental in the 1999 founding and growth of the BGI Bioinformatics Department, which is now widely recognized as one of the world’s premier research facilities. He also holds a position as an Ole Rømer professor at the University


of Copenhagen in Denmark. Dr. Wang has been recognized with a number of awards, including an award from His Royal Highness the Prince Consort’s Foundation in Denmark, an Outstanding Science and Technology Achievement award from the Chinese Academy of Sciences, a Top 10 Scientific Achievements in China award, and the prize for Important Innovation and Contribution from the Chinese Academy of Sciences. His research focuses on genomics and related bioinformatics analysis of complex diseases and agricultural crops, with the goal of developing applications using this genomic information. Warm welcome, Dr. Wang.

Dr. Jun Wang: Thank you very much, Sean. So you've already heard wonderful talks

from Karen and Jack so I might actually only focus on what BGI is doing in the metagenomics field. As you can see from the title here, I'm sort of talking about the E. coli, gut microbiome, and environmental microbiome projects at BGI.

Slide 29 to Slide 30 In China, there is actually an old saying that you are what you eat. So the extreme case is actually the E. coli. We've already heard about this sort of German outbreak E. coli tragedy several months ago. So using the sort of combined technology from the first generation sequencing to the third generation sequencing allows you to really sequence one cultured bacterium in a few days.

Slide 31 You can really do lots of annotations of those bacteria, you find out all

the toxic proteins, you find out all those kinds of resistance genes and so on. But the problem is that we are very lucky in this case because the E. coli itself could be cultured, could be studied.

Slide 32 But how about really the gut microbiomes, where probably 90% or

even over 90% of the bacteria cannot really be cultured.

Slide 33 So it's very important first of all to actually establish a reference

genome of the gut microbiota. Now this is what we did together with the MetaHIT Consortiums in Europe as the European FP7 project.

Slide 34 So we first established a gene catalogue of those European

populations. So the samples are from the Danish population and also Spanish populations. We picked up 124 individuals and sequenced


the fecal samples up to 500 giga base pair. At that time, it's like 100 times more than any data generated at previous studies.

Slide 35 Then we sort of developed quite interesting assembly pipelines. So

we assembled the individuals separately then we had the merged assembly afterwards. So in total, we got 14 million ORFs but then of course 3.3 million non‐redundant genes have been identified. It's kind of 100 times more than your own genes, which is really a large number there.
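The step from all predicted ORFs to a non‐redundant gene set can be sketched as follows. This Python example removes only exact duplicate sequences, which is a deliberate simplification: real gene catalogues group near‐identical sequences by similarity rather than by exact match. The sequences and individual labels are invented for illustration.

```python
def non_redundant(orfs_by_individual):
    """Merge per-individual ORF lists, keeping the first copy of each sequence."""
    seen = set()
    catalogue = []
    for individual in sorted(orfs_by_individual):
        for sequence in orfs_by_individual[individual]:
            if sequence not in seen:
                seen.add(sequence)
                catalogue.append(sequence)
    return catalogue


# Hypothetical ORFs predicted from two separately assembled individuals.
orfs_by_individual = {
    "individual_1": ["ATGAAATAA", "ATGCCCTAA"],
    "individual_2": ["ATGAAATAA", "ATGGGGTAA"],
}

catalogue = non_redundant(orfs_by_individual)
total_orfs = sum(len(orfs) for orfs in orfs_by_individual.values())
```

The gap between total_orfs and len(catalogue) is the redundancy removed, the same kind of reduction as going from 14 million ORFs to 3.3 million non‐redundant genes.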

Slide 36 After all those analyses across the individuals, we figured out 40% of

the individual's genes are shared with at least 50% of the individuals of the cohort. But just keep in mind, all those samples are from sort of European populations. I’m not talking about Chinese yet so probably we'll have more diversity there.
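One way to read this sharing statistic is as a two‐step calculation: first find the genes present in at least half of the cohort, then ask what fraction of one individual's genes fall in that common set. The Python sketch below follows that reading, which is an assumption about how the figure was derived; the gene identifiers and the four‐person cohort are hypothetical.

```python
from collections import Counter


def common_genes(genes_by_individual, min_fraction=0.5):
    """Genes present in at least min_fraction of the individuals."""
    n = len(genes_by_individual)
    counts = Counter(
        gene for genes in genes_by_individual.values() for gene in genes
    )
    return {gene for gene, c in counts.items() if c / n >= min_fraction}


def shared_fraction(individual_genes, common):
    """Fraction of one individual's genes that belong to the common set."""
    return len(individual_genes & common) / len(individual_genes)


# Hypothetical gene sets for a four-person cohort (not MetaHIT data).
genes_by_individual = {
    "a": {"g1", "g2", "g3", "g4", "g5"},
    "b": {"g1", "g2", "g6"},
    "c": {"g1", "g7"},
    "d": {"g2", "g8"},
}

common = common_genes(genes_by_individual)
fraction_a = shared_fraction(genes_by_individual["a"], common)
```

In this toy cohort, two of the five genes carried by individual "a" are common to at least half of the cohort.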

Slide 37 Also, if you look at all these bacteria species across samples, they

actually have similar compositions over different individuals, but different abundance there. So it's also very important to really understand these differences and also the correlation with disease too.

Slide 38 If you look at all those functions for this reference gene set, we

identified at least 5000 new functions. There are about, you know, 20 proteins per family there. These also are unknown and previously really poorly characterized function categories. People are sort of doing a lot of follow‐ups on those function studies there.

Slide 39 If you really tried to set up like a minimum metagenome set across

the entire samples, to see how many functions are really sort of enriched in all the samples, we found out this category of 6300 functional categories, which are again very important for the gut ecosystems.

Slide 40 If you do the pathway analysis, it shows up like this. Of course, most

of the interesting pathways have to be studied again in follow‐ups.

Slide 41 One of those studies for the MetaHIT project and also several other BGI projects is really focusing on this sort of association between the gut flora and diseases. We can do it at different levels. This is the species level thing. So you see all those UC patients and the Crohn's disease patients and you have the wonderful PC analysis with the healthy patients, which can clearly separate out those different individuals. This is the study based on 177 species profiling results.

[0:30:29] Slide 42 You can do the same thing, but at the gene level. This is another study

we did in the Chinese type 2 diabetes study. We actually identified 80,000 genes, which have a statistically significant sort of association with type 2 diabetes in the Chinese guts. It's also quite important too.

Slide 43 A similar analysis had been carried on for the Danish study. This is

the BMI data. Again, it's at the gene level and we see all those BMI genes are mostly from Firmicutes, one type of bacteria, suggesting that obesity is really correlated to loss of Firmicutes.

Slide 44 You can even really get into a resolution not just in the gene level

but in the base pair level. This is actually a preliminary result from the Crohn's disease study that we did. We identified one variant, actually really one bacterium, which is really less variant in the non‐Crohn's disease individuals. But in the Crohn's disease individuals we saw lots and lots of variations there. So you can really study even at the variation level and also SNPs, you know, third generation and things like that.

Slide 45 So it's very, very important again to scan all different kinds of

diseases at different levels, the gut flora with the disease. We all know diet has a dominant role in the shaping of the gut microbials and also your host genome plays a very, very important role to decide which gut microbiome will actually stay in your gut and eventually those gut microbials play a very important role in the metabolisms of your body.

People are using mouse as a model. I don't have the slides here, but

BGI actually identified another animal model, that is the pig, which shares much more similar gut flora with human than a mouse. So I


mean we need really new animal models for those gut microbial studies too.

Slide 46 But actually a lot of those gut bacteria we got from the environment.

It's very important to understand the environment is really related to your health.

Slide 47 We already know that environmental microbial genes can transfer into the gut bacteria. One of the studies was carried out by a Japanese group. They're talking about genes moving from marine bacteria to the Japanese gut microbiota. We actually found those genes in the European samples as well. Probably they are eating sushi or something like that.

Slide 48 There are also environmental microbial projects at BGI, so I listed some of them, covering natural environments and also artificial environments.

Slide 49 I'll just give you one example about a Chinese wine company, actually a hard liquor company. These are very, very expensive wines. They wanted to replicate production in different locations, but they failed to do it. So by sequencing samples from different locations and different stages of the distillery process, they are actually doing very good quality control just by metagenomic sequencing, to see the flora or the bacterial changes there. So it could be used for lots of industry applications if you really do it in the right way.

Slide 50 Just to put it in a summary. So you have the single microbes, you have the symbiotic communities, you have extreme environments, you have animal guts, you have soils. I mean all those types of microbial complexity need to be studied and understood in order, in this particular case, to understand human health.

Slide 51 These are some people to thank. We have about 4000 employees at BGI. We have the BGI board, the analysis, and gene sequencing platforms. I cannot acknowledge everybody at BGI. All the E. coli work was done together with the hospital in Hamburg. We also have collaborations in MetaHIT, for example, and LuCAMP for the type 2 diabetes study too. So there are also lots of acknowledgements for BGI's collaborators.

Slide 52 Okay. Thank you very much.

Sean Sanders: Thanks so much, Dr. Wang. Many thanks to all of our speakers for their excellent presentations and we're now going to move on to the questions submitted by our online viewers.

A quick reminder to those watching us live that you can still submit your questions by typing them into the text box and clicking the submit button. If you don't see the box on your screen, just click the red Q&A icon.

So the first question that I have to put out to the panel is what would be a reasonable sample size, or how does one calculate a reasonable sample size, for a clinical study, say, wishing to detect differences in the virome of a diseased versus a healthy cohort? I know we talked a little bit about this before. Dr. Gilbert, maybe you'd like to start us off?

[0:35:14] Dr. Jack Gilbert: Yeah. I mean every single experimental project should be designed to understand the ecology, but sometimes it's very difficult to know that. So I mean for an environmental sample, it's definitely a question of investigation, but for a human sample, I'll pass this one to Professor Wang and he can give us a good understanding of it.

Dr. Jun Wang: Well of course, this heavily depends on the disease and also the impact of the different bacteria there. I mean it's like a GWAS study, right? Some of those genes, disease genes, you can find just with, you know, hundreds of samples, but sometimes you even sequence thousands of samples and you don’t really find any meaningful results there. So it's really up to, again, the impact of the different bacteria and also the nature of the disease itself. But if you really want to put a smart guess there, I would think hundreds of samples will be a very good survey for the first step. Karen, probably ‐‐ yeah.

Dr. Karen Nelson: Yeah, I would agree with you on that minimum to start off, but I also agree that we're still learning as we go along what the correct sample size is for a lot of these studies.

Dr. Jun Wang: It's also very important to consider the population-specific thing there.

Dr. Karen Nelson: Exactly.

Dr. Jun Wang: Uh‐hum.

Dr. Karen Nelson: Yeah.

Sean Sanders: And a follow‐up question to that is about ethnic diversity and how that might influence these metagenomic studies, whether you see differences in different ethnic groups?

Dr. Jun Wang: Well, the key here is really that diet plays a dominant role in shaping your gut flora. Chinese people certainly have different gut bacteria, we see, than Europeans and people in other countries. We recently sequenced lots of Chinese metagenomes. It will be very important to do the comparison. I believe the HMP [Human Microbiome Project] also did the same study.

Dr. Karen Nelson: Yeah. Well on a separate study we've been ‐‐ and bacterial vaginosis is an example where we have looked at different populations of women. And the microflora that exists is actually different in the different backgrounds. And I think again this is an area that we're just beginning to understand, but we are definitely starting to reach out. I mean there's an initiative to start doing more microbiome work in parts of Africa, for example, to address these exact questions, or in the Caribbean, where you have merged populations from Africa and the west, to see what's going on with these cohorts. But you can only assume that the host's genetics is going to impact the microbiome.

Dr. Jack Gilbert: You know, as Jun said though, there are several studies recently out that have shown a major impact of diet.

Dr. Karen Nelson: Yes.

Dr. Jack Gilbert: So diet does have an impact, you are what you eat.

Sean Sanders: Uh‐hum.

Dr. Jack Gilbert: Yeah.

Dr. Karen Nelson: But you can imagine on the skin though for example that ‐‐

Dr. Jack Gilbert: Yeah.

Dr. Karen Nelson: ‐‐ you're going to have an impact from the different populations, so.

Sean Sanders: So a question that might be related to this: this viewer says it's easy to see how diversity is measured using metagenomics, but how is relative abundance measured? Dr. Gilbert, you want to start us off?

Dr. Jack Gilbert: Okay. So number one, this is not a perfect world and we don’t have a perfect suite of tools. We're relying entirely upon our experience, and as that grows, I mean we use scientific techniques to understand these things. Take the current techniques we have, for example 16S rRNA amplicon metagenomics, which must be separated from standard shotgun metagenomics, which is everything. With amplicon metagenomics we have to take into consideration that some of these genes are found more than once in a single genome, and hence the relative abundance could be biased by that.

Relative abundance is a very useful statistical measure, so we try and approach the methodology by trying to understand how the communities do vary. But it has to be taken with a pinch of salt, if you will, pardon the vernacular. To try and understand how a system changes you must make certain assumptions about how that system works. I don’t know in the human system. You know, with the reference genomes, this is becoming more prevalent, so you can actually track a particular organism by mapping the genes generated from the metagenome on to these reference organisms and then track that as a relative abundance in the system. However, you know, there are obviously biases associated with that mapping; you know, the nucleotide or protein identity used to map a particular fragment to a particular gene in the genome can bias your result. But it's an ongoing investigation. We're working through that. If anyone else wants to comment on that?
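The copy-number bias Dr. Gilbert describes for 16S amplicon data can be illustrated with a minimal Python sketch. It is not a method presented in the webinar: the function names, taxon names, read counts, and per-genome 16S copy numbers are all hypothetical, and it assumes the copy number of each taxon is already known.

```python
def relative_abundance(counts):
    """Convert raw counts per taxon into fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads to normalize")
    return {taxon: n / total for taxon, n in counts.items()}


def copy_number_corrected(counts, copies_per_genome):
    """Divide each taxon's read count by its 16S copy number, then renormalize."""
    adjusted = {taxon: n / copies_per_genome[taxon] for taxon, n in counts.items()}
    return relative_abundance(adjusted)


# Hypothetical amplicon read counts and 16S copy numbers (illustrative values only).
reads = {"taxon_A": 700, "taxon_B": 300}
copies = {"taxon_A": 7, "taxon_B": 1}

raw = relative_abundance(reads)                   # read-based view
corrected = copy_number_corrected(reads, copies)  # approximate genome-based view

print(raw)
print(corrected)
```

With these made-up numbers, taxon_A looks dominant in the raw read counts but becomes the minority once each count is divided by the number of 16S copies per genome, which is the kind of bias the speaker warns about.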

Dr. Karen Nelson: Well, I think added to that, you know, every step you take in processing a sample you can influence the relative abundance.

Dr. Jack Gilbert: Exactly. Uh‐hum.

Dr. Karen Nelson: And, you know, people don’t talk about that in much detail, but I know when you're processing human samples you're tossing out stuff at each step. So there are a number of variables that influence what the outcomes will look like.

Sean Sanders: Uh‐hum.

Dr. Jun Wang: Yeah. And also the sequencing technology also ‐‐

[0:40:00] Dr. Karen Nelson: Absolutely. Absolutely.

Dr. Jun Wang: ‐‐ has an impact there too, so everything is biased.

Dr. Jack Gilbert: Everything is biased. [Laughter]

Sean Sanders: Actually, I have a couple of questions to follow up on that and one was what are some of the key challenges of analyzing metagenomic data? Maybe you could talk a little bit more about that. So we talked about collecting the samples, but what about the data side, the actual analysis? So, Dr. Wang, could you start us off?

Dr. Jun Wang: Well, it is very, very difficult. I mean if you compare doing a thousand human genomes to doing a thousand metagenomes of different individuals, the metagenomes are certainly much more difficult as the diversity is different. To put an example, if you try to do a sampling of an environmental sample, it would really take like months ‐‐

Dr. Jack Gilbert: Yeah.

Dr. Jun Wang: ‐‐ on a supercomputer in order to really understand and be able to sample it. I'm not talking about annotation of it. If you look at all that gene complexity and try to do the comparative genomics across the different organisms, you really, really need supercomputing.

Sean Sanders: Uh‐hum.

Dr. Jun Wang: So not just that, but all the algorithms, all the data analysis tools, the statistics for different associations between the gut metagenomes and disease, they're all underdeveloped, so they are all premature. So I think we really have to encourage lots of statisticians, mathematicians, and bioinformaticians to really jump into this field.

Dr. Jack Gilbert: I'd like to add to that just quickly. There's an old adage that there are as many statistical protocols as there are statisticians on earth.

Dr. Karen Nelson: Uh‐hum.

Sean Sanders: [Laughs]

Dr. Jack Gilbert: And we get the same thing, there are as many bioinformatic protocols and algorithms as there are bioinformaticians on earth. And we need to start working in a far more concerted effort to try and understand how we can bridge the gap between different cohorts of organizations. I like to apply the physics model. You know, we should all be very inspired by an organizational group across different countries that can get several hundred billion dollars together to build a giant ring under Switzerland. And, you know, if we can work together across international boundaries and disciplines then we could potentially gather that funding as well to actually start to explore the power and importance of microbial communities.

Dr. Jun Wang: Yeah, absolutely.

Sean Sanders: I think Karen is looking a little bit skeptical.

Dr. Karen Nelson: No. [Laughter] I actually agree with him. I heard somebody discuss this in a different setting, but it's like it's still a little bit mom‐and‐popish like.

Dr. Jack Gilbert: Yeah, it is.

Dr. Karen Nelson: And we don’t necessarily talk to each other as much as we should.

Dr. Jack Gilbert: Right.

Dr. Karen Nelson: And then you encounter these issues that your study's done and mine's done and we haven't used the same approaches or analysis tools, so how can we understand exactly what you're trying to see.

Dr. Jun Wang: Exactly.

Sean Sanders: Uh‐hum.

Dr. Karen Nelson: I think in terms of challenges, it's data storage, assembly, all these tools that are post sequencing, because sequencing is so democratized now and it's cheap.

Sean Sanders: Uh‐hum.

Dr. Karen Nelson: And you get a lot of data, but it's like what do you do with that data and how do you interpret that data reliably is what I consider to be one of the biggest challenges.

Sean Sanders: So coming back to databases, this question, and actually this is probably the most popular question that's come in today from three or four different people, is what about sequences that are not known in the database or unknown organisms?

Dr. Karen Nelson: Jack? [Laughs]

Sean Sanders: [Laughs] So I guess, Dr. Gilbert, you'll be ‐‐

Dr. Jack Gilbert: Yeah.

Sean Sanders: ‐‐ in the firing line.

Dr. Karen Nelson: Fantastic.

Dr. Jack Gilbert: Unknown, we call them ORFan sequences.

Sean Sanders: Okay.

Dr. Jack Gilbert: It comes from the concept of the open reading frame, or the ORF, that dictates a protein coding sequence in a genome and, you know, the idea that they have no parent, i.e., they have no known representative in the databases. When we sequence, say, a trillion base pairs from a metagenome of any given ecosystem, we can generally annotate maybe about 50%, sometimes 60%; it varies across the environments. It can be as low as 30%, it could be as high as 80%. You know, in the human system, because it's so well studied and there are so many of these genomes coming out and there's such an impetus, a drive. We are human, we want to find out about ourselves. You can actually increase that significantly and it's that kind of focus that will help to push this forward. The traditional biochemical characterization of proteins is absolutely essential to overcome that gap in knowledge. You know, fundamentally we need to start focusing on that in the environmental systems a little bit more. I mean we're about trying to understand how the environments differ, but we also need to understand how many different unknown proteins are out there, how many different unknown enzymatic reactions and pathways exist, and how we can uncover and understand those. I think that's fundamentally one of the big five‐year goals.

Dr. Karen Nelson: I think even on the human side, it's big, right? Because you guys had almost 40% ‐‐

Dr. Jun Wang: Absolutely.

Dr. Karen Nelson: ‐‐ of your data that you could not characterize at all, so.

Dr. Jun Wang: But there is actually an approach, not just ‐‐ I mean there's lots of unknown proteins and unknown genes there. But I would actually encourage that we do it in an application‐driven way. That means, you know, for example, if we find several genes which are associated with a certain disease, those genes should be given a sort of higher priority ‐‐

[0:45:02] Dr. Karen Nelson: Uh‐hum.

Dr. Jun Wang: ‐‐ to understand function compared to the others.

Dr. Karen Nelson: Yeah.

Dr. Jun Wang: So it will be very good to first start with really a big survey type of thing, to have more statistical associations with certain traits, phenotypes, anything like that, and start from there doing animal models and whatever to study the function… that is probably better than just randomly picking certain unknown genes and doing follow‐up studies on them.

Sean Sanders: Right. And I'll come back to something you mentioned about the sequencing technologies that are available. There's been a number of questions asking about which is the preferred sequencing technology without ‐‐

Dr. Karen Nelson: [Laughs]

Sean Sanders: ‐‐ you know, pushing any one particular technology. Maybe you can talk about your experience with the Roche 454, Illumina and maybe some of the new systems that are out there.

Dr. Jun Wang: I'll put it as the ideal situation. So we would want to sequence one bacterium in just one read, and also get it very cheap and at high throughput.

Sean Sanders: Uh‐hum.

Dr. Jun Wang: But none of those technologies really meets the standard.

Dr. Karen Nelson: They're not there yet.

Sean Sanders: Uh‐hum.

Dr. Jun Wang: So the Illumina HiSeq is perfect for very high throughput and is cost efficient, but the problem is that the reads are not long enough, right?

Sean Sanders: Uh‐hum.

Dr. Jun Wang: Several other sequencing technologies could really provide very long reads, but the problem is that it's very costly, especially for metagenomic sequencing.

Dr. Jack Gilbert: Uh‐hum.

Dr. Jun Wang: So I think we just have to live with the fact that nothing is really perfect, that we just have to work on it.

Sean Sanders: Uh‐hum.

Dr. Karen Nelson: But we will get there though.

Dr. Jun Wang: Yeah.

Dr. Karen Nelson: I mean seven years ago we were, you know, dealing with Sanger and making it work.

Dr. Jun Wang: Yeah.

Sean Sanders: Uh‐hum.

Dr. Karen Nelson: So I can only imagine that it will get better.

Dr. Jack Gilbert: Yeah. So one of the key strategies is to combine these efforts.

Dr. Karen Nelson: Yeah.

Dr. Jack Gilbert: Combine the technologies. You know you can ‐‐ 125, 150 base pair fragment on an Illumina and you get what, 10 billion reads, 100 billion reads, something like that, or 10 billion reads? And then, you know, take it to a 454 machine and you get an 800 base pair fragment.

Sean Sanders: Uh‐hum.

Dr. Jack Gilbert: But you get 1.5 million reads. It's orders of magnitude different, but the approaches can then be combined to start understanding how to generate better assemblies and improve the speed of assembly using different approaches. The key thing is that the assemblers as they exist today, and the ability to use the technology to approach that assembly method, have been designed entirely to deal with genomic clonal information. And fundamentally, metagenomics is ‐‐ you may have 100,000 or a million bacterial species in a metagenomic sample; you know, even in the human gut, it can be in the order of, you know, six orders of magnitude of microbes.

Dr. Karen Nelson: Yeah.

Dr. Jack Gilbert: So it's not a clonal environment. There isn't just one genome.

Sean Sanders: Uh‐hum.

Dr. Jack Gilbert: There are thousands of genomes and there are thousands of variants of one species, you know, strain variation.

Dr. Jun Wang: So it's actually very, very important to mention that currently I see several groups, including BGI, already developing single cell sequencing technology.

Dr. Karen Nelson: Uh‐hum.

Dr. Jack Gilbert: Yeah.

Dr. Jun Wang: So if you could isolate a single cell from the microbial community and sequence that single cell, that would be really very, very important to understand the complexity of the ‐‐

Dr. Karen Nelson: The system.

Dr. Jun Wang: Yeah, community.

Dr. Jack Gilbert: There's the importance then to overlay that with single cell transcriptomics.

Dr. Jun Wang: Exactly. Exactly.

Dr. Jack Gilbert: [Laughs] Because every single cell could have a different transcription response.

Dr. Jun Wang: And proteomics, and proteomics.

Dr. Karen Nelson: Right.

Dr. Jun Wang: Exactly.

Sean Sanders: Interesting. Now what about phages and viruses? Is anyone studying those or are we just looking at bacteria right now?

Dr. Karen Nelson: No. So at the Human Microbiome we're doing hundreds of reference viral genomes from the human body and NIAID has a huge program looking at viruses that are sequenced from the human body. And then within the metagenomic data, you can mine and pull out phage and viral sequences very efficiently. So that's underway with a number of groups and you guys are probably doing that.

Dr. Jun Wang: Yeah. We're doing the same thing. Not just mining the data from the gut metagenomics data we have, but also trying to kind of enrich those ‐‐

Dr. Karen Nelson: For those particles, yeah.

Dr. Jun Wang: Yeah, the particles of the virome; we call them human virome projects.

Dr. Karen Nelson: Yeah.

Dr. Jack Gilbert: And the Earth Microbiome Project is doing the same thing, but for seawater and soil and air, trying to enrich.

Dr. Jun Wang: Yeah.

Dr. Karen Nelson: Right.

Dr. Jack Gilbert: Because it's, you know, the virus particles, even though they are very abundant, an order of magnitude more abundant than the bacteria, their genomes are much smaller. And so they are actually a very tiny proportion if you just sequence the sample directly. So those enrichment strategies are vital for us especially.
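The arithmetic behind that point can be sketched in a few lines of Python. The ten-to-one particle ratio follows the "order of magnitude" mentioned above; the genome sizes are hypothetical round numbers chosen only for illustration, and the function name is not from any tool mentioned in the webinar.

```python
def dna_fraction(particles, genome_size_bp):
    """Share of total DNA contributed by each group, weighting
    particle counts by genome size in base pairs."""
    dna = {group: particles[group] * genome_size_bp[group] for group in particles}
    total = sum(dna.values())
    return {group: amount / total for group, amount in dna.items()}


# Ten virus particles per bacterial cell (the order of magnitude from the discussion).
particles = {"virus": 10, "bacteria": 1}
# Hypothetical genome sizes: 50 kb for a virus, 5 Mb for a bacterium.
genome_size_bp = {"virus": 50_000, "bacteria": 5_000_000}

fraction = dna_fraction(particles, genome_size_bp)
print(fraction["virus"])  # 1/11, roughly 9% of the DNA despite 10x more particles
```

Under these assumed sizes the viruses outnumber bacteria ten to one yet contribute only about a tenth of the DNA, and a smaller assumed viral genome shrinks that share further, which is why enrichment is attractive before sequencing.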

Dr. Jun Wang: There would be more unknown viruses there.

Dr. Jack Gilbert: Oh, yes.

Sean Sanders: Uh‐hum.

Dr. Karen Nelson: But they have definitely lagged behind in terms of getting attention.

Sean Sanders: Okay.

Dr. Karen Nelson: I think all the environmentalists would agree.

Sean Sanders: [Laughs]

Dr. Jack Gilbert: Also the eukaryotes ‐‐

Dr. Karen Nelson: Oh, even further.

Dr. Jack Gilbert: Single‐celled eukaryotic organisms are almost entirely ignored.

Dr. Karen Nelson: Yeah.

Sean Sanders: Really?

Dr. Jack Gilbert: Purely because we haven't had the technology to get to that. So, you know, that's another key point. You know, we've got the viruses at the bottom, the bacteria and archaea in the middle, and the single‐celled eukaryotes at the top.

Dr. Karen Nelson: Yeah.

Dr. Jack Gilbert: And they're incredibly diverse.

Sean Sanders: Uh‐hum.

Dr. Karen Nelson: Yeah.

Dr. Jack Gilbert: You know, millions of species, we just ‐‐ I don't know them yet.

Sean Sanders: Right.

Dr. Karen Nelson: Yeah.

Sean Sanders: Well, a great open question for somebody to take on.

Dr. Jack Gilbert: [Laughs] Yeah.

Dr. Karen Nelson: Absolutely.

[0:50:01] Dr. Jack Gilbert: A young researcher out there with a lot of time on their hands.

Sean Sanders: [Laughs]

Dr. Karen Nelson: And energy.

Sean Sanders: Yeah.

Dr. Jack Gilbert: Yeah. [Laughs]

Sean Sanders: So talking about other organisms, what about plants and plant pathogens? Dr. Nelson, I know ‐‐

Dr. Karen Nelson: Yes.

Sean Sanders: ‐‐ you've mentioned that.

Dr. Karen Nelson: Yeah. I am aware of a few studies underway right now looking at the interaction between the microbes, for example, in soil and plant roots. I know that they're looking at microbes on the surfaces of leaves. I heard of a really interesting study recently where they are looking at microbes that get into spinach and are associated with, for example, the E. coli outbreak. So there's a lot of attention being devoted to that now. But again, that's another area that hasn't gotten as much attention as human or other environments, I think.

Sean Sanders: Uh‐hum.

Dr. Karen Nelson: But definitely they're starting to make inroads there. You know, that's going to be significant because the microbes are probably impacting productivity of a lot of plant species way beyond what we appreciate right now, so.

Dr. Jun Wang: Well, yeah. We recently finished up or published a paper in Genome Research for wheat pathogens for example.

Dr. Karen Nelson: Uh‐hum.

Dr. Jun Wang: So actually another system that should really draw attention is nitrogen fixation ‐‐

Dr. Karen Nelson: Nitrogen fix ‐‐ yeah.

Dr. Jun Wang: ‐‐ systems ‐‐

Dr. Karen Nelson: Yes.

Dr. Jun Wang: ‐‐ which will be very, very interesting to look into.

Dr. Karen Nelson: But my journal for example gets a lot of papers on rice ‐‐

Dr. Jun Wang: Yeah.

Dr. Karen Nelson: ‐‐ and grapes, and, you know, these agricultural crops.

Sean Sanders: Right.

Dr. Karen Nelson: And how can we improve productivity by looking at the microbes associated with these plants.

Dr. Jack Gilbert: Especially the grapes, that's very important. [Laughter]

Sean Sanders: Yes, very. I understand.

Dr. Karen Nelson: You can have rice wine.

Dr. Jack Gilbert: Well that’s true, yeah, yeah, yeah.

Dr. Jun Wang: Wheat and barley, you know.

Dr. Jack Gilbert: I know.

Dr. Jun Wang: Yeah.

Dr. Jack Gilbert: More coming on.

Sean Sanders: So talking about extracting DNA, there was a question that came in that asked whether it's possible that DNA is coming from dead or inactive cells or it's just sort of sticking to solid particles in the environment and whether there's any way that you get rid of that or whether this is something you want to look at as well.

Dr. Jack Gilbert: In marine systems, it's actually less of an issue than we thought it was originally.

Sean Sanders: Uh‐hum.

Dr. Jack Gilbert: The concept ‐‐ you know, especially in oligotrophic environments, that's environments without many nutrients available, DNA is quickly snapped up and consumed.

Sean Sanders: Hmm.

Dr. Jack Gilbert: You know, it's an incredibly phosphorus and nitrogen rich source of protein and food, well not protein. So you can take that loose DNA that's being released by a dying cell and actually use it. In soil systems, especially clay soil systems, soils with a lot of clay in the system, the dead DNA can actually stick to that clay and be held in perpetuity almost.

Sean Sanders: Hmm.

Dr. Jack Gilbert: You know, it becomes very stable in that interaction. So we've been trying to investigate how to understand that. One of the key principles is to link metagenomics with metatranscriptomics to look at the RNA expressed by the system. RNA is degraded very quickly. There are a lot of very active proteins called RNases in any environment; any Ph.D. student working in the lab will know how quickly RNA can disappear ‐‐

Sean Sanders: Right.

Dr. Jack Gilbert: ‐‐ from your tube. So if you map the types of genes that are expressed on to the types of genes that are found in the system using metagenomics, you can start to understand whether some of the DNA that you see in the metagenome might not be active.
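A rough way to picture that comparison is to split the genes seen in the metagenome by whether any transcripts map to them. The sketch below is only an illustration under simplifying assumptions: the function name, gene names, read counts, and the `min_reads` threshold are all hypothetical, and real analyses work from read mappings rather than a ready-made table.

```python
def split_by_expression(metagenome_genes, transcript_counts, min_reads=1):
    """Partition genes found in the metagenome (DNA) into those with
    transcript support (RNA) and those without."""
    genes = set(metagenome_genes)
    expressed = {g for g in genes if transcript_counts.get(g, 0) >= min_reads}
    return expressed, genes - expressed


# Hypothetical gene list from a metagenome and mapped transcript counts.
metagenome_genes = ["nifH", "amoA", "rbcL", "dsrA"]
transcript_counts = {"nifH": 12, "rbcL": 3}

expressed, unexpressed = split_by_expression(metagenome_genes, transcript_counts)
print(sorted(expressed))
print(sorted(unexpressed))
```

As the speaker goes on to say, a gene landing in the unexpressed set does not prove its DNA came from dead cells; it only flags DNA with no evidence of activity at the time of sampling.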

Sean Sanders: Uh‐hum.

Dr. Jack Gilbert: Now it doesn’t mean it's dead DNA, but it gives you a better picture of the ecological system biochemistry of the environment.

Dr. Karen Nelson: Related to that, when you look at human samples, you know, ‐‐

Sean Sanders: Uh‐hum.

Dr. Karen Nelson: ‐‐ there's a lot of free floating DNA in any sample you look at and it gets collected when you do your pooling of DNA, and it's interesting when you do human microbiome studies that the host DNA becomes a contaminant that you need to get rid of, right?

Sean Sanders: Uh‐hum. Right.

Dr. Karen Nelson: So most of us think of the microbe as being the contaminant, but all of a sudden you have to address the host DNA that's associated with your sample, and it can be as high as 80% or 90%, like in the oral cavity. So it is ‐ you know, that's a very good question, but right now I think I agree with Jack that it's looking at what's expressed and going at it from that angle rather than trying to eliminate it.

Dr. Jun Wang: Yeah, absolutely.

Dr. Karen Nelson: Yeah.

Sean Sanders: Now what about low abundance microbes? How do you distinguish those? How do you pull them out first of all and how do you distinguish them from essentially background noise? What we were just talking about.

Dr. Jack Gilbert: Well background noise in terms of sequencing ‐‐

Sean Sanders: Uh‐hum.

Dr. Jack Gilbert: ‐‐ errors or PCR errors or the ‐‐

Sean Sanders: As well, yes.

Dr. Jack Gilbert: ‐‐ or the many, many biases that compound. Generally, I think statistically it's unlikely that a sequencing error or an amplification error will be found in high numbers in a sample. So the problem is deep sequencing is giving us the opportunity to look at this incredibly rare biosphere. So in the English Channel you've got a million reads that comprise 72 time points, you find 20,000 species, then you do one time point with 10 million reads and suddenly you find 100,000 species and the total diversity is in that 5%. Now, I'm not saying that that other 95% is real. Some of it may be errors, some of it may be problems, but we can do certain tricks, certain magic tricks, if you will, in bioinformatics to try and remove the possibility that they are errors. Removing singletons for example, looking at areas of chimeric disturbance, that's where you get two pieces of DNA stuck together.
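Singleton removal, one of the "tricks" just mentioned, amounts to dropping any sequence cluster supported by a single read. A minimal Python sketch follows, assuming reads have already been grouped into OTUs and counted; the OTU labels, counts, function name, and the `min_count` parameter are hypothetical, and chimera detection is a separate step not shown here.

```python
def remove_singletons(otu_counts, min_count=2):
    """Keep only OTUs observed at least `min_count` times; the default
    of 2 drops singletons (OTUs seen exactly once)."""
    return {otu: n for otu, n in otu_counts.items() if n >= min_count}


# Hypothetical read counts per OTU after clustering.
otu_counts = {"otu_1": 5400, "otu_2": 37, "otu_3": 1, "otu_4": 2, "otu_5": 1}

filtered = remove_singletons(otu_counts)
print(filtered)
```

The trade-off is the one discussed in the answer: the filter discards likely sequencing or PCR errors, but it also discards any genuinely rare organism that happened to be sampled only once.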

[0:55:12] Sean Sanders: Uh‐hum.

Dr. Jack Gilbert: We were talking the other day about other situations with regards to that.

Dr. Karen Nelson: Uh‐hum. Well, I think for the human side you just need to keep ‐‐ do the experiment over and over and over again if you're going to be sure that this is real, right?

Dr. Jun Wang: Yeah, sequence ‐‐

Dr. Karen Nelson: You need to go back to this sample ‐‐ yeah.

Dr. Jun Wang: Also sequencing more samples.

Dr. Karen Nelson: Yeah, absolutely.

Dr. Jun Wang: Absolutely. So you would see, I mean, one bacterium that is really less abundant in one individual but very abundant ‐‐

Dr. Karen Nelson: Yeah.

Dr. Jun Wang: ‐‐ in another one and so on.

Dr. Jack Gilbert: And you know, it's real.

Dr. Karen Nelson: Yeah.

Sean Sanders: Uh‐hum. So coming back to the error rate that you mentioned in next‐gen sequencing technologies, what ‐‐ this viewer asks what is the confidence level with which one can annotate metagenomes, say species level, taxonomic assignment, given this error rate? If that makes sense.

Dr. Jun Wang: Yes.

Dr. Karen Nelson: Yeah, it does.

Sean Sanders: Okay. [Laughter]

Dr. Jack Gilbert: For different platforms it's a different game.

Sean Sanders: Okay.

Dr. Jack Gilbert: I would say the longer read helps you to have a higher confidence that that's the correct designation. But with the shorter reads you have different issues. Certain longer read technologies that are coming out at the moment have very high error rates but very long fragments.

Sean Sanders: Uh‐hum.

Dr. Jack Gilbert: Do you want to talk about PacBio?

Dr. Jun Wang: Well I mean not particularly.

Sean Sanders: [Laughs]

Dr. Jun Wang: But I think the sequencing error is not that bad.

Sean Sanders: Uh‐hum.

Dr. Jun Wang: Especially when we're doing that large scale metagenomic sequencing.

Dr. Karen Nelson: Uh‐hum.

Dr. Jun Wang: For those, you know, phylogenetic analyses it’s fine. But sometimes if it really goes into the variation level, like you want to understand one SNP of a particular strain in the gut for example, bacteria or whatever, it will be very, very important to do lots of validation afterwards.

Dr. Karen Nelson: Yeah. I agree.

Dr. Jun Wang: But, you know, at the species level I think it should be absolutely fine.

Sean Sanders: Hmm.

Dr. Karen Nelson: Just define the criteria you used for your study.

Dr. Jack Gilbert: Yeah.

Dr. Karen Nelson: And everybody will be happy.

Sean Sanders: Okay.

Dr. Karen Nelson: [Laughs]

Dr. Jack Gilbert: Record it appropriately so ‐‐

Dr. Karen Nelson: Yes.

Sean Sanders: Yeah.

Dr. Jack Gilbert: ‐‐ the Genomic Standards Consortium is an ongoing effort to make sure you record data appropriately.

Dr. Karen Nelson: Yeah.

Dr. Jack Gilbert: And that's the key thing for all of this, to make sure you have that scientific lab book attached to that DNA that contains all the information about where it came from.

Dr. Karen Nelson: The metadata.

Dr. Jack Gilbert: The metadata.

Dr. Karen Nelson: Yeah.

Sean Sanders: Right.

Dr. Jack Gilbert: All the information about where it came from and how it arrived there. That's very important.

Sean Sanders: Now what about the application of metagenomics to the study of antimicrobial resistance, so in other words taking this data and maybe translating it into actual clinical application? Anything there? Anyone like to take that, Dr. Wang?

Dr. Jun Wang: Well there's lots of hope there of course.

Sean Sanders: Okay.

Dr. Jun Wang: But there's still a long way to go.

Dr. Karen Nelson: Uh‐hum.

Dr. Jun Wang: So first of all we have to find out the associated bacteria or whatever, virus, but then you have to understand if it's the causal factor or it's just ‐‐

Dr. Karen Nelson: Uh‐hum.

Dr. Jun Wang: ‐‐ you know, passengers in a way.

Dr. Jack Gilbert: Uh‐hum.

Dr. Jun Wang: But then after, you know, you have to do lots of animal models, functional studies on that, but then you have to find out if probably phage or anything else could kill that ‐‐

Sean Sanders: Uh‐hum.

Dr. Jun Wang: ‐‐ or things like that. So there's a long way to go. But it's very promising.

Sean Sanders: Uh‐hum. Dr. Nelson, anything else?

Dr. Karen Nelson: Yeah. Well, I was just thinking about, you know, mining the human microbiome and realizing that there are microbes that produce secondary compounds and not necessarily understanding how microbes speak to each other in their own environment.

Sean Sanders: Uh‐hum.

Dr. Karen Nelson: And I think again we're just in the beginning phase, but we're going to learn about significantly new antimicrobials and new mechanisms for how microbes deal with each other in community settings, so.

Sean Sanders: So we're just about out of time, but I want to throw one more question out there to each one of you to get your input. Where do you see the field of metagenomics going in the next five to ten years, so sort of short to medium term, and what would be your hopes for what you would like to see happen as far as technology and research advances? We'll start with Dr. Wang.

Dr. Jun Wang: Okay. So in five or ten years time scale, I really would hope there is some of those clinical applications, I mean agricultural applications really into the field so people could sort of take the advantage and benefit out of it.

Sean Sanders: Uh‐hum.

Dr. Jun Wang: And the technology itself could become very routine so analysis tools would be developed for that.

Dr. Karen Nelson: Yeah, I agree if I can go next.

Sean Sanders: Sure.

Dr. Karen Nelson: But I think if microbes are being used as non‐invasive biosignatures or markers for a range of diseases and also to predict productivity on the agricultural setting I think, I'm hoping that's where we will be in the next five to ten years.

Sean Sanders: Okay.

Dr. Jack Gilbert: Applied… full characterization of an ecosystem, systems biology putting every tool we have available to the problem that's where it should be, total cohesive research coordination.

Dr. Karen Nelson: And we're all talking to each other.

Dr. Jack Gilbert: Yes.

Sean Sanders: Right. [Laughs]

Dr. Karen Nelson: [Laughs]

Sean Sanders: And of course the wine.

Dr. Jack Gilbert: And the wine that would be – that’s key.

[Laughter]

Slide 53 Sean Sanders: Well unfortunately, we'll have to end there as we are out of time for this webinar. On behalf of myself and our viewing audience, I want to thank our speakers for coming from near and far to be with us in the studio and provide such engaging talks and interesting discussion, Dr. Jack Gilbert from the University of Chicago, Dr. Karen Nelson from the J. Craig Venter Institute, and Dr. Jun Wang from BGI.

[1:00:08] Many thanks to our online audience for the questions you submitted. I'm sorry that we didn't have a chance to get to all of them. Please go to the URL now at the bottom of your slide viewer to learn more about resources related to today’s discussion, and look out for more webinars from Science available at www.sciencemag.org/webinar. This particular webinar will be made available to view again as an on‐demand presentation within approximately 48 hours from now.

We'd love to hear what you thought of the webinar. Send us an email at the address now up in your slide viewer: [email protected].

Again, thank you to our panel and to BGI for their kind sponsorship of today’s seminar. Goodbye.

[1:00:43] End of Audio
