
Upload: kathleen-atkinson

Post on 17-Dec-2015


First Gene Identified - Evidence of gene transfer from bacterial to protozoan pathogen

This represents the strongest evidence known to date of horizontal gene transfer between a bacterium and a eukaryote.

Neither nanA from the Pasteurellaceae Haemophilus influenzae, Pasteurella multocida, and Actinobacillus actinomycetemcomitans, nor nanA in T. vaginalis, has been investigated for its role in pathogenicity, though some are well studied and are involved in a pathway implicated in virulence (3, 4, 5).

An Interdisciplinary Team

Pathogenomics: An interdisciplinary approach for the study of infectious disease

Fiona S. L. Brinkman 1,2, Steven J. Jones 3, Ivan Wan 3, Yossef Av-Gay 4, David L. Baillie 5, Robert C. Brunham 6, Stefanie Butland 7, Rachel C. Fernandez 2, B. Brett Finlay 2,8, Hans Greberg 1, Robert E.W. Hancock 2, Christy Haywood-Farmer 9, Patrick Keeling 10, Audrey de Koning 9, Don G. Moerman 9,11, Sarah P. Otto 9, B. Francis Ouellette 7, Iain E. P. Taylor 10, and Ann M. Rose 1.

1 Dept of Medical Genetics, 2 Dept of Microbiology and Immunology, 4 Dept of Medicine, 8 Biotechnology Laboratory, 9 Dept of Zoology, 11 C. elegans Reverse Genetics Facility, 10 Dept of Botany, University of British Columbia, 5 Dept of Biological Sciences, Simon Fraser University, 7 Centre for Molecular Medicine and Therapeutics, 6 UBC Centre for Disease Control and 3 BC Genome Sequence Centre, Centre for Integrated Genomics, Vancouver, British Columbia, Canada.

Project Summary

A combination of informatics, evolutionary biology, microbiology and eukaryotic genetics is being exploited to identify pathogen genes that are more similar to host genes than expected, and are likely to interact with, or mimic, their host’s gene functions. We are building a database of the sequences of these proteins, based on the increasing number of pathogen genomes that have been, or are currently being, sequenced. Candidate functions identified by our informatics approach will be tested in the laboratory (see flow chart) to investigate their role in pathogen infection and host interaction. All information will eventually be made available in a public Pathogenomics Database.

PhyloBLAST – a program to aid analysis

PhyloBLAST compares your protein sequence to a SWISSPROT/TREMBL database using BLAST2 and then allows you to perform user-defined phylogenetic analyses based on user-selected proteins listed in the BLAST output. PhyloBLAST was initially developed as a tool specifically for this project, but is now available on the internet as a beta version at: www.pathogenomics.bc.ca/phyloBLAST

Some Features

- Organism information and phylogenetic distance measures are added to the BLAST output and subsequent phylogenetic trees

- You may select sequences (in the list of BLAST hits) for further analysis by simply clicking the boxes next to each sequence of interest. Analyses range from obtaining a FASTA file of the sequences, to a ClustalW alignment, to user-defined phylogenetic trees (currently based on PHYLIP programs).

- All programs for tree construction are linked together for ease of use, but retain full options for more expert use. Results may be obtained by email or on the webpage.

Rationale and Power of the Approach

Genomics and bioinformatics provide powerful new tools for the study of pathogenicity, hence the initiation of a new field, Pathogenomics. Our approach is anchored in the fact that, as part of the infection process, many pathogens make use of host cellular processes. We hypothesize that some pathogen genes involved in such processes will be more similar to host genes than would be expected (based on phylogeny or motifs). We are attempting to identify such genes by applying specific bioinformatic and evolutionary analysis tools to sequenced genome datasets, and further examining such genes in the laboratory (both the pathogen gene and a homologous model host gene). We hypothesize that this approach will reveal new mechanisms of pathogen-host interaction.

Power of the Approach

•Enables better understanding of both the pathogen gene and homologous host/model host gene.

•Provides insight into horizontal gene transfer events and the evolution of pathogenicity and pathogen-host interactions.

•Interdisciplinary team fosters unique ideas and collaborations.

•Automated approach can be continually updated.

•Expression-independent method for identifying possible pathogenicity factors.

•Public database of findings, to be developed, will enable other researchers to capitalize on the findings and promote further collaboration.

Initial screen for candidate genes. Search pathogen proteins against sequence and motif databases. Are the results inconsistent with phylogeny (i.e. does the protein match the host, or its relatives, more strongly than expected)? Are there eukaryotic protein motifs in the pathogen protein? Filter out closely related bacteria from the search to identify eukaryotic hits to the pathogen proteins that may not have been previously detected.
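The filtering step in this screen can be sketched roughly as follows. This is a minimal illustration only, not the project's actual pipeline: the `Hit` record, the lineage tuples, the function names, and all scores are hypothetical, and a real screen would parse BLAST output and a taxonomy database rather than use hand-written data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One database hit for a pathogen protein (hypothetical record layout)."""
    subject: str
    lineage: tuple      # taxonomic lineage, broadest rank first
    bit_score: float


def best_hit_after_filter(hits, exclude_taxon):
    """Drop hits from the pathogen's close relatives, return the best remaining hit."""
    remaining = [h for h in hits if exclude_taxon not in h.lineage]
    if not remaining:
        return None
    return max(remaining, key=lambda h: h.bit_score)


def is_candidate(hits, exclude_taxon):
    """Flag the protein if its best hit outside the excluded taxon is eukaryotic."""
    best = best_hit_after_filter(hits, exclude_taxon)
    return best is not None and best.lineage[0] == "Eukaryota"


# Invented example data: scores are illustrative, not measured values.
example_hits = [
    Hit("close bacterial relative", ("Bacteria", "Proteobacteria", "Enterobacteriaceae"), 650.0),
    Hit("metazoan homolog", ("Eukaryota", "Metazoa"), 480.0),
    Hit("distant bacterial homolog", ("Bacteria", "Firmicutes"), 300.0),
]

print(is_candidate(example_hits, "Enterobacteriaceae"))
```

Excluding the close relatives exposes the eukaryotic hit as the strongest remaining match, which is the situation the screen is meant to surface for manual inspection.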

Rank candidates. Rank pathogen proteins by how much more they resemble proteins from their host's phylum than those from their own (e.g. BLAST score, phylogenetic distance score, tree building, unusual motifs, unusual codon usage).
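One simple way to express such a ranking, using only the BLAST-score criterion, is a ratio of the best host-phylum score to the best own-phylum score. The sketch below assumes that ratio as the ranking key; the poster does not specify how the listed criteria are combined, and the protein identifiers and scores here are invented for illustration.

```python
def host_likeness(best_host_score, best_own_score):
    """Ratio above 1 means the protein scores higher against host-phylum proteins."""
    if best_own_score <= 0:
        # No meaningful hit within the pathogen's own phylum.
        return float("inf")
    return best_host_score / best_own_score


def rank_candidates(scores):
    """Sort protein ids by host-likeness, most host-like first.

    scores maps protein id -> (best host-phylum score, best own-phylum score).
    """
    return sorted(scores, key=lambda pid: host_likeness(*scores[pid]), reverse=True)


# Hypothetical proteins and scores.
example_scores = {
    "protA": (480.0, 300.0),   # ratio 1.6
    "protB": (120.0, 600.0),   # ratio 0.2
    "protC": (500.0, 250.0),   # ratio 2.0
}

print(rank_candidates(example_scores))
```

A ratio is used here only because it is easy to read; phylogenetic distance, motif content, or codon usage would need their own scores before being folded into a combined rank.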

Prioritize for further biological study. Has the candidate pathogen gene or a eukaryotic homolog been previously studied biologically? (Prioritize unstudied genes) Is there a C. elegans homolog? (See below) Is the pathogen currently studied by the UBC pathogenomics bacterial group?

If C. elegans homolog exists: target gene for knockout by knockout facility.

Target for GFP fusion analysis to see when and where the gene is expressed in C. elegans

Analysis of knockout through expression chip, and susceptibility to infection by pathogen.

Database development. Create and maintain a database of pathogen-host interactions. Establish this as a platform for accelerating the study of pathogenicity and the identification of therapeutic drug targets.

Iteratively refine the initial screening methods and candidate ranking.

If the pathogen is being studied by the UBC functional pathogenomics bacterial group: Examine subcellular localization and obtain a knockout of the gene.

If pathogen is not a focus of UBC group: Contact other groups regarding results – instigate collaboration for further study.

Analysis of knockout and gene through expression chip analysis and infectivity in an animal/tissue culture model, and in a C. elegans model if appropriate.

Continually exchange C. elegans gene information with microbiologists studying the homologous pathogen gene.

Continually exchange pathogen gene information with collaborators and with eukaryotic geneticists studying the homologous gene in C. elegans.

Acknowledgements

This project is funded by the Peter Wall Institute for Advanced Studies.

Evolutionary significance. Manually inspect candidates. Are these valid cases of horizontal transfer or co-evolution, or are they similar by chance? If horizontal transfer may be involved, when did this transfer occur?

References

1. Doolittle WF. 1998. Trends Genet. 14:307-311.

2. Read TD, Brunham RC, Shen C, Gill SR, et al. 2000. Nucleic Acids Res. 28:1397-1406.

3. Meysnick KC, Dimock K, Gerber GE. 1996. Mol. Biochem. Parasitol. 76:289-292.

4. Lilley GG, Barbosa JA, Pearce LA. 1998. Protein Expr. Purif. 12:295-304.

5. Muller HE, Mannheim W. 1995. Int. J. Med. Microbiol. Virol. Parasitol. Infect. Dis. 283:105-114.

Bioinformatics

• BC Genome Sequence Centre

• Centre for Molecular Medicine and Therapeutics

Pathogen Functions

• Dept. Microbiology

• Biotechnology Laboratory

• Dept. Medicine

• BC Centre for Disease Control

Host Functions

• Dept. Medical Genetics

• C. elegans Reverse Genetics Facility

• Dept. Biological Sciences SFU

Evolutionary Theory

• Dept of Zoology

• Dept of Botany

• Canadian Institute for Advanced Research

Coordinator

The first gene identified was not a eukaryotic-like bacterial gene, but rather a bacterial-like eukaryotic gene. However, its possible role in pathogenicity makes it of interest.

N-acetylneuraminate lyase (NanA) is involved in sialic acid metabolism and is used by some bacteria to parasitize the mucous membranes of animals for nutritional purposes.

NanA of the pathogenic Pasteurellaceae bacteria is 92-95% similar to NanA of the eukaryotic pathogenic protozoan Trichomonas vaginalis.

Trends in the Bioinformatic/Evolutionary Analysis

•While our primary focus is to identify new genes or pathways involved in virulence, our approach has also identified the strongest cases of lateral gene transfer between bacteria and eukaryotes reported to date. We have also found that most cases of probable recent cross-domain gene transfer involve movement of a bacterial gene to a unicellular eukaryote. It has previously been proposed that such eukaryotes may obtain bacterial genes through ingestion of bacteria (the “you are what you eat” hypothesis; 1).

•G+C analysis of genome ORFs, used to identify pathogenicity islands, revealed the following trend: Low variance of the mean G+C of ORFs for a given genome correlates with an intracellular lifestyle for the bacterium and a clonal nature (Two-tailed P value of 0.004, for a nonparametric correlation). This variance is similar within a given species. G+C variance may therefore be a useful marker for investigating the clonality of bacteria. Its relationship with intracellular lifestyle may reflect the ecological isolation of intracellular bacteria, as was previously proposed to explain the lack of chromosome rearrangement for Chlamydia species (2).

•A control: Our method identifies all previously reported Chlamydia trachomatis eukaryotic-like genes.
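The G+C variance measure in the second bullet above can be illustrated with a short sketch. It assumes the measure is the population variance of the per-ORF G+C fraction within one genome, which is one reading of the description; the function names and the toy sequences are hypothetical, and real ORFs would be read from an annotated genome file.

```python
from statistics import pvariance


def gc_fraction(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    acgt = sum(1 for base in seq if base in "ACGT")
    return gc / acgt if acgt else 0.0


def gc_variance(orfs):
    """Population variance of the G+C fraction across the ORFs of one genome."""
    return pvariance([gc_fraction(orf) for orf in orfs])


# Toy sequences, far shorter than real ORFs, used only to show the calculation.
uniform_genome = ["ATGC", "GCAT", "TACG"]   # every ORF has G+C = 0.5
mixed_genome = ["ATAT", "GCGC", "ATGC"]     # G+C of 0.0, 1.0 and 0.5

print(gc_variance(uniform_genome), gc_variance(mixed_genome))
```

Under this reading, a genome whose ORFs all share a similar G+C content gives a variance near zero, while a genome containing regions of atypical composition gives a larger value.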

[Figure: Neighbor-joining distance matrix tree of known and probable N-acetylneuraminate lyases, rooted by Bacillus subtilis dihydrodipicolinate synthase. Scale label: 0.1. Taxa shown: Bacillus subtilis, Escherichia coli, Salmonella typhimurium, Staphylococcus aureus, Clostridium perfringens, Clostridium difficile, Trichomonas vaginalis, Haemophilus influenzae, Actinobacillus actinomycetemcomitans, Pasteurella multocida.]

Example: Relationship between GMP reductase of E. coli and Metazoans

An example of a eukaryotic-like bacterial gene in our database:

[Figure: phylogenetic tree. Scale label: 0.1. Taxa shown: Rat, Human, Escherichia coli, Caenorhabditis elegans, Pig roundworm, Methanococcus jannaschii, Methanobacterium thermoautotrophicum, Bacillus subtilis, Streptococcus pyogenes, Aquifex aeolicus, Acinetobacter calcoaceticus, Haemophilus influenzae, Chlorobium vibrioforme.]

Guanosine monophosphate reductase of E. coli is 81% similar to the corresponding enzyme studied in humans and rats, and shares a significant phylogenetic relationship with metazoans (left). A similar protein has been identified in other gamma subdivision proteobacteria, including other Enterobacteriaceae and Vibrio cholerae (from unfinished genome projects; not shown), suggesting that a cross-domain gene transfer may have occurred before the divergence of these gamma proteobacteria. Its role in virulence has not been investigated.

Neighbor-joining tree of GMP reductases and related proteins. Blue=Bacteria, Red=Archaea, and Green=Eukarya

www.pathogenomics.bc.ca