

JOURNAL OF BACTERIOLOGY, Oct. 2009, p. 6262–6272, Vol. 191, No. 20
0021-9193/09/$08.00+0 doi:10.1128/JB.00475-09
Copyright © 2009, American Society for Microbiology. All Rights Reserved.

A Comparative Genomics, Network-Based Approach to Understanding Virulence in Vibrio cholerae†

Jianying Gu,¹‡* Yufeng Wang,²,³‡ and Timothy Lilburn⁴*

Department of Biology, City University of New York, Staten Island, New York 10314¹; Department of Biology, University of Texas at San Antonio, San Antonio, Texas 78249²; South Texas Center for Emerging Infectious Diseases, University of Texas at San Antonio, San Antonio, Texas 78249³; and Department of Bacteriology, American Type Culture Collection, Manassas, Virginia 20110⁴

Received 7 April 2009/Accepted 31 July 2009

Our views of the genes that drive phenotypes have generally been built up one locus or operon at a time. However, a given phenotype, such as virulence, is a multilocus phenomenon. To gain a more comprehensive view of the genes and interactions underlying a phenotype, we propose an approach that incorporates information from comparative genomics and network biology and illustrate it by examining the virulence phenotype of Vibrio cholerae O1 El Tor N16961. We assessed the associations among the virulence-associated proteins from Vibrio cholerae and all the other proteins from this bacterium using a functional-association network map. In the context of this map, we were able to identify 262 proteins that are functionally linked to the virulence-associated genes more closely than is typical of the proteins in this strain and 240 proteins that are functionally linked to the virulence-associated proteins with a confidence score greater than 0.9. The roles of these genes were investigated using functional information from online data sources, comparative genomics, and the relationships shown by the protein association map. We also incorporated core proteome data from the family Vibrionaceae; 35% of the virulence-associated proteins have orthologs among the 1,822 orthologous groups of proteins in the core proteome, indicating that they may be dual-role virulence genes or encode functions that have value outside the human host. This approach is a valuable tool in searching for novel functional associations and in investigating the relationship between genotype and phenotype.

The advent of high-throughput approaches to biology has forced us to rethink the way we parse the components that make up an organism, leading us away from the perceived primacy of the gene and its encoded product to a new view that encompasses how the gene product interacts with other gene products in a given set of circumstances (19). One example of how this new viewpoint is changing our understanding is found in the field of pathogenesis, specifically, in how we understand virulence (47). Virulence has been defined as the ability of a pathogen to damage a host. Virulence is mediated by virulence factors, the means by which a pathogen establishes and maintains an infection and by which it ensures its transmission to another host. Virulence factors have been classified as adhesins, invasins, impedins, aggressins, and modulins, but these factors are rarely the products of a single locus in the pathogen; groups of interacting loci are responsible for the activities of the virulence factors. Virulence can be thought of as an emergent property of the multiple interactions that manifest as a phenotype, and this implies that any attempt to define a virulence factor must take into account these interactions, or network properties, of virulence. Hence, definitions of individual loci as virulence factors have moved away from one-gene-one-factor definitions, and loci implicated in virulence are now classified in a way that attempts to reflect their roles. Wassenaar and Gaastra (71), for example, conceptualize three tiers of virulence factors. The top-level virulence factors are termed true virulence factors and include aggressins, like the bacterial toxins. The second-level virulence factors are termed virulence-associated factors and include supporting factors, such as invasins, that are required for the activities of the true virulence factors. Finally, there are factors that, while required for the establishment and maintenance of the pathogen in the host, are not exclusively expressed in that environment. They are termed virulence lifestyle genes and include, for example, adhesins, like fimbriae. As the repertoire of loci implicated in virulence grows and becomes more nuanced, the identification of all the loci involved in the manifestation of the phenotype becomes an important task. Because virulence is a multilocus phenomenon, not all the loci that work together to manifest a "virulence factor" may be recognized, despite their essential roles in virulence. Methods for detecting and identifying the loci in virulence systems are required, especially methods that can account for the uncertainties involved in identifying members of these systems. Here, we present an approach to identifying potential members of virulence systems by using comparative genomics and functional-association networks, as applied to the environmental pathogen Vibrio cholerae.

V. cholerae is widely known as the causative agent of cholera, but of the roughly 200 serotypes that are ubiquitously distributed in the world's oceans, only two, O1 and O139, have been consistently linked to epidemic cholera (15). Surveys of V. cholerae strains found in the environment show that the toxigenic strains comprise a small fraction (0.8%) of the V. cholerae strains that can be detected (16). Cholera is thought to kill over 100,000 people every year, although, due to the economic penalties imposed on countries where a cholera epidemic occurs, the disease is probably underreported. V. cholerae can be found in radically different environments, including the pelagic oceans, the human intestinal tract, biofilms adhering to plankton and shellfish, and the trophozoites of amoebas (1, 62). Movement among these niches demands a high degree of physiological flexibility.

* Corresponding author. Mailing address for Jianying Gu: Biology Department, 65-126, 2800 Victory Blvd., College of Staten Island/CUNY, Staten Island, NY. Phone: (718) 982-4123. E-mail: guj@mail.csi.cuny.edu. Mailing address for Timothy Lilburn: Bacteriology, ATCC, 10801 University Boulevard, Manassas, VA. Phone: (703) 365-2700. Fax: (703) 334-2931. E-mail: [email protected].

† Supplemental material for this article may be found at http://jb.asm.org.

‡ J.G. and Y.W. contributed equally to this work.

Published ahead of print on 7 August 2009.

Toxigenic strains of V. cholerae have two main virulence factors: the cholera toxin, encoded by the CTX loci that are found on a prophage, CTXphi, and the toxin-coregulated pilus (TCP), found on a pathogenicity island, VPI-1. Other virulence determinants are found on a second pathogenicity island, VPI-2; on the mannose-sensitive hemagglutination pilus loci; on the RTX toxin cluster; and on the RS1phi prophage. V. cholerae El Tor strains also carry two unique genomic islands termed the Vibrio seventh pandemic islands I and II (46). Many V. cholerae isolates in the aquatic environment carry some part of the virulence-related gene complement of the O1 and O139 serotypes; Rahman et al. found that 3.9% of non-O1 non-O139 isolates from surface waters in the Dhaka district of Bangladesh carried both the CTX genes and the TCP genes. A further 3.9% carried one or the other of these so-called major virulence determinants, and all of these strains carried at least one virulence-associated gene or group of genes (50). Such a pool of strains in the environment makes the mixing and matching of virulence-related loci from V. cholerae possible in the environment (17). In order to better understand virulence, it is necessary to complete our list of virulence-related genes and to address how they might interact with each other.

Comparative genomics has been used to identify virulence factors in newly sequenced genomes from pathogens, and this approach can also be used to aid in the classification of the identified virulence factors. The set of genomes available for the Vibrionaceae includes genomes from two nonpathogenic species of Vibrionaceae, Aliivibrio (Vibrio) fischeri and Photobacterium profundum, as well as genomes from strains that have hosts and modes of pathogenicity that differ from those of V. cholerae. By establishing the set of genes found in all the genomes of all the members of the family Vibrionaceae (the pangenome) and the subset of genes shared by each member of the family (known as the core genome), we can identify virulence-related genes that are shared among pathogenic and nonpathogenic strains. The proteins encoded by these shared genes (which we refer to here as the core proteome) can be tentatively classified as virulence lifestyle-related proteins that, in V. cholerae, support pathogenicity, while V. cholerae virulence-associated proteins that are encoded by the genes in the pangenome (which we refer to here as the panproteome) may be more directly involved in virulence. Not all virulence lifestyle proteins found in V. cholerae are found in the core proteome, but the establishment of the core proteome and panproteome still serves as a significant filter for identifying true virulence proteins.

As more and more high-throughput data, derived from genomic, transcriptome, and proteomic investigations, accumulate, it has become possible to infer the interactions among the proteins in a bacterium and to build, for example, functional-association networks (27). When tested against sets of known interacting proteins, these networks have proven to perform very well in identifying novel associations among proteins (31, 51), and this reliability has increased as the amount of data used to build the networks has increased. The online service Search Tool for the Retrieval of Interacting Genes/Proteins (the STRING database) uses a standard format for representing the network of associations: nodes represent proteins, and edges, the lines between the nodes, represent functional associations. The number of edges connecting a node to other nodes is known as the degree of that node. Nodes of high degree also tend to be essential proteins (6, 28, 76). This correlation makes sense intuitively, and an analysis of protein-protein interaction networks in Saccharomyces cerevisiae showed that the correlation is due to the tendency of essential proteins to form densely connected subnetworks with proteins that are functionally involved in the same biological process (77). By analyzing a functional-association network for V. cholerae, it ought to be possible to tease out more information on recognized virulence proteins and to identify other proteins that may be important in virulence. Using the virulence proteins as in silico bait proteins, we can extract the subset of proteins linked to them, a set of proteins that will be enriched for those with unrecognized roles in virulence. Examination of this subnetwork can teach us more about how pathogenesis is manifested in V. cholerae, as well as about the proteins responsible.
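The bait idea described above can be illustrated with a minimal sketch. It assumes the network is available as a list of (protein, protein, score) tuples; the function names, the toy identifiers, and the edge list are hypothetical and are not taken from the STRING database or from the scripts used in this study.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Build an undirected adjacency map from (protein_a, protein_b, score) tuples."""
    adjacency = defaultdict(dict)
    for a, b, score in edges:
        adjacency[a][b] = score
        adjacency[b][a] = score
    return adjacency

def degree(adjacency, protein):
    """Degree of a node: the number of edges connecting it to other nodes."""
    return len(adjacency.get(protein, {}))

def bait_neighbors(adjacency, baits, min_score=0.0):
    """For each non-bait protein, count the baits it is linked to at or above min_score."""
    hits = defaultdict(int)
    for bait in baits:
        for partner, score in adjacency.get(bait, {}).items():
            if partner not in baits and score >= min_score:
                hits[partner] += 1
    return dict(hits)

# Hypothetical toy network: two bait (virulence) proteins and three other proteins.
edges = [
    ("bait1", "p1", 0.95),
    ("bait1", "p2", 0.30),
    ("bait2", "p1", 0.92),
    ("bait2", "bait1", 0.99),
    ("p2", "p3", 0.80),
]
adjacency = build_adjacency(edges)
baits = {"bait1", "bait2"}
linked = bait_neighbors(adjacency, baits, min_score=0.9)
print(degree(adjacency, "bait1"), linked)
```

In this toy case only p1 is linked to both baits with high confidence, which is the kind of protein the approach would flag for closer examination.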

MATERIALS AND METHODS

Sequence data and annotation. Genome sequences and primary annotations for V. cholerae O1 biovar El Tor strain N16961 (21) and O1 biovar classical strain O395 (NC_009456 and NC_009457), Vibrio parahaemolyticus RIMD 2210633 (43), Vibrio vulnificus CMCP6 (32), V. vulnificus YJ016 (9), Vibrio harveyi ATCC BAA-1116 (NC_009777, NC_009783, and NC_009784), Vibrio splendidus LGP32 (NC_011744 and NC_011753), A. (Vibrio) fischeri ES114 (54), A. fischeri MJ11 (NC_011184, NC_011185, and NC_011186), A. salmonicida LFI1238 (22), and P. profundum SS9 (68) were obtained from GenBank and the J. Craig Venter Institute's Comprehensive Microbial Resource (48). Additional annotation, primarily information about other identifiers associated with each locus, was retrieved from the Database for Annotation, Visualization, and Integrated Discovery, a comprehensive bioinformatics resource (24). The expanded list of identifiers made it easier to gather comprehensive information. We obtained information about the enzymatic activities encoded by V. cholerae from the Braunschweig Enzyme Database (5), which specializes in comprehensive annotation of enzymes and has a more complete set of links between enzyme commission numbers, the associated functional information, and the loci of V. cholerae than other sources. Enzyme commission numbers serve as links to metabolic-pathway information in MetaCyc (8). General information about protein function and about the domains of encoded proteins was obtained from UniProt (67) and, via UniProt, from Interpro (45). Gene ontology annotation, used in functional-enrichment analyses, was also obtained from UniProt. Information on signal transduction proteins was found at the Microbial Signal Transduction database (66).

Assembly of a list of known virulence-related proteins. We initially used a list of 165 V. cholerae virulence-related proteins from the Virulence Factor Database (VFDB) (http://zdsys.chgb.org.cn/VFs/) (74). The first results using this set of genes and the network-based method included many proteins that had already been recognized as virulence related. The literature about these first hits led us to expand our list of virulence proteins, and we realized our list would have to be more comprehensive if we hoped to identify novel virulence-related proteins. We therefore made a more systematic search for papers containing information about V. cholerae and virulence using PubMed and GoPubMed (14). We also looked for more database sources and were directed to the National Microbial Pathogen Data Resource (NMPDR) (http://www.nmpdr.org/FIG/wiki/view.cgi) (44), which annotates the genome of V. cholerae and other pathogens in a uniform way that includes tagging virulence proteins. The addition of the NMPDR list to the results of our literature search and the VFDB list gave a total of 525 proteins. Our literature sources are shown in the supplemental material. The list of 525 proteins is shown in Table S1 in the supplemental material, and the proteins are linked to the supplemental references.

Distribution of virulence-related proteins. In order to supply an evolutionary context and as an aid to a preliminary classification of the virulence-related proteins, we defined their distribution among the 11 strains of Vibrionaceae. We used OrthoMCL (37) to detect and group the orthologous proteins in the 11 strains of Vibrionaceae. The program builds the list of orthologs by doing an all-against-all blastp search. The orthologs are clustered using the Markov cluster algorithm, working off a matrix of corrected P values. From the results, we were able to identify two sets of proteins: those that were encoded by all 11 strains, that is, the core proteome of the Vibrionaceae, and those that were encoded by fewer than 11 strains, that is, the panproteome. A hierarchical functional classification of the proteins that fell into OrthoMCL groups was performed by searching against the Clusters of Orthologous Groups (COG) database (61).
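The split into core proteome and panproteome described above can be sketched as follows. This is a minimal illustration that assumes the ortholog groups have already been parsed into a dictionary of (strain, protein) members; the parsing step, the function name, and the toy groups are hypothetical and do not reproduce OrthoMCL output.

```python
def split_core_and_pan(groups, n_strains):
    """Split ortholog groups into those present in every strain and the rest.

    groups maps a group ID to a list of (strain, protein_id) tuples.
    """
    core, pan = {}, {}
    for group_id, members in groups.items():
        strains_present = {strain for strain, _ in members}
        # A group is "core" only if every strain contributes at least one member.
        target = core if len(strains_present) == n_strains else pan
        target[group_id] = members
    return core, pan

# Hypothetical toy data with three strains instead of the 11 used in the study.
groups = {
    "grp1": [("strainA", "a1"), ("strainB", "b1"), ("strainC", "c1")],
    "grp2": [("strainA", "a2"), ("strainB", "b2")],
    "grp3": [("strainA", "a3"), ("strainA", "a4"), ("strainB", "b3"), ("strainC", "c3")],
}
core, pan = split_core_and_pan(groups, n_strains=3)
print(sorted(core), sorted(pan))
```

Counting distinct strains rather than members means that a group containing paralogs from one strain (grp3 here) is still treated as a single core group.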

Protein association data. The V. cholerae functional-association data were obtained from the STRING database version 7.1 (27). The associations among the proteins in the data set were visualized using Cytoscape 2.6 (57). Statistics on the connectivity in the network were calculated using the NetworkAnalyzer plug-in for Cytoscape 2.6 (4). Gene ontology term enrichment of subsets of proteins was estimated with the BiNGO plug-in (42) for Cytoscape, using the hypergeometric test and the Benjamini and Hochberg false discovery rate correction, with a selected significance level of 0.05. Functional-association candidates were assessed using information from the Database for Annotation, Visualization, and Integrated Discovery; the NMPDR; the UCSC Archaeal Genome Browser (56); expression data; and other information from the published literature.
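For readers unfamiliar with the two statistical steps named above, the following is a minimal sketch of a hypergeometric enrichment test followed by the Benjamini and Hochberg correction. It is written from the textbook definitions and is not the BiNGO implementation; the function names and the example counts are hypothetical.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n proteins are drawn from N, of which K carry the term."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical example: 3 terms tested in a subset of 20 proteins out of 1,000.
# Each tuple is (proteins in subset with term, proteins in genome with term).
term_counts = [(6, 30), (2, 50), (1, 40)]
raw = [hypergeom_sf(k, 1000, K, 20) for k, K in term_counts]
adjusted = benjamini_hochberg(raw)
significant = [p <= 0.05 for p in adjusted]
print(significant)
```

A term would be reported as enriched only if its adjusted value falls at or below the chosen significance level of 0.05.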

RESULTS AND DISCUSSION

We first identified and classified a list of virulence-related proteins from V. cholerae by database and literature searches and by establishing the core proteome and panproteome for the family Vibrionaceae. We then mapped these virulence proteins onto a functional-association network based on data from STRING v7.1. Statistical methods were used to establish connections between the proposed virulence-related proteins and other proteins in V. cholerae in order to identify novel candidate virulence proteins and the systems in which they were found.

Virulence-related proteins in V. cholerae. In identifying known virulence-related proteins in V. cholerae, we relied on literature searches and on two databases that list virulence proteins from V. cholerae O1 El Tor N16961, the VFDB (74) and the NMPDR (44). Together, these two databases contained 337 virulence-related proteins; only 79 of these proteins were found in both databases. We added a further 189 proteins to this list after extensive literature searching (Fig. 1). In carrying out the literature search, we looked for any protein or gene that was linked to virulence, including the level 3 loci called virulence lifestyle genes. Thus, our search results included any proteins that were known to enhance the ability of the bacterium to invade and colonize the gastrointestinal tract or to express true virulence factors. Our list also contains a number of hypothetical or uncharacterized proteins that were included, for example, because they were part of a pathogenicity island. We feel justified in using a liberal definition of a virulence-related protein for two reasons. First, the proposed virulence protein is to be viewed in the context of other proteins in the cell, as this will shed light on the function of the protein and aid in judging whether it really is a virulence protein. Second, inclusiveness is important in that even tenuous connections can point us to other, unrecognized virulence proteins and help to fill out the component lists of virulence-related subsystems. The proteins are listed in Table S1 in the supplemental material.

Classifying the virulence-related proteins. Our comparison of the gene complement of 11 strains from the Vibrionaceae established a core proteome composed of 1,882 proteins. As expected, this is significantly smaller than the single-species core genome of 2,741 genes established for V. cholerae N16961 by Keymer et al. (30). However, for a family level core proteome, it is remarkably large, even relative to genus-level core genomes for other taxa. In the genus Streptococcus, for example, Lefebure and Stanhope estimated the core genome to be 611 genes (36).

Roughly 49% of all the proteins in V. cholerae are part of the core proteome. Just over one-third (35%) of the V. cholerae N16961 proteins that have been classified as virulence or virulence-associated proteins are also found in the core proteome, that is, they are found in the avirulent strains and in strains that have other modes of infection. Thus, while it is probable that they carry out functions that are required for the establishment of an infection, these virulence-related proteins are not expressed solely in the human host environment. Indeed, it is known that certain virulence proteins can confer an advantage on V. cholerae outside the host, and these have been termed dual-role colonization factors (53, 69). For example, the TCP appears to play a role in the colonization of chitinous surfaces in the environment, as well as in colonization of the human gut (52), and a chitin binding protein, needed for the colonization of copepods, shrimp, and other chitinous surfaces in the environment, also aids in the colonization of epithelial cells (33). Thus, it is quite possible that some genes that are found in the core proteome and appear to be virulence lifestyle genes may have unrecognized roles as virulence-associated genes or even encode true virulence factors.

We classified the orthologous proteins according to their relationships with the COGs at NCBI. The size of each of the functional groups found in the V. cholerae N16961 proteome is shown in Fig. 2 (top), along with its degree distribution. The noncore proteins (bottom) (classified by NCBI) and the proteins that we were unable to place in a COG group (class X) have lower connectivity within the network. Proteins in COG classes J (translation, ribosomal structure, and biogenesis) and D (cell cycle control, cell division, and chromosome partitioning) had the highest mean degrees, significantly higher than those of other groups. They were followed by F (nucleotide transport and metabolism) and L (replication, recombination, and repair). This is not unexpected, given that many of these proteins are essential to the survival of the cell. The proteins in COG classes P (inorganic-ion transport and metabolism) and K (transcription) had the lowest mean degrees. This might be explained by the differences in how these sets of proteins interact with other proteins. They do not form large complexes, like the ribosome; indeed, many may be parts of specialized metabolic pathways in which they interact with a few other proteins via common substrates.

FIG. 1. Visualization of the sources of the virulence-related proteins in our list.

The functional-association network. Figure 3 shows a functional-association network for V. cholerae N16961, built using data from the STRING database (27). The network includes 3,756 proteins and 159,497 associations; each association between a pair of proteins has a confidence score (S) ranging from 0.15 to 0.999 that was inferred from the evidence used to establish the association. The noncore proteins in V. cholerae and the other nodes, which are members of the core protein set for 11 strains from the family Vibrionaceae, are shown in Fig. 3. Of the 525 virulence-related proteins in our database, 3 were not connected to any other proteins in the STRING database (S ≥ 0.15) and 2 more were not connected to any other protein (S ≥ 0.4).

The appearance of noncore loci at the periphery of the functional association network, shown in Fig. 3, indicates that on average the noncore proteins are less highly connected than the core proteins. When we looked at the degree distribution for these two groups, this indeed proved to be the case; the core proteins had, on average, a higher degree of connectivity within the network, and there is good statistical support for the observed difference (Fig. 3, inset). The core protein set includes many proteins essential to the viability of the cell, proteins involved in central metabolism, translation, transcription, and so on. The difference in connectivity between the two groups cannot be attributed to differences in depth of annotation, as even classifiable noncore proteins show lower connectivity (Fig. 2, bottom); furthermore, 715 (38%) of the core proteins found in V. cholerae N16961 are annotated in GenBank as hypothetical, putative, or probable. We assert that the lower degree of association observed in the noncore proteins is due, at least in part, to the peripheral roles some of them play in the cell.
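The comparison of degree distributions for the two groups can be outlined with a short sketch. It computes only the median degree per group; the text does not name the statistical test behind the reported support, so none is reproduced here. The function names, the threshold argument, and the toy data are hypothetical.

```python
from collections import Counter
from statistics import median

def degrees_at_threshold(edges, min_score):
    """Count, per protein, the associations with a confidence score >= min_score."""
    degree = Counter()
    for a, b, score in edges:
        if score >= min_score:
            degree[a] += 1
            degree[b] += 1
    return degree

def median_degree_by_group(degree, proteins, core_ids):
    """Return (median core degree, median noncore degree); unlinked proteins count as 0."""
    core = [degree[p] for p in proteins if p in core_ids]
    noncore = [degree[p] for p in proteins if p not in core_ids]
    return median(core), median(noncore)

# Hypothetical toy network: proteins a, b, c are "core"; d and e are "noncore".
edges = [("a", "b", 0.9), ("a", "c", 0.5), ("b", "c", 0.7), ("c", "d", 0.45), ("d", "e", 0.2)]
proteins = ["a", "b", "c", "d", "e"]
degree = degrees_at_threshold(edges, min_score=0.4)
core_median, noncore_median = median_degree_by_group(degree, proteins, core_ids={"a", "b", "c"})
print(core_median, noncore_median)
```

Passing the full protein list, rather than only the proteins that appear in retained edges, keeps isolated proteins in the comparison with a degree of zero.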

FIG. 2. Box plots summarizing the degrees of proteins in the functional-association data set for V. cholerae N16961 as a function of their COG classes, when only associations where S is ≥0.4 are considered. (Top) Subset of proteins that are part of the core proteome. (Bottom) Subset of proteins that are not part of the core proteome. The proteins in the bottom panel were classified by NCBI. Each color and letter represents a COG class, with the exception of X (proteins that were members of an OrthoMCL group but that were not placed into a COG class) and Z (proteins that were not members of an orthologous group and could not be placed into a COG class). The plots show the range of degree values for each group of proteins; the median value is represented by the bar at the center of the notch.


Three subsets of the association data set were extracted using scripts: (i) the set of proteins that are associated with an S of ≥0.4, which included 3,734 proteins and 36,769 associations (99% of the proteins and 23% of the edges in the main data set); (ii) the set of proteins that are associated with at least one virulence-related protein at any confidence score (3,193 proteins [85%] and 33,914 associations [21%]), which contained 523 virulence-related proteins; and (iii) the set of proteins that are associated with at least one virulence-related protein at an S of ≥0.4 (2,220 proteins [59%] and 7,183 associations [5%]), which contained 521 virulence proteins. Use of the third set eliminated the edges among the non-virulence-related proteins, which made it easier to clarify which non-virulence-related proteins were associated with more than one virulence-related protein. Figure S1 in the supplemental material shows this virulence association subnetwork. There are 1,699 other V. cholerae proteins linked to the virulence proteins in this subset, and of these, 66% are in the core protein set. Several clearly defined clusters can be seen in this representation (see Fig. S1 in the supplemental material). The most highly populated cluster is primarily made up of chemotaxis- and motility-related proteins. This subnetwork is shown in Fig. 4. Since motility is an important factor for successful colonization of the intestine, nearly all the flagellar components and control elements are shown as virulence proteins. Of course, the flagellar proteins are also required for survival outside the human host. A recent report by Liu et al. indicated that the flagellar protein encoded by flgM has a more direct role in virulence; the flagella are shed when the bacterium invades the intestinal mucosa, and the cell detects the FlgM proteins, which initiates a regulatory chain that derepresses virulence gene expression (40). The chemotaxis receptor/transducer proteins are also part of this cluster, but as discussed below, their roles are not always related to chemotaxis. Many of these receptor/transducer proteins are not members of the core protein set and are not associated with the motility operon, perhaps reflecting the diverse requirements for such proteins for bacteria growing in different environments (see below).
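The three extractions described above can be sketched as simple edge filters. The following Python fragment is a minimal illustration, not the scripts used in this study: the Edge record, the function names, and the toy protein identifiers (V1, P1, and so on) are hypothetical, and the threshold is assumed to be inclusive (S ≥ 0.4).

```python
from typing import Iterable, List, NamedTuple, Set


class Edge(NamedTuple):
    """One functional association; `score` stands in for the confidence score S."""
    a: str
    b: str
    score: float


def edges_at_or_above(edges: Iterable[Edge], min_score: float) -> List[Edge]:
    """Subset (i): keep associations whose score meets the threshold."""
    return [e for e in edges if e.score >= min_score]


def edges_touching(edges: Iterable[Edge], virulence: Set[str],
                   min_score: float = 0.0) -> List[Edge]:
    """Subsets (ii) and (iii): keep associations with at least one
    virulence-related endpoint, optionally above a score threshold."""
    return [e for e in edges
            if e.score >= min_score and (e.a in virulence or e.b in virulence)]


def proteins_in(edges: Iterable[Edge]) -> Set[str]:
    """All proteins that appear as an endpoint of the given associations."""
    return {p for e in edges for p in (e.a, e.b)}


# Toy data: V* are virulence-related, P* are not (hypothetical identifiers).
toy_edges = [
    Edge("V1", "P1", 0.95),
    Edge("V1", "P2", 0.30),
    Edge("P1", "P2", 0.60),
    Edge("V2", "P3", 0.45),
    Edge("P3", "P4", 0.20),
]
toy_virulence = {"V1", "V2"}

subset_i = edges_at_or_above(toy_edges, 0.4)
subset_ii = edges_touching(toy_edges, toy_virulence)
subset_iii = edges_touching(toy_edges, toy_virulence, min_score=0.4)
```

Requiring at least one virulence-related endpoint is what removes the associations among non-virulence-related proteins in the third subset, so each remaining non-virulence-related protein can be read directly against its virulence-related partners.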

The associations illustrated in Fig. 3 and 4 can provide information on cellular systems involved in virulence. Furthermore, if we consider that the function of a protein has been observed to be related to the functions of the proteins with which it interacts, it should be possible to identify previously unrecognized virulence-related proteins by analyzing the associations included in Fig. 3 (and simplified in Fig. 4). Here, we discuss how association analysis can be used to narrow the search area for new virulence determinants and to help understand the roles of the implicated gene products in the cell.

FIG. 3. Visualization of the functional-association network of V. cholerae N16961 based on data from the STRING database. The nodes that are part of the core proteome are colored according to their COG classifications, as shown below the network. Nodes that are not part of the core proteome are colored pink. The square nodes are virulence-associated proteins. There are over 159,000 associations proposed among the 3,756 proteins in the map, and those associations with an S of <0.9 (from 0.15 to 0.899) are shown in gray. The heavy blue edges represent associations with an S of ≥0.9. The inset shows the distribution of degrees (connectivities) for the core and noncore proteins as box plots. The plots show the range of degree values for each group of proteins; the median value is represented by the bar at the center of the notch.


Connecting candidates to virulence-associated proteins. (i) Disproportionately connected proteins. We first discuss the category of non-virulence-related proteins that, while not necessarily associated with any one virulence-related protein at a high confidence level, have a disproportionately large number of associations with virulence-related proteins. We calculated the ratio of associations with virulence-related proteins to associations with non-virulence-related proteins for each protein in the proteome. Figure 5 shows box plots of the ratios for the virulence-associated proteins in our database and for the non-virulence-associated proteins. Fewer than 8% of the non-virulence-associated proteins had a proportion of their associations with virulence-associated proteins that was greater than or equal to the median value for the virulence-associated proteins (0.25). These are the outliers in Fig. 5. It is reasonable to characterize these proteins as disproportionately connected, relative to the bulk of the proteins, and to propose that this disproportionately connected set contains proteins that play roles in the virulence of V. cholerae. There are 240 proteins in the set, and they constitute about 14% of the proteins that interact directly with the virulence-related proteins (see Fig. S1 in the supplemental material); 62 of them (26%) are core proteome proteins. These disproportionately connected proteins are listed in Table S2 in the supplemental material.
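A minimal sketch of this calculation follows. It assumes that the quantity compared against the median is the fraction of a protein's associations that lead to virulence-related proteins (the proportion plotted in Fig. 5) and that the cutoff is applied inclusively. The function names and toy identifiers are hypothetical and are not taken from the original analysis.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Set, Tuple


def virulence_fraction(edges: Iterable[Tuple[str, str]],
                       virulence: Set[str]) -> Dict[str, float]:
    """For every protein, the fraction of its distinct neighbors that are
    virulence-related."""
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {p: len(nbrs & virulence) / len(nbrs) for p, nbrs in neighbors.items()}


def disproportionately_connected(edges: Iterable[Tuple[str, str]],
                                 virulence: Set[str]) -> Tuple[float, List[str]]:
    """Return the median fraction among virulence-related proteins and the
    non-virulence-related proteins whose fraction meets or exceeds it."""
    frac = virulence_fraction(edges, virulence)
    cutoff = median(f for p, f in frac.items() if p in virulence)
    hits = sorted(p for p, f in frac.items() if p not in virulence and f >= cutoff)
    return cutoff, hits


# Toy network: V* are virulence-related, P* are not (hypothetical identifiers).
toy_pairs = [
    ("V1", "V2"), ("V1", "P1"), ("V1", "P2"), ("V2", "P1"),
    ("P1", "P2"), ("P2", "P3"), ("P3", "P4"),
]
toy_vir = {"V1", "V2"}

toy_cutoff, toy_hits = disproportionately_connected(toy_pairs, toy_vir)
```

In the toy network, P1 has two virulence-related neighbors out of three, which exceeds the median fraction for V1 and V2, so it is the only protein flagged.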

Of the disproportionately connected proteins, 141 have a functional assignment from the genome annotation and 123 have gene ontology (GO) terms relating to biological processes associated with them. In all, 50 of the proteins have no functional information whatsoever linked to them.

The 123 disproportionately connected proteins that have GO biological-process terms associated with them are significantly enriched for the term “signal transduction.” The expression of virulence-related genes is driven by environmental cues, and these signals must be transmitted to the cell if it is to flourish in a new environment. Thirty-seven of these proteins are annotated as methyl-accepting chemotaxis proteins (MCPs). There are 45 such proteins in V. cholerae (compared to 5 in Escherichia coli) (66), and 8 of them are in our list of virulence-related proteins. The paradigmatic role of these chemoreceptors is in chemotaxis and motility, where they act in concert with the che genes to control bacterial movement toward or away from concentrations of extracellular molecules (7). E. coli has only one set of che genes, and they are essential for chemotaxis. In V. cholerae, three che operons are found, but only one of them is essential for chemotaxis (70). The motility-related operon is located in the region of cheY and cheZ, loci that encode the CheY (VC2065) and CheZ (VC2064) proteins. There is only one CheZ homolog in V. cholerae, and although there are four homologs of the soluble response regulator CheY in the V. cholerae genome, VC2065 is the only CheY homolog with a convincing FliM-binding motif, indicating that it is the only CheY homolog involved in chemotaxis. This implies that the other two operons, with which the bulk of the MCPs are associated, mediate bacterial responses other than chemotaxis (25). Figure S2 in the supplemental material shows the associations among all the MCPs and the various CheA, -W, -Z, and -Y proteins. Only 3 of the 45 MCPs in V. cholerae N16961 are associated with Che proteins from the motility operon. Two of these three MCPs (VC0098 and VCA1092) are classified as 36H MCPs, similar to the E. coli chemotaxis/motility MCPs (66). In fact, these are the only two 36H MCPs in the V. cholerae genome, and both carry the C-terminal pentapeptide motif, unique to the Proteobacteria, which is thought to aid in binding of the CheB and CheR proteins, two proteins that play key roles in the chemotactic adaptation response (3, 72). The third associated MCP, VCA1088, remains unclassified. The chromosomal locations and structural classifications of the MCPs in E. coli and V. cholerae are shown in Fig. S3 in the supplemental material, along with the locations of the various che genes. The MCPs in V. cholerae are much more structurally diverse than those seen in E. coli. They may have one, two, or no transmembrane domains and have HAMP input modules, as is seen in E. coli, or may have Cache, PAS, or no recognized input modules. The structural diversity of the MCPs in V. cholerae (2) also supports the notion that the remaining 42 MCPs in V. cholerae could be involved in the regulation of other processes.

FIG. 4. Motility/chemotaxis subnetwork of associations (S ≥ 0.4) from Fig. 3 that involves some of the 525 virulence proteins in V. cholerae. The symbol shapes are as in Fig. 3. This cluster contains many MCPs that are not recognized as virulence-related proteins but that are associated with the nonmotility che operons, which have been linked to virulence in V. cholerae.

FIG. 5. Box plot of the proportion of connections from each protein that are to virulence-associated proteins for two sets of proteins: Vir, the virulence-associated proteins, and Not, the proteins that are not identified as virulence associated. The plots show the range of proportions for each set of proteins; the median value is represented by the bar at the center of the notch.

Eight of the 45 MCPs are already classified as virulence-related genes, as they are known to be (i) involved with the TCP (VC0825 and VC0840) (see Fig. S3 in the supplemental material), (ii) implicated in the expression of a hemolysin (VCA0220), (iii) encoded on the second Vibrio-specific pathogenesis island (VC0512 and VC0514) (see Fig. S3 in the supplemental material), or (iv) expressed only during infection of a human host (VC0216, VCA1056, and VCA0176) (20). Four other MCPs that are not currently classified as virulence-related proteins, VC0449, VC1403, VCA0906, and VCA1034, are associated with recognized virulence-related proteins (S ≥ 0.4). VC0449 is associated with two phage-related replication proteins, RstA1 (VC1454) and RstA2 (VC1463), and is known to be induced by N-acetylglucosamine, the chitin polymer subunit. This is notable because chitin induces competence in V. cholerae and has been implicated in the transfer of the CTXphi prophage among toxigenic strains (65), suggesting a role for the VC0449 MCP in regulating this process. Another MCP in this group, VCA1034, also appears to be involved in chitin-induced regulation. VCA1034 is cotranscribed with, and thought to interact with, an extracellular N-acetylglucosamine binding protein, VCA1033. It is also linked to the vibriobactin outer membrane binding protein (VC2211), the RTX toxin (VC1451), and a CheY-like response regulator (VCA1086). A third MCP in this group, VCA0906, is associated with HutZ (VCA0907). Finally, the fourth member of this group, VC1403, is associated with a single virulence-related protein, VC1817. This protein is annotated as a sigma-54-dependent transcriptional regulator. Such proteins regulate the expression of genes whose promoters are specifically recognized by the sigma-54 subunit of RNA polymerase. The set of genes regulated by VC1817 is unknown, but genes transcribed by the sigma-54 RNA polymerase include iron uptake-related genes, the immunogenic protein VCA0144, and genes required for motility (60). Elimination of the sigma-54 subunit results in attenuation of virulence in V. cholerae, and this attenuation is not entirely due to the loss of motility (29, 34).

Of the 117 disproportionately associated proteins with no GO biological-process annotation, 86 are annotated as “hypothetical proteins,” and 74 of these have no GO annotation at all. Some of these 74 proteins are candidates for functional assignment. For example, VC2735 is encoded upstream of the eps operon and is thought to be cotranscribed with VC2736 as part of an operon that is divergently transcribed from the eps operon (49, 56). The eps operon plays a central role in virulence. It encodes the type II secretion system (T2SS) (and not, as implied by the genome annotation, the general secretion pathway [13]), a set of proteins that facilitates the export of the cholera toxin and a hemagglutinin/protease protein and that is also involved in the secretion of the filamentous phage that encodes the cholera toxin (12, 55). This diversity of substrates is an unusual feature of the V. cholerae T2SS (11). Figure 6 shows the proteins that interact with VC2735 and the chromosomal arrangement of the genes around it. As shown in Fig. 6, the divergently transcribed eps genes encode proteins that form a tight cluster. With the exception of VC2733, all the proteins in this cluster have orthologs in the other 11 genomes. The location of the gene encoding VC2735 is often occupied by genes that modulate some aspect of protein secretion in other species (18), but it is not a homolog of any of these proteins. VC2735 has an S4 RNA-binding motif that indicates it may play a role in translational regulation. The gene downstream of VC2735 in the putative operon encodes a redox-sensitive chaperone that is similarly disproportionately connected to virulence proteins. Chaperones similar to VC2736 are activated in response to oxidative stress and elevated temperature; they are very efficient chaperones (26). Chaperones are also commonly required to aid in the translocation and assembly of secreted proteins. We speculate that VC2735 and VC2736 are involved in translational regulation and protein stabilization under the changing conditions faced by V. cholerae, possibly when the cells enter the human host. Under these conditions, the role of the T2SS changes from involvement in filamentous phage production to secretion of cholera toxin. Presumably, deletion of these two genes would lead to a decrease in the production of cholera toxin under infective conditions.

(ii) Proteins with high-confidence associations. One very promising group of potential virulence-related proteins is composed of non-virulence-related proteins that are associated at a high confidence score (S ≥ 0.9) with recognized virulence proteins. We extracted a list of these proteins from the data represented in Fig. 3, and the 262 non-virulence-related proteins in this set are listed in Table S3 in the supplemental material. Associations among the virulence-related proteins and this set are seen in Fig. S1 in the supplemental material, especially near the virulence proteins that form modules involved in iron uptake, chemotaxis, pilin formation, and so on. Unlike the set of disproportionately connected proteins, most of these proteins (77%) are members of the core proteome.
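As an illustration of how such a list and its core-proteome share might be derived, the sketch below collects the non-virulence-related partners of virulence-related proteins at or above a score cutoff and reports the fraction that belong to a core set. It is a hypothetical reconstruction under an assumed inclusive cutoff (S ≥ 0.9); the function names, identifiers, and scores are invented for the example.

```python
from typing import Iterable, Set, Tuple


def high_confidence_partners(edges: Iterable[Tuple[str, str, float]],
                             virulence: Set[str],
                             min_score: float = 0.9) -> Set[str]:
    """Non-virulence-related proteins linked to at least one virulence-related
    protein by an association scoring at or above `min_score`."""
    partners: Set[str] = set()
    for a, b, score in edges:
        if score < min_score:
            continue
        if a in virulence and b not in virulence:
            partners.add(b)
        elif b in virulence and a not in virulence:
            partners.add(a)
    return partners


def core_share(proteins: Set[str], core: Set[str]) -> float:
    """Fraction of `proteins` that are members of the core proteome."""
    return len(proteins & core) / len(proteins) if proteins else 0.0


# Toy data (hypothetical identifiers and scores).
scored_edges = [
    ("V1", "P1", 0.95),
    ("V1", "P2", 0.92),
    ("P3", "V2", 0.90),
    ("V2", "P4", 0.85),   # below the cutoff, ignored
    ("P1", "P4", 0.99),   # no virulence-related endpoint, ignored
    ("V1", "V2", 0.97),   # both endpoints virulence-related, ignored
]
vir_set = {"V1", "V2"}
core_set = {"P1", "P3", "P4"}

candidates = high_confidence_partners(scored_edges, vir_set)
candidate_core_share = core_share(candidates, core_set)
```

Edges between two virulence-related proteins and edges between two non-virulence-related proteins contribute nothing here, which mirrors the definition of this candidate group as partners of recognized virulence proteins.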


There are 28 proteins that are annotated as “hypothetical proteins” in this set. Two hundred eleven of the proteins have GO biological-process annotations. In contrast to the group of disproportionately connected non-virulence-related proteins, which are enriched for only a single GO term, this group is enriched for a diversity of GO terms, including “glycolysis,” “NAD biosynthetic process,” and “serine biosynthetic process”; overall, the enriched terms fall into the “metabolic process” category, which is significantly overrepresented in this group of loci. This probably reflects links between metabolism and virulence; sugar transport has been linked to the regulation of biofilm formation in V. cholerae (23), and the ability to synthesize 2,3-butanediol has been credited with the ability of El Tor strains to survive the acidity of the human stomach, thereby enhancing the virulence of these strains (75). Research looking at in vivo gene expression profiles also indicates that there are strong regulatory links between metabolism and virulence (41, 73). Recently, such a regulatory connection has been elucidated in group A Streptococcus strains (58, 59).
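Overrepresentation of a GO term in a protein set is commonly assessed with a one-sided hypergeometric test. The sketch below shows that calculation under the assumption that such a test is appropriate here; the passage above does not specify the test or the multiple-testing correction that was used, and the counts in the example are invented for illustration.

```python
from math import comb


def hypergeom_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for a hypergeometric draw.

    N: proteins in the background (e.g., all annotated proteins)
    K: background proteins that carry the GO term
    n: proteins in the set being tested
    k: proteins in the tested set that carry the GO term
    """
    upper = min(n, K)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible combinations contribute nothing to the sum.
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)


# Invented example: 5 of 20 background proteins carry the term,
# and 3 of the 4 proteins in the tested set carry it.
p_value = hypergeom_tail(k=3, n=4, K=5, N=20)
```

In practice, one p-value would be computed per GO term and the resulting values corrected for multiple testing before a term is called enriched.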

Some of the proteins in this set are clearly examples of virulence-related proteins that have been overlooked due to differences in annotation. For example, the genes encoding VC0244 and VC0247 are part of the operon made up of genes needed to synthesize the O antigen component of the lipopolysaccharide but were not included in our list of virulence-related proteins because, unlike the other genes in this operon, they were not designated rfb genes in the annotation.

FIG. 6. Virulence-associated proteins that interact with VC2735. The edge styles are as in Fig. 3. The gene neighborhood is also shown, with putative transcriptional terminators indicated as stylized stems and loops. VC2733 is shown in red, as its functionality is unclear. It encodes the EpsD protein, an outer membrane protein that forms a channel through which secreted proteins and phage reach the exterior of the cell. This gene, in V. cholerae O1 biovar El Tor strain N16961, has a frameshift mutation near the N terminus and has therefore been annotated as nonfunctional. Nonetheless, the strain is able to excrete the cholera toxin and to produce viable filamentous CTXphi, so either the encoded protein is functional or a viable substitute exists and the frameshift mutation does not disrupt expression of the eps operon genes.

Other candidate virulence-related proteins detected include three proteins that are members of a putative six-gene operon found on chromosome 2 (Fig. 7). One of these proteins, VCA1084, is annotated as a toxin secretion ATP-binding protein. The second, VCA1082, has GGDEF and EAL domains. GGDEF/EAL proteins modulate levels of bis-(3′-5′)-cyclic di-GMP (c-di-GMP) (10). This compound is a second messenger implicated in the regulation of biofilm formation (63), motility (38), and virulence (39, 64) in V. cholerae. GGDEF domains are involved in the synthesis of c-di-GMP, while EAL domains encode phosphodiesterase activity, which breaks down c-di-GMP. The importance of c-di-GMP in the regulation of V. cholerae is underscored by the presence of 62 GGDEF and/or EAL domains in the proteome. VCA1083, which is encoded on this putative operon but not associated with high confidence with any virulence proteins, also has GGDEF and EAL domains. Interestingly, in other strains of V. cholerae, VCA1082 and VCA1083 appear to be fused into a single protein (44). The third protein is a predicted periplasmic protein of unknown function (35). The fifth protein encoded in the operon, VCA1080, is on our list of virulence-related proteins. NMPDR assigned it to its virulence protein collection on the basis of its homology with ABC-type protease exporter proteins in other taxa. This type I secretion protein has been designated a putative RTX transport protein in other species of Vibrio (35).

Figure 7 reveals that these three proteins are linked to one another, as well as to other proteins involved in virulence, including VC1447, an RTX transporter protein; VC0398, which is encoded by the first gene in the msh operon and is another GGDEF/EAL protein; and VC1622, a putative outer membrane protein that has an OmpA protein domain.

FIG. 7. Proteins that interact with VCA1081, VCA1082, and VCA1084. Edge styles and gene neighborhood conventions are as in Fig. 6.

Conclusion. By assembling a list of virulence-related proteins for V. cholerae N16961 and using these proteins as in silico bait proteins in a computationally generated functional-association network, we were able to generate a list of 463 proteins that are candidates for roles in virulence systems in the pathogen. This list includes proteins that are obviously involved in virulence but that were overlooked because of the annotation, as well as proteins that require follow-up to confirm their roles in virulence. This group of candidate proteins was significantly enriched for proteins involved in chemotaxis, cell communication, and signal transduction and, to a lesser degree, for proteins involved in the regulation of cellular processes and a variety of metabolic processes. This is consistent with the notion that virulence depends on the actions of a large number of proteins, many of which control the pathogen’s behavior, rather than on a few proteins acting directly on the host. The associations shown in Fig. 3 and 4 hint that evolutionarily driven changes in any one protein must have far-reaching effects and suggest that studying the evolution of systems will aid greatly in understanding how pathogenesis emerges.
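As a consistency check on the counts reported in this section, and assuming (this is not stated explicitly in the text) that the 463 candidates are the union of the 240 disproportionately connected proteins (set A) and the 262 proteins with high-confidence associations (set B), inclusion-exclusion gives the implied overlap between the two sets:

```latex
|A \cap B| = |A| + |B| - |A \cup B| = 240 + 262 - 463 = 39
```

Under that assumption, 39 proteins would satisfy both criteria.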

ACKNOWLEDGMENTS

This work is supported by NIH grant 1R21AI067543 to T. G. Lilburn and Y. Wang, NIH grants SC1GM081068 and SC1AI080579 to Y. Wang, and the PSC-CUNY Research Award PSCREG-39-497 and CUNY Summer Research Award to J. Gu.

The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of General Medical Sciences, the National Institute of Allergy and Infectious Diseases, the National Institutes of Health, or ATCC.

REFERENCES

1. Abd, H., A. Saeed, A. Weintraub, G. B. Nair, and G. Sandstrom. 2007. Vibrio cholerae O1 strains are facultative intracellular bacteria, able to survive and multiply symbiotically inside the aquatic free-living amoeba Acanthamoeba castellanii. FEMS Microbiol. Ecol. 60:33–39.

2. Alexander, R. P., and I. B. Zhulin. 2007. Evolutionary genomics reveals conserved structural determinants of signaling and adaptation in microbial chemoreceptors. Proc. Natl. Acad. Sci. USA 104:2885–2890.

3. Alon, U., M. G. Surette, N. Barkai, and S. Leibler. 1999. Robustness in bacterial chemotaxis. Nature 397:168–171.

4. Assenov, Y., F. Ramírez, S. E. Schelhorn, T. Lengauer, and M. Albrecht. 2008. Computing topological parameters of biological networks. Bioinformatics 24:282–284.

5. Barthelmes, J., C. Ebeling, A. Chang, I. Schomburg, and D. Schomburg. 2007. BRENDA, AMENDA and FRENDA: the enzyme information system in 2007. Nucleic Acids Res. 35:D511–D514.

6. Batada, N. N., and L. D. Hurst. 2007. Evolution of chromosome organization driven by selection for reduced gene expression noise. Nat. Genet. 39:945–949.

7. Butler, S. M., and A. Camilli. 2005. Going against the grain: chemotaxis and infection in Vibrio cholerae. Nat. Rev. Microbiol. 3:611–620.

8. Caspi, R., H. Foerster, C. A. Fulcher, P. Kaipa, M. Krummenacker, M. Latendresse, S. Paley, S. Y. Rhee, A. G. Shearer, C. Tissier, T. C. Walk, P. Zhang, and P. D. Karp. 2008. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases. Nucleic Acids Res. 36:D623–D631.

9. Chen, C. Y., K. M. Wu, Y. C. Chang, C. H. Chang, H. C. Tsai, T. L. Liao, Y. M. Liu, H. J. Chen, A. B. Shen, J. C. Li, T. L. Su, C. P. Shao, C. T. Lee, L. I. Hor, and S. F. Tsai. 2003. Comparative genome analysis of Vibrio vulnificus, a marine pathogen. Genome Res. 13:2577–2587.

10. Cotter, P. A., and S. Stibitz. 2007. c-di-GMP-mediated regulation of virulence and biofilm formation. Curr. Opin. Microbiol. 10:17–23.

11. Davis, B. M., E. H. Lawson, M. Sandkvist, A. Ali, S. Sozhamannan, and M. K. Waldor. 2000. Convergence of the secretory pathways for cholera toxin and the filamentous phage, CTXphi. Science 288:333–335.

12. Davis, B. M., and M. K. Waldor. 2000. CTXphi contains a hybrid genome derived from tandemly integrated elements. Proc. Natl. Acad. Sci. USA 97:8572–8577.

13. Desvaux, M., N. J. Parham, A. Scott-Tucker, and I. R. Henderson. 2004. The general secretory pathway: a general misnomer? Trends Microbiol. 12:306–309.

14. Doms, A., and M. Schroeder. 2005. GoPubMed: exploring PubMed with the Gene Ontology. Nucleic Acids Res. 33:W783–W786.

15. Farmer, J. J., J. M. Janda, F. W. Brenner, D. N. Cameron, and K. M. Birkhead. 2005. Genus I. Vibrio Pacini 1854, 411AL, p. 494–546. In D. J. Brenner, N. R. Krieg, and J. T. Staley (ed.), Bergey’s manual of systematic bacteriology, vol. 2, part B. Springer, New York, NY.

16. Faruque, S. M., N. Chowdhury, M. Kamruzzaman, M. Dziejman, M. H. Rahman, D. A. Sack, G. B. Nair, and J. J. Mekalanos. 2004. Genetic diversity and virulence potential of environmental Vibrio cholerae population in a cholera-endemic area. Proc. Natl. Acad. Sci. USA 101:2123–2128.

17. Faruque, S. M., and G. B. Nair. 2002. Molecular ecology of toxigenic Vibrio cholerae. Microbiol. Immunol. 46:59–66.

18. Filloux, A. 2004. The underlying mechanisms of type II protein secretion. Biochim. Biophys. Acta 1694:163–179.

19. Fraser, A. G., and E. M. Marcotte. 2004. A probabilistic view of gene function. Nat. Genet. 36:559–564.

20. Hang, L., M. John, M. Asaduzzaman, E. A. Bridges, C. Vanderspurt, T. J. Kirn, R. K. Taylor, J. D. Hillman, A. Progulske-Fox, M. Handfield, E. T. Ryan, and S. B. Calderwood. 2003. Use of in vivo-induced antigen technology (IVIAT) to identify genes uniquely expressed during human infection with Vibrio cholerae. Proc. Natl. Acad. Sci. USA 100:8508–8513.

21. Heidelberg, J. F., J. A. Eisen, W. C. Nelson, R. A. Clayton, M. L. Gwinn, R. J. Dodson, D. H. Haft, E. K. Hickey, J. D. Peterson, L. Umayam, S. R. Gill, K. E. Nelson, T. D. Read, H. Tettelin, D. Richardson, M. D. Ermolaeva, J. Vamathevan, S. Bass, H. Qin, I. Dragoi, P. Sellers, L. McDonald, T. Utterback, R. D. Fleishmann, W. C. Nierman, O. White, S. L. Salzberg, H. O. Smith, R. R. Colwell, J. J. Mekalanos, J. C. Venter, and C. M. Fraser. 2000. DNA sequence of both chromosomes of the cholera pathogen Vibrio cholerae. Nature 406:477–483.

22. Hjerde, E., M. S. Lorentzen, M. T. Holden, K. Seeger, S. Paulsen, N. C. Bason, C. Churcher, D. Harris, H. Norbertczak, M. A. Quail, S. Sanders, S. Thurston, J. Parkhill, N. P. Willassen, and N. Thomson. 2008. The genome sequence of the fish pathogen Aliivibrio salmonicida strain LFI1238 shows extensive evidence of gene decay. BMC Genomics 9:616.

23. Houot, L., and P. I. Watnick. 2008. A novel role for enzyme I of the Vibrio cholerae phosphoenolpyruvate phosphotransferase system in regulation of growth in a biofilm. J. Bacteriol. 190:311–320.

24. Huang, D. W., B. Sherman, Q. Tan, J. Collins, W. G. Alvord, J. Roayaei, R. Stephens, M. Baseler, H. C. Lane, and R. Lempicki. 2007. DAVID gene functional classification tool: a novel biological module-centric algorithm to functionally analyze large gene lists. Genome Biol. 8:R183.

25. Hyakutake, A., M. Homma, M. J. Austin, M. A. Boin, C. C. Hase, and I. Kawagishi. 2005. Only one of the five CheY homologs in Vibrio cholerae directly switches flagellar rotation. J. Bacteriol. 187:8403–8410.

26. Jakob, U., W. Muse, M. Eser, and J. C. Bardwell. 1999. Chaperone activity with a redox switch. Cell 96:341–352.

27. Jensen, L. J., M. Kuhn, M. Stark, S. Chaffron, C. Creevey, J. Muller, T. Doerks, P. Julien, A. Roth, M. Simonovic, P. Bork, and C. von Mering. 2009. STRING 8—a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Res. 37:D412–D416.

28. Jeong, H., S. P. Mason, A. L. Barabasi, and Z. N. Oltvai. 2001. Lethality and centrality in protein networks. Nature 411:41–42.

29. Kazmierczak, M. J., M. Wiedmann, and K. J. Boor. 2005. Alternative sigma factors and their roles in bacterial virulence. Microbiol. Mol. Biol. Rev. 69:527–543.

30. Keymer, D. P., M. C. Miller, G. K. Schoolnik, and A. B. Boehm. 2007. Genomic and phenotypic diversity of coastal Vibrio cholerae strains is linked to environmental factors. Appl. Environ. Microbiol. 73:3705–3714.

31. Kim, S. M., P. M. Bowers, D. Pal, M. Strong, T. C. Terwilliger, M. Kaufmann, and D. Eisenberg. 2007. Functional linkages can reveal protein complexes for structure determination. Structure 15:1079–1089.

32. Kim, Y. R., S. E. Lee, C. M. Kim, S. Y. Kim, E. K. Shin, D. H. Shin, S. S. Chung, H. E. Choy, A. Progulske-Fox, J. D. Hillman, M. Handfield, and J. H. Rhee. 2003. Characterization and pathogenic significance of Vibrio vulnificus antigens preferentially expressed in septicemic patients. Infect. Immun. 71:5461–5471.

33. Kirn, T. J., B. A. Jude, and R. K. Taylor. 2005. A colonization factor links Vibrio cholerae environmental survival and human infection. Nature 438:863–866.

34. Klose, K. E., and J. J. Mekalanos. 1998. Distinct roles of an alternative sigma factor during both free-swimming and colonizing phases of the Vibrio cholerae pathogenic cycle. Mol. Microbiol. 28:501–520.

35. Krishnamurthy, N., D. P. Brown, D. Kirshner, and K. Sjolander. 2006. PhyloFacts: an online structural phylogenomic encyclopedia for protein functional and structural classification. Genome Biol. 7:R83.

36. Lefebure, T., and M. J. Stanhope. 2007. Evolution of the core and pan-genome of Streptococcus: positive selection, recombination, and genome composition. Genome Biol. 8:R71.

37. Li, L., C. J. Stoeckert, Jr., and D. S. Roos. 2003. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13:2178–2189.

38. Lim, B., S. Beyhan, J. Meir, and F. H. Yildiz. 2006. Cyclic-diGMP signal transduction systems in Vibrio cholerae: modulation of rugosity and biofilm formation. Mol. Microbiol. 60:331–348.

39. Lim, B., S. Beyhan, and F. H. Yildiz. 2007. Regulation of Vibrio polysaccharide synthesis and virulence factor production by CdgC, a GGDEF-EAL domain protein, in Vibrio cholerae. J. Bacteriol. 189:717–729.

40. Liu, Z., T. Miyashiro, A. Tsou, A. Hsiao, M. Goulian, and J. Zhu. 2008. Mucosal penetration primes Vibrio cholerae for host colonization by repressing quorum sensing. Proc. Natl. Acad. Sci. USA 105:9769–9774.

41. Lombardo, M., J. Michalski, H. F. Martinez-Wilson, C. Morin, T. Hilton, C. Osorio, J. Nataro, C. Tacket, A. Camilli, and J. Kaper. 2007. An in vivo expression technology screen for Vibrio cholerae genes expressed in human volunteers. Proc. Natl. Acad. Sci. USA 104:18229–18234.

42. Maere, S., K. Heymans, and M. Kuiper. 2005. BiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks. Bioinformatics 21:3448–3449.

43. Makino, K., K. Oshima, K. Kurokawa, K. Yokoyama, T. Uda, K. Tagomori, Y. Iijima, M. Najima, M. Nakano, A. Yamashita, Y. Kubota, S. Kimura, T. Yasunaga, T. Honda, H. Shinagawa, M. Hattori, and T. Iida. 2003. Genome sequence of Vibrio parahaemolyticus: a pathogenic mechanism distinct from that of V. cholerae. Lancet 361:743–749.

44. McNeil, L. K., C. Reich, R. K. Aziz, D. Bartels, M. Cohoon, T. Disz, R. A. Edwards, S. Gerdes, K. Hwang, M. Kubal, G. R. Margaryan, F. Meyer, W. Mihalo, G. J. Olsen, R. Olson, A. Osterman, D. Paarmann, T. Paczian, B. Parrello, G. D. Pusch, D. A. Rodionov, X. Shi, O. Vassieva, V. Vonstein, O. Zagnitko, F. Xia, J. Zinner, R. Overbeek, and R. Stevens. 2007. The National Microbial Pathogen Database Resource (NMPDR): a genomics platform based on subsystem annotation. Nucleic Acids Res. 35:D347–D353.

45. Mulder, N. J., R. Apweiler, T. K. Attwood, A. Bairoch, A. Bateman, D. Binns,P. Bork, V. Buillard, L. Cerutti, R. Copley, E. Courcelle, U. Das, L. Daugh-erty, M. Dibley, R. Finn, W. Fleischmann, J. Gough, D. Haft, N. Hulo, S.Hunter, D. Kahn, A. Kanapin, A. Kejariwal, A. Labarga, P. S. Langendijk-Genevaux, D. Lonsdale, R. Lopez, I. Letunic, M. Madera, J. Maslen, C.McAnulla, J. McDowall, J. Mistry, A. Mitchell, A. N. Nikolskaya, S. Or-chard, C. Orengo, R. Petryszak, J. D. Selengut, C. J. A. Sigrist, P. D.Thomas, F. Valentin, D. Wilson, C. H. Wu, and C. Yeats. 2007. New devel-opments in the InterPro database. Nucleic Acids Res. 35:D224–D228.

46. O’Shea, Y. A., F. J. Reen, A. M. Quirke, and E. F. Boyd. 2004. Evolutionarygenetic analysis of the emergence of epidemic Vibrio cholerae isolates on thebasis of comparative nucleotide sequence analysis and multilocus virulencegene profiles. J. Clin. Microbiol. 42:4657–4671.

47. Pallen, M. J., and B. W. Wren. 2007. Bacterial pathogenomics. Nature449:835–842.

48. Peterson, J. D., L. A. Umayam, T. Dickinson, E. K. Hickey, and O. White.2001. The Comprehensive Microbial Resource. Nucleic Acids Res. 29:123–125.

49. Price, M., K. H. Huang, E. Alm, and A. Arkin. 2005. A novel method foraccurate operon predictions in all sequenced prokaryotes. Nucleic AcidsRes. 33:880–892.

50. Rahman, M., K. Biswas, M. A. Hossain, R. B. Sack, J. Mekalanos, and S. M. Faruque. 2008. Distribution of genes for virulence and ecological fitness among diverse Vibrio cholerae population in a cholera endemic area: tracking the evolution of pathogenic strains. DNA Cell Biol. 27:347–355.

51. Rajagopala, S. V., J. Goll, N. D. Gowda, K. C. Sunil, B. Titz, A. Mukherjee, S. S. Mary, N. Raviswaran, C. S. Poojari, S. Ramachandra, S. Shtivelband, S. M. Blazie, J. Hofmann, and P. Uetz. 2008. MPI-LIT: a literature-curated dataset of microbial binary protein-protein interactions. Bioinformatics 24:2622–2627.

52. Reguera, G., and R. Kolter. 2005. Virulence and the environment: a novel role for Vibrio cholerae toxin-coregulated pili in biofilm formation on chitin. J. Bacteriol. 187:3551–3555.

53. Reidl, J., and K. E. Klose. 2002. Vibrio cholerae and cholera: out of the water and into the host. FEMS Microbiol. Rev. 26:125–139.

54. Ruby, E. G., M. Urbanowski, J. Campbell, A. Dunn, M. Faini, R. Gunsalus, P. Lostroh, C. Lupp, J. McCann, D. Millikan, A. Schaefer, E. Stabb, A. Stevens, K. Visick, C. Whistler, and E. P. Greenberg. 2005. Complete genome sequence of Vibrio fischeri: a symbiotic bacterium with pathogenic congeners. Proc. Natl. Acad. Sci. USA 102:3004–3009.

55. Sandkvist, M., L. O. Michel, L. P. Hough, V. M. Morales, M. Bagdasarian, M. Koomey, V. J. DiRita, and M. Bagdasarian. 1997. General secretion pathway (eps) genes required for toxin secretion and outer membrane biogenesis in Vibrio cholerae. J. Bacteriol. 179:6994–7003.

56. Schneider, K. L., K. S. Pollard, R. Baertsch, A. Pohl, and T. M. Lowe. 2006. The UCSC Archaeal Genome Browser. Nucleic Acids Res. 34:D407–D410.

57. Shannon, P., A. Markiel, O. Ozier, N. S. Baliga, J. T. Wang, D. Ramage, N. Amin, B. Schwikowski, and T. Ideker. 2003. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13:2498–2504.

58. Shelburne, S., M. Davenport, D. Keith, and J. Musser. 2008. The role of complex carbohydrate catabolism in the pathogenesis of invasive streptococci. Trends Microbiol. 16:318–325.

59. Shelburne, S. A., D. Keith, N. Horstmann, P. Sumby, M. T. Davenport, E. A. Graviss, R. G. Brennan, and J. M. Musser. 2008. A direct link between carbohydrate utilization and virulence in the major human pathogen group A Streptococcus. Proc. Natl. Acad. Sci. USA 105:1698–1703.

60. Studholme, D. J., and R. Dixon. 2003. Domain architectures of σ54-dependent transcriptional activators. J. Bacteriol. 185:1757–1767.

61. Tatusov, R. L., N. D. Fedorova, J. D. Jackson, A. R. Jacobs, B. Kiryutin, E. V. Koonin, D. M. Krylov, R. Mazumder, S. L. Mekhedov, A. N. Nikolskaya, B. S. Rao, S. Smirnov, A. V. Sverdlov, S. Vasudevan, Y. I. Wolf, J. J. Yin, and D. A. Natale. 2003. The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4:41.

62. Thompson, F. L., T. Iida, and J. Swings. 2004. Biodiversity of vibrios. Microbiol. Mol. Biol. Rev. 68:403–431.

63. Tischler, A. D., and A. Camilli. 2004. Cyclic diguanylate (c-di-GMP) regulates Vibrio cholerae biofilm formation. Mol. Microbiol. 53:857–869.

64. Tischler, A. D., and A. Camilli. 2005. Cyclic diguanylate regulates Vibrio cholerae virulence gene expression. Infect. Immun. 73:5873–5882.

65. Udden, S. M., M. S. Zahid, K. Biswas, Q. S. Ahmad, A. Cravioto, G. B. Nair, J. J. Mekalanos, and S. M. Faruque. 2008. Acquisition of classical CTX prophage from Vibrio cholerae O141 by El Tor strains aided by lytic phages and chitin-induced competence. Proc. Natl. Acad. Sci. USA 105:11951–11956.

66. Ulrich, L. E., and I. B. Zhulin. 2007. MiST: a microbial signal transduction database. Nucleic Acids Res. 35:D386–D390.

67. UniProt Consortium. 2008. The universal protein resource (UniProt). Nucleic Acids Res. 36:D190–D195.

68. Vezzi, A., S. Campanaro, M. D’Angelo, F. Simonato, N. Vitulo, F. M. Lauro, A. Cestaro, G. Malacrida, B. Simionati, N. Cannata, C. Romualdi, D. H. Bartlett, and G. Valle. 2005. Life at depth: Photobacterium profundum genome sequence and expression analysis. Science 307:1459–1461.

69. Vezzulli, L., C. A. Guzman, R. Colwell, and C. Pruzzo. 2008. Dual role colonization factors connecting Vibrio cholerae’s lifestyles in human and aquatic environments open new perspectives for combating infectious diseases. Curr. Opin. Biotechnol. 19:254–259.

70. Wadhams, G. H., and J. P. Armitage. 2004. Making sense of it all: bacterial chemotaxis. Nat. Rev. Mol. Cell Biol. 5:1024–1037.

71. Wassenaar, T. M., and W. Gaastra. 2001. Bacterial virulence: can we draw the line? FEMS Microbiol. Lett. 201:1–7.

72. Wu, J., J. Li, G. Li, D. G. Long, and R. M. Weis. 1996. The receptor binding site for the methyltransferase of bacterial chemotaxis is distinct from the sites of methylation. Biochemistry 35:4984–4993.

73. Xu, Q., M. Dziejman, and J. J. Mekalanos. 2003. Determination of the transcriptome of Vibrio cholerae during intraintestinal growth and midexponential phase in vitro. Proc. Natl. Acad. Sci. USA 100:1286–1291.

74. Yang, J., L. Chen, L. Sun, J. Yu, and Q. Jin. 2008. VFDB 2008 release: an enhanced web-based resource for comparative pathogenomics. Nucleic Acids Res. 36:D539–D542.

75. Yoon, S. S., and J. J. Mekalanos. 2006. 2,3-Butanediol synthesis and the emergence of the Vibrio cholerae El Tor biotype. Infect. Immun. 74:6547–6556.

76. Yu, H., P. M. Kim, E. Sprecher, V. Trifonov, and M. Gerstein. 2007. The importance of bottlenecks in protein networks: correlation with gene essentiality and expression dynamics. PLoS Comput. Biol. 3:e59.

77. Zotenko, E., J. Mestre, D. P. O’Leary, and T. M. Przytycka. 2008. Why do hubs in the yeast protein interaction network tend to be essential: reexamining the connection between the network topology and essentiality. PLoS Comput. Biol. 4:e1000140.

