http://www.colorado.edu/che/research/faculty/gill/
http://compbio.uchsc.edu/Hunter
Anis Karimpour-Fard‡, Ryan T. Gill†, and Lawrence Hunter‡
‡ University of Colorado School of Medicine
† Department of Chemical and Biological Engineering, University of Colorado, Boulder
Investigation of factors affecting prediction of protein-protein interaction networks by
phylogenetic profiling
Dec 1, 2007
The meaning of protein function
Eisenberg, D. et al. Nature 2000
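Biochemical view
The function of protein A is its action on a substrate (S) to form a product (P).
[Diagram: S → P, catalyzed by A]
Post-genomic view
The function of protein A is defined by the context of its interactions with other proteins in the cell.
[Diagram: interaction network of proteins A, B, C, D, M, N, X, Y, Z]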
The problem ……
More than 500 microbial genomes are fully sequenced, and a high percentage of their genes have unknown function.
For example:
E. coli K12 15%
P. aeruginosa 45%
http://www.genomesonline.org/
Predicting protein function
• Homology-based methods (give only a partial understanding of a protein's role)
– Simple sequence similarity searches (BLAST)
– Profile searches (PSI-BLAST)
– Databases of conserved domains (Pfam, SMART)
• Prediction from genomic context
– Phylogenetic profile
– Gene cluster
– Gene neighbor
– Rosetta Stone
• Prediction from high-throughput experimental data
– Microarray gene expression data
– Protein-protein interaction screens
– ...
Phylogenetic Profile
Pellegrini et al. PNAS 96, 4285 (1999)
Marcotte et al. PNAS 97, 12115 (2000)
1- Select a set of genomes as the reference set
2- Create the phylogenetic profile matrix for the target organism:
• Do a one-against-all BLAST search to identify homologs of every target gene in the diverse reference organisms.
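Step 2 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the hit tuples, the gene and genome names, and the default threshold of 1e-5 are all hypothetical, and the BLAST results are assumed to be already parsed into (target gene, reference genome, E-value) records.

```python
def build_profiles(hits, target_genes, reference_genomes, evalue_threshold=1e-5):
    """Return {gene: [0/1, ...]} with one bit per reference genome.

    hits: iterable of (target_gene, reference_genome, evalue) tuples,
    assumed to come from a one-against-all BLAST search (hypothetical input).
    A bit is 1 (present) when at least one hit is at or below the threshold.
    """
    column = {genome: k for k, genome in enumerate(reference_genomes)}
    profiles = {gene: [0] * len(reference_genomes) for gene in target_genes}
    for gene, genome, evalue in hits:
        if gene in profiles and genome in column and evalue <= evalue_threshold:
            profiles[gene][column[genome]] = 1
    return profiles


# Made-up hits for two target genes against three reference genomes.
hits = [
    ("geneA", "genome1", 1e-30),
    ("geneA", "genome3", 1e-8),
    ("geneB", "genome1", 0.5),    # too weak: counted as absent
    ("geneB", "genome2", 1e-12),
]
profiles = build_profiles(hits, ["geneA", "geneB"], ["genome1", "genome2", "genome3"])
print(profiles)
```

Changing `evalue_threshold` or the list of reference genomes changes the bits in each profile, which is why both choices can influence the predicted interactions.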
Does the selection of the reference genomes influence the prediction?
If so, how?
How does the E-value threshold affect the prediction of protein-protein interactions?
[Diagram labels: reference selection; BLAST E-value threshold (present or absent); measure profile similarities]
Protein X: 110001111001001110001111
Protein Y: 111000111100000110001111
19 matching bits out of 24
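The bit count for this pair can be checked directly. The helper name `matching_bits` is chosen here for illustration only.

```python
profile_x = "110001111001001110001111"
profile_y = "111000111100000110001111"


def matching_bits(a, b):
    """Count positions at which two equal-length profiles agree."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return sum(1 for x, y in zip(a, b) if x == y)


matches = matching_bits(profile_x, profile_y)
print(f"{matches} matching bits out of {len(profile_x)}")  # 19 matching bits out of 24
```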
3- Measure profile similarities
4- Generate the protein-protein interaction network
(Two nodes are connected if the two proteins have similar profiles.)
[Diagram: Protein X connected to Protein Y]
5- Create clusters from the set of protein-protein interactions
6- Visualize the network
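Steps 4 and 5 can be illustrated with a small sketch. Two things here are assumptions rather than something stated on the slides: similarity is taken to be the number of matching bits with a hypothetical cutoff `min_matches`, and clusters are taken to be the connected components of the network. Protein Z is a made-up profile added for contrast.

```python
from itertools import combinations


def build_edges(profiles, min_matches):
    """Connect two proteins when their profiles agree in >= min_matches positions."""
    edges = []
    for (a, pa), (b, pb) in combinations(sorted(profiles.items()), 2):
        matches = sum(1 for x, y in zip(pa, pb) if x == y)
        if matches >= min_matches:
            edges.append((a, b))
    return edges


def clusters(nodes, edges):
    """Group nodes into connected components (one assumed notion of a cluster)."""
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, result = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        result.append(sorted(component))
    return result


profiles = {
    "X": "110001111001001110001111",
    "Y": "111000111100000110001111",
    "Z": "001110000110110001110000",  # hypothetical: bitwise complement of X
}
edges = build_edges(profiles, min_matches=19)
groups = clusters(profiles.keys(), edges)
print(edges)   # [('X', 'Y')]
print(groups)  # [['X', 'Y'], ['Z']]
```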
Measure profile similarities
Protein X: 110001111001001110001111
Protein Y: 111000111100000110001111
• Mutual information
$MI(X, Y) = H(X) + H(Y) - H(X, Y)$
$H(Y) = -\sum_{i=0}^{1} p(i) \ln p(i)$
$H(X, Y) = -\sum_{i=0}^{1} \sum_{j=0}^{1} p(i, j) \ln p(i, j)$
Here p(i) (i = 0, 1) is the fraction of genomes in which protein Y is in state i, and p(i, j) is the fraction of genomes in which protein X is in state i and protein Y is in state j.
• Pearson correlation coefficient
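Both measures can be computed for the example profiles with the standard library alone. This is a sketch under two stated assumptions: terms with zero probability contribute nothing to the entropy (the usual 0 ln 0 = 0 convention), and the Pearson coefficient is returned as 0.0 when a profile is constant, since it is undefined in that case.

```python
import math
from collections import Counter


def entropy(counts, total):
    """H = -sum p ln p over observed states; zero counts are skipped."""
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def mutual_information(x, y):
    """MI(X, Y) = H(X) + H(Y) - H(X, Y) for two equal-length bit strings."""
    n = len(x)
    h_x = entropy(Counter(x).values(), n)
    h_y = entropy(Counter(y).values(), n)
    h_xy = entropy(Counter(zip(x, y)).values(), n)  # joint states (i, j)
    return h_x + h_y - h_xy


def pearson(x, y):
    """Pearson correlation of two bit strings treated as 0/1 vectors."""
    a = [int(c) for c in x]
    b = [int(c) for c in y]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: treat constant profiles as uncorrelated
    return cov / math.sqrt(var_a * var_b)


profile_x = "110001111001001110001111"
profile_y = "111000111100000110001111"
print(mutual_information(profile_x, profile_y))
print(pearson(profile_x, profile_y))
```

A profile compared with itself gives MI equal to its own entropy and a Pearson coefficient of 1, which is a quick sanity check on either function.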
• Inverse homology
– Calculate the homology between two genomes: the ratio of the number of homologs in each reference organism j to the number of proteins in the target genome i (H_ij).
– P_ij = 1/H_ij if a homolog is present; otherwise P_ij = 0.
Karimpour-Fard et al. BMC Genomics. 2007;8(1):393
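One possible reading of the inverse homology weighting is sketched below. The interpretation is an assumption: H is computed per reference genome as (number of target proteins with a homolog in that genome) / (number of proteins in the target genome), and each present bit in a profile is replaced by 1/H. The genome names and counts are invented for illustration.

```python
def inverse_homology_weights(homolog_counts, n_target_proteins):
    """Map each reference genome to 1/H, where H is the homolog ratio.

    homolog_counts: {reference_genome: number of target proteins with a
    homolog in that genome} (hypothetical input format).
    """
    weights = {}
    for genome, count in homolog_counts.items():
        ratio = count / n_target_proteins  # H for this reference genome
        weights[genome] = 1.0 / ratio if ratio > 0 else 0.0
    return weights


def weighted_profile(binary_profile, genomes, weights):
    """Replace each present bit with the genome's weight; absent bits stay 0."""
    return [weights[g] if bit else 0.0 for bit, g in zip(binary_profile, genomes)]


genomes = ["close_relative", "distant_relative"]
weights = inverse_homology_weights(
    {"close_relative": 3600, "distant_relative": 800}, n_target_proteins=4000
)
print(weights)
print(weighted_profile([1, 1], genomes, weights))
```

Under this reading, presence in a distantly related genome (few shared homologs) counts for more than presence in a close relative, where most proteins have homologs anyway.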
Comparison of different combinations of reference genomes and E-value thresholds using COG
• PPV = TP/(TP+FP)
– TP = number of predicted pairs in the same functional category
– FP = number of predicted pairs that were classified but were not in the same functional category
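The PPV calculation can be sketched as follows. The protein names and category letters are made up, and the sketch assumes each classified protein has exactly one functional category; pairs with an unclassified member are skipped, matching the definition of FP as pairs that were classified.

```python
def ppv(predicted_pairs, category):
    """Positive predictive value of predicted pairs against functional categories.

    category: {protein: category label} for classified proteins only
    (hypothetical single-category assignment).
    """
    tp = fp = 0
    for a, b in predicted_pairs:
        if a not in category or b not in category:
            continue  # unclassified pairs count as neither TP nor FP
        if category[a] == category[b]:
            tp += 1
        else:
            fp += 1
    return tp / (tp + fp) if tp + fp else float("nan")


category = {"p1": "E", "p2": "E", "p3": "K", "p4": "K"}
pairs = [("p1", "p2"), ("p3", "p4"), ("p1", "p3"), ("p2", "p5")]
value = ppv(pairs, category)
print(value)  # 2 TP, 1 FP, 1 skipped pair
```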
[Figure: reference sets compared: Random sets, All, Low GC, Aerobic]
Karimpour-Fard et al. BMC Genomics. 2007;8(1):393
Co-evolution can be used to assign function to unstudied genes
The hypothetical proteins YcgB, YeaH, and YeaG are co-conserved across different species. Comparison of sub-graphs across species (CS-CCC) suggested that a previously unstudied S. typhimurium gene, ycgB, is functionally related to yeaH. Experimental data support the hypothesis that both genes are important for antimicrobial peptide resistance.
Edge color code:
• E. coli K12 (green)
• E. coli O157 (blue)
• Shigella flexneri (black)
• S. typhimurium LT2 (purple)
• P. aeruginosa (mustard)
Karimpour-Fard et al. Genome Biology 2007 8:R185