![Page 1: ComPath Comparative Metabolic Pathway Analyzer Kwangmin Choi and Sun Kim School of Informatics Indiana University](https://reader030.vdocuments.site/reader030/viewer/2022032804/56649e425503460f94b349c5/html5/thumbnails/1.jpg)
ComPath Comparative Metabolic Pathway Analyzer
Kwangmin Choi and Sun Kim
School of Informatics
Indiana University
![Page 2: ComPath Comparative Metabolic Pathway Analyzer Kwangmin Choi and Sun Kim School of Informatics Indiana University](https://reader030.vdocuments.site/reader030/viewer/2022032804/56649e425503460f94b349c5/html5/thumbnails/2.jpg)
Contents
Introduction
System Components
Current Implementation
Experiment Result
Future Plan
![Page 3: ComPath Comparative Metabolic Pathway Analyzer Kwangmin Choi and Sun Kim School of Informatics Indiana University](https://reader030.vdocuments.site/reader030/viewer/2022032804/56649e425503460f94b349c5/html5/thumbnails/3.jpg)
INTRODUCTION & SYSTEM COMPONENTS
![Page 4: ComPath Comparative Metabolic Pathway Analyzer Kwangmin Choi and Sun Kim School of Informatics Indiana University](https://reader030.vdocuments.site/reader030/viewer/2022032804/56649e425503460f94b349c5/html5/thumbnails/4.jpg)
Introduction
ComPath is a web-based sequence analysis system built upon:
KEGG (Kyoto Encyclopedia of Genes and Genomes)
PLATCOM (A Platform for Computational Comparative Genomics)
![Page 5: ComPath Comparative Metabolic Pathway Analyzer Kwangmin Choi and Sun Kim School of Informatics Indiana University](https://reader030.vdocuments.site/reader030/viewer/2022032804/56649e425503460f94b349c5/html5/thumbnails/5.jpg)
KEGG: Kyoto Encyclopedia of Genes and Genomes
Four databases:
PATHWAY: 32,657 pathways generated from 262 reference pathways
GENES: 1,213,035 genes in 32 eukaryotes + 260 bacteria + 24 archaea
LIGAND: 13,387 compounds, 2,543 drugs, 11,161 glycans, 6,446 reactions
BRITE: 7,817 KO (KEGG Orthology) groups
KEGG adopts the EC enzyme classification system.
![Page 6: ComPath Comparative Metabolic Pathway Analyzer Kwangmin Choi and Sun Kim School of Informatics Indiana University](https://reader030.vdocuments.site/reader030/viewer/2022032804/56649e425503460f94b349c5/html5/thumbnails/6.jpg)
EC system 0/2
An old system, but still universally accepted by biochemists.
The EC system was developed long before protein sequence or structure information was available, so it focuses on reactions, not on sequence homology or structure.
Many biochemists and structural biologists try to harmonize newly available chemical, sequence, and structural data with the traditional understanding of enzyme function.
![Page 7: ComPath Comparative Metabolic Pathway Analyzer Kwangmin Choi and Sun Kim School of Informatics Indiana University](https://reader030.vdocuments.site/reader030/viewer/2022032804/56649e425503460f94b349c5/html5/thumbnails/7.jpg)
Problems in EC system 1/2
Inconsistency in the EC hierarchy
For each of the six top-level EC classes, subclasses and sub-subclasses may have different meanings.
e.g. EC 1.* is divided by substrate type, but EC 5.* by general isomerase type.
Problems with multifunctional enzymes and with multiple subunits involved in one function
EC presumes only a 1:1:1 relationship between gene, protein, and reaction.
Different sequence/structure, but similar EC
Two enzymes with low sequence identity sometimes belong to the same or a very similar EC class.
e.g. o-succinylbenzoate synthases from several bacteria share less than 40% sequence identity.
![Page 8: ComPath Comparative Metabolic Pathway Analyzer Kwangmin Choi and Sun Kim School of Informatics Indiana University](https://reader030.vdocuments.site/reader030/viewer/2022032804/56649e425503460f94b349c5/html5/thumbnails/8.jpg)
Problems in EC system 2/2
Similar sequence/structure, but different EC
Even variation in the fourth digit of the EC number is rare above a sequence identity threshold of 40%. However, exceptions to this rule are prevalent.
e.g. melamine deaminase and atrazine chlorohydrolase are 98% identical, but belong to different EC classes.
No information on the sequence/structure-mechanism relationship
The EC system considers only the overall transformation.
Similarity among sequences is strongly correlated with similarity at the level of a common (structural domain-related) partial reaction, rather than the overall transformation.
How can enzyme structure data be combined with partial reaction data?
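The EC hierarchy discussed above can be illustrated with a small sketch that counts how many leading levels two EC numbers share. The function names are illustrative assumptions and are not part of ComPath or KEGG; the EC numbers in the usage note are only examples.

```python
from typing import Tuple


def parse_ec(ec: str) -> Tuple[str, ...]:
    """Split an EC number such as 'EC 1.1.1.1' into its four levels."""
    parts = ec.upper().replace("EC", "").strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four EC levels, got {ec!r}")
    return tuple(parts)


def shared_ec_levels(ec_a: str, ec_b: str) -> int:
    """Count the leading EC levels two numbers have in common (0 to 4).

    A '-' placeholder (unassigned level) is never counted as shared.
    """
    shared = 0
    for a, b in zip(parse_ec(ec_a), parse_ec(ec_b)):
        if a != b or a == "-":
            break
        shared += 1
    return shared
```

For example, `shared_ec_levels("1.1.1.1", "1.1.1.2")` reports that the two numbers differ only in the fourth digit, while `shared_ec_levels("1.1.1.1", "5.3.1.9")` reports no shared level at all.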
![Page 9: ComPath Comparative Metabolic Pathway Analyzer Kwangmin Choi and Sun Kim School of Informatics Indiana University](https://reader030.vdocuments.site/reader030/viewer/2022032804/56649e425503460f94b349c5/html5/thumbnails/9.jpg)
Our Research Goal
We provide a computational environment for enzyme analysis via multiple genome comparison.
It will be built on the PLATCOM system.
![Page 10: ComPath Comparative Metabolic Pathway Analyzer Kwangmin Choi and Sun Kim School of Informatics Indiana University](https://reader030.vdocuments.site/reader030/viewer/2022032804/56649e425503460f94b349c5/html5/thumbnails/10.jpg)
PLATCOM http://platcom.org/platcom
A Platform for Comparative Genomics
Providing a platform for comparative genomics ON THE WEB
Comparative analysis system for users to freely select any sets of genomes
Scalable system interactively combining high-performance sequence analysis tools
![Page 11: ComPath Comparative Metabolic Pathway Analyzer Kwangmin Choi and Sun Kim School of Informatics Indiana University](https://reader030.vdocuments.site/reader030/viewer/2022032804/56649e425503460f94b349c5/html5/thumbnails/11.jpg)
CURRENT IMPLEMENTATION
![Page 12: ComPath Comparative Metabolic Pathway Analyzer Kwangmin Choi and Sun Kim School of Informatics Indiana University](https://reader030.vdocuments.site/reader030/viewer/2022032804/56649e425503460f94b349c5/html5/thumbnails/12.jpg)
ComPath http://platcom.org/compath
Comparative Metabolic Pathway Analyzer
ComPath = KEGG + PLATCOM
Not just for retrieving information from the database, but focused on analyzing enzymes using the enzyme-genome table
Easy to use:
{Optional} Upload a user sequence and/or saved enzyme-genome table data
Select a metabolic pathway
Select any combination of genomes in KEGG
Create an enzyme-genome table
Then use the table for various enzyme sequence analysis tasks
![Page 13: ComPath Comparative Metabolic Pathway Analyzer Kwangmin Choi and Sun Kim School of Informatics Indiana University](https://reader030.vdocuments.site/reader030/viewer/2022032804/56649e425503460f94b349c5/html5/thumbnails/13.jpg)
Screenshot: Pathway Selection
11 categories
123 pathways
Users can upload a previously saved enzyme-genome table to continue an analysis
![Page 14: ComPath Comparative Metabolic Pathway Analyzer Kwangmin Choi and Sun Kim School of Informatics Indiana University](https://reader030.vdocuments.site/reader030/viewer/2022032804/56649e425503460f94b349c5/html5/thumbnails/14.jpg)
Screenshot: Genome Selection
250 genomes from KEGG database
Users can select genomes in taxonomic or alphabetical order
![Page 15: ComPath Comparative Metabolic Pathway Analyzer Kwangmin Choi and Sun Kim School of Informatics Indiana University](https://reader030.vdocuments.site/reader030/viewer/2022032804/56649e425503460f94b349c5/html5/thumbnails/15.jpg)
Screenshot: Operations 1/2
![Page 16: ComPath Comparative Metabolic Pathway Analyzer Kwangmin Choi and Sun Kim School of Informatics Indiana University](https://reader030.vdocuments.site/reader030/viewer/2022032804/56649e425503460f94b349c5/html5/thumbnails/16.jpg)
Screenshot: Operations 2/2
![Page 17: ComPath Comparative Metabolic Pathway Analyzer Kwangmin Choi and Sun Kim School of Informatics Indiana University](https://reader030.vdocuments.site/reader030/viewer/2022032804/56649e425503460f94b349c5/html5/thumbnails/17.jpg)
Enzyme-Genome Table
An enzyme-genome table allows for tests on whether a certain enzyme in a given pathway is present or missing using sequence analysis techniques.
Information in this table can be easily saved, uploaded, and transferred.
Users can also upload their own sequence set, e.g., the entire set of predicted proteins in a newly sequenced genome, and annotate the sequences in terms of KEGG pathways.
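A minimal sketch of how such an enzyme-genome table could be represented, queried for missing enzymes, and serialized so it can be saved and uploaded again. This is an assumption for illustration only: the slides do not describe ComPath's actual data format, and the function names and gene identifiers below are hypothetical.

```python
import json
from typing import Dict, Iterable, List, Tuple

# table[ec_number][genome] -> gene identifiers assigned to that enzyme
EnzymeGenomeTable = Dict[str, Dict[str, List[str]]]


def build_table(
    enzymes: Iterable[str],
    genomes: Iterable[str],
    annotations: Iterable[Tuple[str, str, str]],
) -> EnzymeGenomeTable:
    """Create one row per enzyme and one column per genome, then fill in genes."""
    genomes = list(genomes)
    table: EnzymeGenomeTable = {ec: {g: [] for g in genomes} for ec in enzymes}
    for ec, genome, gene in annotations:
        if ec in table and genome in table[ec]:
            table[ec][genome].append(gene)
    return table


def missing_enzymes(table: EnzymeGenomeTable, genome: str) -> List[str]:
    """List the enzymes of the pathway that have no gene in the given genome."""
    return sorted(ec for ec, row in table.items() if not row[genome])


def table_to_json(table: EnzymeGenomeTable) -> str:
    """Serialize the table so it can be saved or transferred."""
    return json.dumps(table, sort_keys=True)


def table_from_json(text: str) -> EnzymeGenomeTable:
    """Restore a previously saved table."""
    return json.loads(text)


# Hypothetical example data: three glycolysis EC numbers, two genome codes,
# and made-up gene identifiers.
example = build_table(
    enzymes=["2.7.1.1", "2.7.1.11", "5.3.1.9"],
    genomes=["eco", "hpy"],
    annotations=[
        ("2.7.1.11", "eco", "eco_gene_1"),
        ("5.3.1.9", "eco", "eco_gene_2"),
        ("5.3.1.9", "hpy", "hpy_gene_1"),
    ],
)
```

Empty cells in this structure are the candidates for a missing enzyme search; `missing_enzymes(example, "hpy")` lists them for one genome.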
![Page 18: ComPath Comparative Metabolic Pathway Analyzer Kwangmin Choi and Sun Kim School of Informatics Indiana University](https://reader030.vdocuments.site/reader030/viewer/2022032804/56649e425503460f94b349c5/html5/thumbnails/18.jpg)
Screenshot: ComPath's Enzyme-Genome Table – INTERACTIVE!
![Page 19: ComPath Comparative Metabolic Pathway Analyzer Kwangmin Choi and Sun Kim School of Informatics Indiana University](https://reader030.vdocuments.site/reader030/viewer/2022032804/56649e425503460f94b349c5/html5/thumbnails/19.jpg)
Screenshot: KEGG’s Ortholog Table – STATIC!
![Page 20: ComPath Comparative Metabolic Pathway Analyzer Kwangmin Choi and Sun Kim School of Informatics Indiana University](https://reader030.vdocuments.site/reader030/viewer/2022032804/56649e425503460f94b349c5/html5/thumbnails/20.jpg)
How ComPath works: Overall Design
![Page 21: ComPath Comparative Metabolic Pathway Analyzer Kwangmin Choi and Sun Kim School of Informatics Indiana University](https://reader030.vdocuments.site/reader030/viewer/2022032804/56649e425503460f94b349c5/html5/thumbnails/21.jpg)
How ComPath works:
![Page 22: ComPath Comparative Metabolic Pathway Analyzer Kwangmin Choi and Sun Kim School of Informatics Indiana University](https://reader030.vdocuments.site/reader030/viewer/2022032804/56649e425503460f94b349c5/html5/thumbnails/22.jpg)
Sequence Analyses
Missing enzyme search
Pairwise (FASTA) and multiple sequence alignment (CLUSTALW)
Domain search using SCOPEC/SUPERFAMILY and PDB domains
Domain-based analysis using hidden Markov models (HMM)
Contextual sequence analysis
Sequence analysis for further investigation
Phylogenetic analysis of enzymes in selected genomes
Gibbs motif sampler
BAG clustering
Contextual sequence analysis
Global network analysis
Physical network analysis
![Page 23: ComPath Comparative Metabolic Pathway Analyzer Kwangmin Choi and Sun Kim School of Informatics Indiana University](https://reader030.vdocuments.site/reader030/viewer/2022032804/56649e425503460f94b349c5/html5/thumbnails/23.jpg)
TEST
![Page 24: ComPath Comparative Metabolic Pathway Analyzer Kwangmin Choi and Sun Kim School of Informatics Indiana University](https://reader030.vdocuments.site/reader030/viewer/2022032804/56649e425503460f94b349c5/html5/thumbnails/24.jpg)
Experiments: Genomes, Queries, Pathways
Selected genomes: B. subtilis, B. halodurans, E. coli, H. influenzae, H. pylori, M. genitalium, Y. pestis KIM
Query genomes: M. tuberculosis, A. aeolicus, B. anthracis
Metabolic pathways: 00010 (glycolysis/gluconeogenesis), 00020 (TCA cycle)
![Page 25: ComPath Comparative Metabolic Pathway Analyzer Kwangmin Choi and Sun Kim School of Informatics Indiana University](https://reader030.vdocuments.site/reader030/viewer/2022032804/56649e425503460f94b349c5/html5/thumbnails/25.jpg)
Experiments: Comparison of Sequence Analysis Methods
Four methods (abbr.)
HMMer: HMM search using the whole sequence
CSR: HMM search using common shared regions generated by the BAG program
SCOPEC: Domain search using SCOP/SUPERFAMILY and PDB databases
FASTA: Simple FASTA search
E-value cutoffs: 1e-10, 1e-20, 1e-30
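As a rough sketch of how the three cutoffs might be applied, the function below calls an enzyme "present" when its best hit has an E-value at or below the cutoff. The rule, the function name, and the example E-values are assumptions for illustration; the slides do not state how ComPath applies the cutoffs.

```python
from typing import Dict, Optional, Sequence, Set

CUTOFFS = (1e-10, 1e-20, 1e-30)


def present_at_cutoffs(
    best_evalues: Dict[str, Optional[float]],
    cutoffs: Sequence[float] = CUTOFFS,
) -> Dict[float, Set[str]]:
    """Map each cutoff to the set of enzymes whose best hit passes it.

    best_evalues maps an enzyme to its smallest E-value, or None if the
    search returned no hit at all.
    """
    calls: Dict[float, Set[str]] = {}
    for cutoff in cutoffs:
        calls[cutoff] = {
            ec
            for ec, evalue in best_evalues.items()
            if evalue is not None and evalue <= cutoff
        }
    return calls


# Hypothetical best-hit E-values for three enzymes in one query genome.
hits = {"2.7.1.1": 1e-35, "5.3.1.9": 1e-15, "2.7.1.11": None}
calls = present_at_cutoffs(hits)
```

A stricter cutoff can only shrink the set of enzymes called present, which is why sensitivity and specificity are reported per cutoff.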
![Page 26: ComPath Comparative Metabolic Pathway Analyzer Kwangmin Choi and Sun Kim School of Informatics Indiana University](https://reader030.vdocuments.site/reader030/viewer/2022032804/56649e425503460f94b349c5/html5/thumbnails/26.jpg)
Experiments: Overall Design
![Page 27: ComPath Comparative Metabolic Pathway Analyzer Kwangmin Choi and Sun Kim School of Informatics Indiana University](https://reader030.vdocuments.site/reader030/viewer/2022032804/56649e425503460f94b349c5/html5/thumbnails/27.jpg)
Screenshot: ComPath's Enzyme-Genome Table – INTERACTIVE!
![Page 28: ComPath Comparative Metabolic Pathway Analyzer Kwangmin Choi and Sun Kim School of Informatics Indiana University](https://reader030.vdocuments.site/reader030/viewer/2022032804/56649e425503460f94b349c5/html5/thumbnails/28.jpg)
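Experiment Results (e.g.)

| Query Genome | Pathway | Method | Sensitivity | Specificity | E-value |
|---|---|---|---|---|---|
| M. tuberculosis | Path 00010 | HMMer | 0.596491228 | 0.454545455 | 1.00E-30 |
| M. tuberculosis | Path 00010 | CSR | 0.666666667 | 0.454545455 | 1.00E-30 |
| M. tuberculosis | Path 00010 | SCOPEC | 0.614035088 | 0.348484848 | 1.00E-30 |
| M. tuberculosis | Path 00010 | FASTA | 0.649122807 | 0.378787879 | 1.00E-30 |
| M. tuberculosis | Path 00010 | HMMer | 0.623188406 | 0.524590164 | 1.00E-10 |
| M. tuberculosis | Path 00010 | CSR | 0.739130435 | 0.360655738 | 1.00E-10 |
| M. tuberculosis | Path 00010 | SCOPEC | 0.652173913 | 0.418032787 | 1.00E-10 |
| M. tuberculosis | Path 00010 | FASTA | 0.811594203 | 0.204918033 | 1.00E-10 |

| Query Genome | Pathway | Method | Sensitivity | Specificity | E-value |
|---|---|---|---|---|---|
| M. tuberculosis | Path 00020 | HMMer | 0.535714286 | 0.769230769 | 1.00E-30 |
| M. tuberculosis | Path 00020 | CSR | 0.642857143 | 0.846153846 | 1.00E-30 |
| M. tuberculosis | Path 00020 | SCOPEC | 0.535714286 | 0.769230769 | 1.00E-30 |
| M. tuberculosis | Path 00020 | FASTA | 0.678571429 | 0.615384615 | 1.00E-30 |
| M. tuberculosis | Path 00020 | HMMer | 0.516129032 | 0.777777778 | 1.00E-10 |
| M. tuberculosis | Path 00020 | CSR | 0.709677419 | 0.666666667 | 1.00E-10 |
| M. tuberculosis | Path 00020 | SCOPEC | 0.548387097 | 0.777777778 | 1.00E-10 |
| M. tuberculosis | Path 00020 | FASTA | 0.741935484 | 0.333333333 | 1.00E-10 |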
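The slides do not define how sensitivity and specificity were computed. The sketch below assumes the standard definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP), applied to sets of predicted and reference assignments; the function name and the example sets are hypothetical.

```python
from typing import Iterable, Tuple


def sensitivity_specificity(
    predicted: Iterable[str],
    actual: Iterable[str],
    universe: Iterable[str],
) -> Tuple[float, float]:
    """Compute (sensitivity, specificity) from sets of item identifiers.

    predicted: items the method called positive
    actual:    items that are positive in the reference annotation
    universe:  every item that was tested
    """
    predicted, actual, universe = set(predicted), set(actual), set(universe)
    tp = len(predicted & actual)             # correctly called positive
    fn = len(actual - predicted)             # positives the method missed
    fp = len(predicted - actual)             # wrongly called positive
    tn = len(universe - predicted - actual)  # correctly left negative
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical example: five tested items, three true positives in the reference.
sens, spec = sensitivity_specificity(
    predicted={"a", "b", "d"},
    actual={"a", "b", "c"},
    universe={"a", "b", "c", "d", "e"},
)
```

Under these definitions a looser E-value cutoff tends to raise sensitivity at the cost of specificity, which matches the general trend in the tables above.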
![Page 29: ComPath Comparative Metabolic Pathway Analyzer Kwangmin Choi and Sun Kim School of Informatics Indiana University](https://reader030.vdocuments.site/reader030/viewer/2022032804/56649e425503460f94b349c5/html5/thumbnails/29.jpg)
A. aeolicus
![Page 30: ComPath Comparative Metabolic Pathway Analyzer Kwangmin Choi and Sun Kim School of Informatics Indiana University](https://reader030.vdocuments.site/reader030/viewer/2022032804/56649e425503460f94b349c5/html5/thumbnails/30.jpg)
B. anthracis
![Page 31: ComPath Comparative Metabolic Pathway Analyzer Kwangmin Choi and Sun Kim School of Informatics Indiana University](https://reader030.vdocuments.site/reader030/viewer/2022032804/56649e425503460f94b349c5/html5/thumbnails/31.jpg)
M. tuberculosis
![Page 32: ComPath Comparative Metabolic Pathway Analyzer Kwangmin Choi and Sun Kim School of Informatics Indiana University](https://reader030.vdocuments.site/reader030/viewer/2022032804/56649e425503460f94b349c5/html5/thumbnails/32.jpg)
FUTURE PLAN
![Page 33: ComPath Comparative Metabolic Pathway Analyzer Kwangmin Choi and Sun Kim School of Informatics Indiana University](https://reader030.vdocuments.site/reader030/viewer/2022032804/56649e425503460f94b349c5/html5/thumbnails/33.jpg)
Future Plan: More Resources
ComPath is being extended to incorporate more resources, including:
KEGG LIGAND: A composite database consisting of compound, glycan, reaction, etc.
ProRule: A new database containing functional and structural information on PROSITE profiles
SFLD: Structure-Function Linkage Database
We are also developing databases and algorithms for enzyme analysis, e.g. classifiers using a database of enzyme-specific HMMs.
ComPath is in an early stage of system development and we solicit feedback and suggestions from biology and bioinformatics communities.
![Page 34: ComPath Comparative Metabolic Pathway Analyzer Kwangmin Choi and Sun Kim School of Informatics Indiana University](https://reader030.vdocuments.site/reader030/viewer/2022032804/56649e425503460f94b349c5/html5/thumbnails/34.jpg)
Future Plan: More Algorithms and Tools
More integrative understanding of biochemical network evolution
Algorithms to handle the isozyme problem
Algorithms to computationally reconstruct alternative pathways
Algorithms to combine sequence, structure, chemical reaction, and contextual information for better enzyme annotation
Etc.