continued development of chebi towards better usability for the systems biology and metabolic...

Continued development of ChEBI towards better usability for the systems biology and metabolic modelling community 1Neil Swainston, 2Janna Hastings, 1Pedro Mendes, 2Christoph Steinbeck

1Manchester Centre for Integrative Systems Biology, Manchester Institute of Biotechnology, University of Manchester, Manchester M1 7ND, United Kingdom 2The EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom

Introduction •  Genome-scale metabolic reconstructions require integrated computable

knowledge about biochemical entities, such as macromolecules and metabolites.

•  The use of unambiguous, semantically typed, publicly available and perennial identifiers for model components is essential if systems biology models are to be shared, reused and developed by communities, maximising their benefit1.

•  Such model annotation allows chem- and bioinformatics data, such as chemical structures and protein sequences, to extracted from resources via web service interfaces, greatly enhancing the usability and range of applications.

•  ChEBI2 is a freely available biochemical database and ontology, housed at the European Bioinformatics Institute (EBI), is becoming increasingly used in both modelling and metabolomics. http://www.ebi.ac.uk/chebi/

•  The BioModels Database currently contains over 110000 models that use ChEBI identifiers. Improvements to and extensions of the ChEBI resource will have an immediate benefit to the systems biology modelling community.

•  ChEBI will be extended for systems biology by implementing the following improvements: •  A comprehensive cross-platform API library for accessing ChEBI

programmatically will be developed. The API, libChEBI, will be made publicly available as an open source library in Java and Python.

•  It will include facilities such as extracting biologically relevant groups of compounds (such as tautomers), calculating additional physicochemical properties and semantic reasoning over model annotations.

•  The ChEBI database will be extended to enhance stability, increase community involvement and provide a new powerful visualisation for the biological context of molecular entities.

•  To extend the scope of ChEBI, all known metabolites across human, mouse, E. coli and S. cerevisiae will be incorporated and curated into the resource.

References 1.  A consensus yeast metabolic network reconstruction obtained from a

community approach to systems biology. Herrgård MJ, et al. Nat Biotechnol. 2008, 26, 1155-60.

2.  ChEBI: a database and ontology for chemical entities of biological interest. Degtyarenko K, et al. Nucleic Acids Res. 2008, 36, D344–D350.

3.  MetaboLights -- an open-access general-purpose repository for metabolomics studies and associated meta-data. Haug K, et al. Nucleic Acids Res. 2013, 41, D781-6.

4.  The Chemistry Development Kit (CDK): an open-source Java library for Chemo- and Bioinformatics. Steinbeck C, et al. J Chem Inf Comput Sci. 2003, 43, 493-500.

libChEBI •  Programmatic access to ChEBI will be supported through the introduction of

libChEBI, a cross-platform, open source API in Java and Python. •  libChEBI will provide a link between models, modelling software and

metabolomics data3 via common use of ChEBI identifiers in existing data standards (see Fig 2).

Figure 1: Screen capture of ChEBI web interface entry for L-valine.

ChEBI •  Chemical Entities of Biological Interest (ChEBI) is a freely available, manually-

curated dictionary of ‘small’ chemical compounds, either natural or synthetic products used to intervene in the processes of living organisms.

•  ChEBI incorporates an ontological classification, specifying relationships between molecular entities or classes of entities.

•  Data represented includes unique identifiers, names and synonyms, structure, molecular formula, charge, mass, database links and citations.

•  Current release contains 34309 chemical entities. •  ChEBI is available via a web interface (see Fig 1), web services, or download.

libChEBI

Metabolomics

Metabolights

BioPAX

Modelling

Standards and software

Figure 2: Representation of the central position that libChEBI will play in the integration of metabolomics data into systems biology models.

•  libChEBI will interface with existing cheminformatics software, such as the Chemistry Development Kit (CDK)4 to provide methods for reconciling metabolites, determining charge states, estimating thermodynamic properties, facilitating network gap filling, and model checking and validation.

Further enhancements •  Extension of the performance and stability of the ChEBI web service to support

metabolic reconstruction. We will optimise the web service search implementation and API targeting the types of queries common in the metabolic reconstruction process.

•  Streamline the ChEBI curation process with partial automation of structure-based classification, text mining for biological roles, and improved workflow.

•  Improve the community interface for ChEBI submissions by introducing facilities for community-based updates.

•  Harness the knowledge encoded in metabolic models and in partner databases such as Metabolights3 to create rich visualisation of chemicals in their biological context, using "facets" for pathways, reactions and simulations.

•  We will undertake to curate all known metabolites from four key organisms into ChEBI: S. cerevisiae, E. coli, mouse and human.

•  Creation of new training materials and delivery of training courses to the community.

Cheminformatics

This work was funded by the BBSRC under the project “Continued development of ChEBI towards better usability for the systems biology and metabolic modelling community”, BB/K019783/1.

continued development of chebi towards better usability for the systems biology and metabolic...

Documents