Title: The Ondex Data Integration Framework
Author: Jan Taubert
The ONDEX data integration framework
BOSC 2007, Vienna, 20.07.2007
Jan Taubert ([email protected])
Rothamsted Research, UK
Summary
ONDEX – framework for large scale data integration, text mining & graph analysis
Java API and standalone application
License: GNU General Public License
Project status: early alpha
Statistics: 814 files, 159572 lines
http://ondex.sourceforge.net
University of Bielefeld, Bielefeld, Germany
University of Koblenz, Koblenz, Germany
University of Nottingham, Nottingham, Nottinghamshire, UK
University of Tromsø, Tromsø, Norway
University of Wageningen, Wageningen, Netherlands
Rothamsted Research, Harpenden, Hertfordshire, UK
Members
Current members: • Jan Baumbach • Sonja Ernst • Keywan Hassani-Pak • Matthew Hindle • Berend Hoekman • Jacob Köhler • Artem Lysenko • Stephan Philippi • Chris Rawlings • Jan Taubert • Paul Verrier • Jochen Weile • Rainer Winnenburg • Tully Yates
Former members: • Jessica Butz • Sebastian Elsner • Ina Kupp • Alexander Rüegg • Klaus Peter Sieren • Andre Skusa • Michael Specht
Based on
Java J2SE 5.0
Berkeley DB Java Edition
XFire SOAP framework
Jetty WebServer
Lucene text search engine
Taverna for workflows
Motivation
100s of bio-databases and large experimental data: combine in ONDEX to gain new insights
Everything is a network … in which nodes and edges can have different properties.
Examples shown: enzyme kinetics, protein interactions, metabolic pathways, protein structure, relation properties, ontologies
ONDEX: Graph of Concepts and Relations
Biology: Protein interaction network (Protein – Ligand interaction network)
Nodes: Protein, Protein, Ligand; edges: interact, interact
ONDEX: Ontology based graph
Nodes: Concept, Concept, Concept; edges: Relation, Relation
Ontology of Concept Classes, Relation Types and additional Properties
Concept Class: Protein, Protein, Ligand
Relation Type: interact, interact
Properties: compound name, protein sequence, protein structure, cellular component, KM-value, pH optimum …
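The concept/relation model on this slide can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the ONDEX core API: the classes `ConceptGraphSketch`, `Concept` and `Relation` and their methods are hypothetical, the topology (two proteins that each interact with one ligand) is assumed, and the property value is a made-up placeholder.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of an ontology based graph; NOT the ONDEX core API.
class ConceptGraphSketch {
    final List<Concept> concepts = new ArrayList<>();
    final List<Relation> relations = new ArrayList<>();

    Concept addConcept(String name, String conceptClass) {
        Concept c = new Concept(name, conceptClass);
        concepts.add(c);
        return c;
    }

    Relation addRelation(Concept from, Concept to, String relationType) {
        Relation r = new Relation(from, to, relationType);
        relations.add(r);
        return r;
    }

    // Concepts linked to c by a relation of the given type, in either direction.
    List<Concept> neighbours(Concept c, String relationType) {
        List<Concept> result = new ArrayList<>();
        for (Relation r : relations) {
            if (!r.relationType.equals(relationType)) continue;
            if (r.from == c) result.add(r.to);
            else if (r.to == c) result.add(r.from);
        }
        return result;
    }

    // Assumed topology: two proteins that each interact with one ligand.
    static ConceptGraphSketch example() {
        ConceptGraphSketch g = new ConceptGraphSketch();
        Concept p1 = g.addConcept("protein A", "Protein");
        Concept ligand = g.addConcept("ligand X", "Ligand");
        Concept p2 = g.addConcept("protein B", "Protein");
        ligand.properties.put("compound name", "placeholder compound"); // made-up value
        g.addRelation(p1, ligand, "interact");
        g.addRelation(ligand, p2, "interact");
        return g;
    }

    public static void main(String[] args) {
        ConceptGraphSketch g = example();
        Concept ligand = g.concepts.get(1);
        for (Concept n : g.neighbours(ligand, "interact")) {
            System.out.println(ligand.name + " interact " + n.name + " (" + n.conceptClass + ")");
        }
    }
}

class Concept {
    final String name;
    final String conceptClass; // e.g. "Protein" or "Ligand"
    final Map<String, String> properties = new LinkedHashMap<>();

    Concept(String name, String conceptClass) {
        this.name = name;
        this.conceptClass = conceptClass;
    }
}

class Relation {
    final Concept from;
    final Concept to;
    final String relationType; // e.g. "interact"

    Relation(Concept from, Concept to, String relationType) {
        this.from = from;
        this.to = to;
        this.relationType = relationType;
    }
}
```

Keeping the concept class and relation type as plain attributes of nodes and edges is what lets one generic graph hold proteins, ligands and any other kind of entry side by side.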
ONDEX data integration framework (architecture diagram)
Data Input: flatfile parser for flatfile databases (KEGG, TF, TP, BioCyc, Drastic, MeSH, Medline...); format importer for exchange formats (XGMML, RDF, OBO, PSI MI, SBML, FASTA); OXL for databases with OXL support (PHI-base)
Data Integration: ONDEX core API, ONDEX Metadata, Query API, Exporter, Webservices, JSP webinterface, ontology based graph structure, consistency checks, data alignment methods
Data Analysis: ONDEX Visualisation and Analysis Tool Kit (OVTK), Taverna, Webinterface frontend
Also shown: ONDEX Metadata Editor; formats OXL, XGMML, SBML, RDF, FASTA, GraphML, GML
One example application
Microarray experiment result analysis: results are mapped onto data integrated from 100s of bio-databases
Concept classes in the example graph: Comp, Protein, Gene, Enzyme, EC, Treatment, Reaction, Pathway
Treatments from DRASTIC
Pathways from KEGG
How to contribute
Three steps:
1. Try it out, submit bugs you find
2. Suggest/Implement your improvements
3. Become a contributor and submit improvements
http://ondex.sourceforge.net
Contributors will be acknowledged on the project website and in publications involving their work. Contributors are welcome to publish their work on ONDEX under their own names.
What to contribute
Data Input
Flatfile parser: flatfile databases (KEGG, TF, TP, BioCyc, Drastic, MeSH, Medline...)
Format importer: exchange formats (XGMML, RDF, OBO, PSI MI, SBML, FASTA)
OXL: databases with OXL support (PHI-base)
Good: Write a flatfile parser for ONDEX
Better: Provide your database in OXL (see Taubert et al. (2007) "Exchange of integrated datasets – the OXL format", in press, IB2007)
Also welcome: Provide your database in another standard (BioPax, SBML, XGMML; but may result in loss of information)
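As a rough idea of what writing a flatfile parser involves, the sketch below reads a tab-separated flatfile and turns each line into a concept record. Everything here is an assumption made for illustration: the three-column layout (accession, concept class, name), the `FlatfileParserSketch` class, the `ParsedConcept` type and the sample data. A real parser would follow the layout of the source database and hand its output to the ONDEX core API rather than to a plain list.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical flatfile parser sketch; NOT part of the ONDEX code base.
class FlatfileParserSketch {

    // Parses lines of the assumed form: accession <TAB> concept class <TAB> name.
    // Blank lines and lines starting with '#' are skipped.
    static List<ParsedConcept> parse(Reader input) {
        List<ParsedConcept> result = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(input)) {
            String line;
            int lineNumber = 0;
            while ((line = reader.readLine()) != null) {
                lineNumber++;
                if (line.trim().isEmpty() || line.startsWith("#")) continue;
                String[] fields = line.split("\t", -1);
                if (fields.length != 3) {
                    throw new IllegalArgumentException(
                        "line " + lineNumber + ": expected 3 tab-separated fields");
                }
                result.add(new ParsedConcept(fields[0], fields[1], fields[2]));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data in the assumed layout.
        String sample = "# accession\tclass\tname\n"
                      + "P1\tProtein\texample protein\n"
                      + "L1\tLigand\texample ligand\n";
        for (ParsedConcept c : parse(new StringReader(sample))) {
            System.out.println(c.accession + " [" + c.conceptClass + "] " + c.name);
        }
    }
}

// One parsed flatfile entry (hypothetical type).
class ParsedConcept {
    final String accession;
    final String conceptClass;
    final String name;

    ParsedConcept(String accession, String conceptClass, String name) {
        this.accession = accession;
        this.conceptClass = conceptClass;
        this.name = name;
    }
}
```

Rejecting malformed lines with the line number, instead of silently skipping them, makes it easier to spot layout changes in a new database release.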
What to contribute
Data Integration
ONDEX data integration framework: ONDEX core API, ONDEX Metadata, Query API, Exporter, Webservices, JSP webinterface, ontology based graph structure, consistency checks, data alignment methods
Formats: OXL, XGMML, SBML, RDF, FASTA
Algorithms: Needed for the alignment of integrated data
Core: Improve the persistency layer of the ontology based graph
Exporter: Provide your own exchange standard
Webservices: Increase compatibility
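To give a feel for what an alignment algorithm could look like, the sketch below links entries from different sources that share an accession. It illustrates the general idea only and is not one of the ONDEX alignment methods; `AccessionAlignmentSketch`, `SourceEntry`, the source names and the accessions are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical alignment sketch; NOT one of the ONDEX data alignment methods.
class AccessionAlignmentSketch {

    // Returns pairs of entries that come from different sources
    // but carry the same accession.
    static List<SourceEntry[]> align(List<SourceEntry> entries) {
        // Group entries by accession, preserving input order.
        Map<String, List<SourceEntry>> byAccession = new LinkedHashMap<>();
        for (SourceEntry e : entries) {
            byAccession.computeIfAbsent(e.accession, k -> new ArrayList<>()).add(e);
        }
        // Within each group, pair up entries from different sources.
        List<SourceEntry[]> pairs = new ArrayList<>();
        for (List<SourceEntry> group : byAccession.values()) {
            for (int i = 0; i < group.size(); i++) {
                for (int j = i + 1; j < group.size(); j++) {
                    SourceEntry a = group.get(i);
                    SourceEntry b = group.get(j);
                    if (!a.source.equals(b.source)) {
                        pairs.add(new SourceEntry[] { a, b });
                    }
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Made-up entries from two made-up sources.
        List<SourceEntry> entries = new ArrayList<>();
        entries.add(new SourceEntry("sourceA", "a1", "ACC1"));
        entries.add(new SourceEntry("sourceB", "b1", "ACC1"));
        entries.add(new SourceEntry("sourceB", "b2", "ACC2"));
        for (SourceEntry[] pair : align(entries)) {
            System.out.println(pair[0].id + " is equivalent to " + pair[1].id);
        }
    }
}

// One database entry with its origin and a cross-reference accession (hypothetical type).
class SourceEntry {
    final String source;
    final String id;
    final String accession;

    SourceEntry(String source, String id, String accession) {
        this.source = source;
        this.id = id;
        this.accession = accession;
    }
}
```

Grouping by accession first keeps the pairwise comparison restricted to small groups instead of comparing every entry with every other entry.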
What to contribute
Data Analysis
ONDEX Visualisation and Analysis Tool Kit (OVTK), Taverna, Webinterface frontend
Formats: OXL, XGMML, SBML, GraphML, GML
Support: OXL in your application
Connect: Import from web service or directly from Core API
Algorithms: Graph analysis using the ONDEX Visualisation and Analysis Tool Kit (OVTK)
Feedback & Feature requests: Mailing lists and Sourceforge.net
Sourceforge.net
Jun 06 – Jun 07: 1828 downloads, 15289 page views
Current SF.net rank: 1074
Subversion activity Jan 07 – Jun 07: 8904 reads, 2072 writes, 4821 file uploads
Developer mailing list: [email protected]
User mailing list: [email protected]
Current release: 0.9alpha1
Publications
J Taubert, R Winnenburg, M Hindle, J Weile, J Baumbach, S Philippi, C Rawlings and J Köhler (2007) "Data integration, information filtering and knowledge extraction with ONDEX", paper in preparation
J Taubert, K P Sieren, M Hindle, B Hoekman, R Winnenburg, S Philippi, C Rawlings and J Köhler (2007) "Exchange of integrated datasets – the OXL format", submitted paper, 4th Integrative Bioinformatics workshop (IB2007)
J Köhler, S Philippi, M Specht and A Rüegg (2006) "Ontology based text indexing and querying for the semantic web", Knowledge-Based Systems, Volume 19, Issue 8
J Köhler, J Baumbach, J Taubert, M Specht, A Skusa, A Rüegg, C Rawlings, P Verrier and S Philippi (2006) "Graph-based analysis and visualization of experimental results with ONDEX", Bioinformatics 22(11)
A Skusa, A Rüegg and J Köhler (2005) "Extraction of biological networks from scientific literature", Briefings in Bioinformatics 6(3)
J Köhler, C Rawlings, P Verrier, R Mitchell, A Skusa, A Rüegg and S Philippi (2004) "Linking experimental results, biological networks and sequence analysis methods using Ontologies and Generalized Data Structures", In Silico Biol, Volume 5, Special Issue: Ontology and Genome, Manuscript number in online journal: 0005
Acknowledgements
Centre for Mathematical and Computational Biology
Department of Biomathematics and Bioinformatics
Rothamsted Research
Dr Jacob Köhler, Principal Investigator
Prof Chris Rawlings, Head of Department
Rothamsted Research is supported by the BBSRC
Travel grants and scholarships by
4th Integrative Bioinformatics workshop 10th to 12th September 2007
University of Ghent, Belgium
http://www.rothamsted.bbsrc.ac.uk/bab/conf/ib07/
13th August 2007: Registration deadline
27th August 2007: Poster submission deadline
Invited speakers:
Prof Carole Goble, School of Computer Science, University of Manchester, UK
Prof Søren Brunak, BioCentrum-DTU, Technical University of Denmark, Denmark
Dr David Searls, Senior Vice President, Informatics, GlaxoSmithKline Pharmaceuticals, USA
Dr Luis Serrano, EMBL, Heidelberg, Germany
Organising committee:
Prof Ralf Hofestädt, University of Bielefeld, Germany (Co-chair)
Dr Jacob Koehler, Rothamsted Research, UK (Co-chair)
Prof Martin Kuiper, University of Ghent, Belgium (Local organisation)
Paul Verrier, Rothamsted Research, UK (Local organisation)