The ONDEX data integration framework

BOSC2007, Vienna, 20.07.2007
Jan Taubert ([email protected]), Rothamsted Research, UK


Page 1: The Ondex Data Integration Framework

The ONDEX data integration framework

BOSC2007, Vienna, 20.07.2007
Jan Taubert ([email protected])
Rothamsted Research, UK

Page 2: The Ondex Data Integration Framework

Summary

ONDEX – framework for large scale data integration, text mining & graph analysis

Java API and standalone application
License: GNU General Public License
Project status: early alpha
Statistics: 814 files, 159572 lines

http://ondex.sourceforge.net

Page 3: The Ondex Data Integration Framework

Members

Institutions:
• University of Bielefeld, Bielefeld, Germany
• University of Koblenz, Koblenz, Germany
• University of Nottingham, Nottingham, Nottinghamshire, UK
• University of Tromsø, Tromsø, Norway
• University of Wageningen, Wageningen, Netherlands
• Rothamsted Research, Harpenden, Hertfordshire, UK

Current members:
• Jan Baumbach
• Sonja Ernst
• Keywan Hassani-Pak
• Matthew Hindle
• Berend Hoekman
• Jacob Köhler
• Artem Lysenko
• Stephan Philippi
• Chris Rawlings
• Jan Taubert
• Paul Verrier
• Jochen Weile
• Rainer Winnenburg
• Tully Yates

Former members:
• Jessica Butz
• Sebastian Elsner
• Ina Kupp
• Alexander Rüegg
• Klaus Peter Sieren
• Andre Skusa
• Michael Specht

Page 4: The Ondex Data Integration Framework

Based on

Java J2SE 5.0
Berkeley DB Java Edition
XFire SOAP framework
Jetty web server
Lucene text search engine
Taverna for workflows

Page 5: The Ondex Data Integration Framework

Motivation

Diagram: ONDEX combines 100s of bio-databases with large experimental data to gain new insights.

Page 6: The Ondex Data Integration Framework

Everything is a network

… in which nodes and edges can have different properties.

enzyme kinetics
protein interactions
metabolic pathways
protein structure relation properties
ontologies

Page 7: The Ondex Data Integration Framework

ONDEX: Graph of Concepts and Relations

Biology: Protein interaction network (Protein, Protein and Ligand nodes, linked by "interact" edges)

Ontology based graph: Concepts linked by Relations (three Concepts and two Relations in the diagram)

Ontology of Concept Classes, Relation Types and additional Properties

Concept Class: Protein, Protein, Ligand

Relation Type: interact, interact

Properties: compound name, protein sequence, protein structure, cellular component, KM-value, pH optimum …

Protein – Ligand interaction network
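
The graph model on this slide can be illustrated with a small Java sketch. It is a minimal illustration only and does not reproduce the ONDEX core API: the class `ConceptGraphSketch`, its nested `Concept` and `Relation` types, and the methods `addConcept`, `addRelation` and `neighbours` are all hypothetical names, and the way the two proteins and the ligand are wired together in `main` is an assumption, since the slide text does not fix the topology.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of an ontology-typed graph; this is not the ONDEX core API. */
public class ConceptGraphSketch {

    /** A node, typed by a concept class and carrying named properties. */
    static final class Concept {
        final int id;
        final String conceptClass;
        final Map<String, String> properties = new LinkedHashMap<>();

        Concept(int id, String conceptClass) {
            this.id = id;
            this.conceptClass = conceptClass;
        }
    }

    /** An edge between two concepts, typed by a relation type. */
    static final class Relation {
        final Concept from;
        final Concept to;
        final String relationType;

        Relation(Concept from, Concept to, String relationType) {
            this.from = from;
            this.to = to;
            this.relationType = relationType;
        }
    }

    private final List<Concept> concepts = new ArrayList<>();
    private final List<Relation> relations = new ArrayList<>();

    /** Creates a concept of the given class; ids are assigned in insertion order. */
    Concept addConcept(String conceptClass) {
        Concept c = new Concept(concepts.size(), conceptClass);
        concepts.add(c);
        return c;
    }

    /** Creates a relation of the given type between two existing concepts. */
    Relation addRelation(Concept from, Concept to, String relationType) {
        Relation r = new Relation(from, to, relationType);
        relations.add(r);
        return r;
    }

    /** Concepts linked to c by a relation of the given type, in either direction. */
    List<Concept> neighbours(Concept c, String relationType) {
        List<Concept> result = new ArrayList<>();
        for (Relation r : relations) {
            if (!r.relationType.equals(relationType)) {
                continue;
            }
            if (r.from == c) {
                result.add(r.to);
            } else if (r.to == c) {
                result.add(r.from);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ConceptGraphSketch graph = new ConceptGraphSketch();
        Concept p1 = graph.addConcept("Protein");
        Concept p2 = graph.addConcept("Protein");
        Concept ligand = graph.addConcept("Ligand");
        // Property value is a placeholder, not real data.
        ligand.properties.put("compound name", "example ligand");

        // Assumed topology: protein - protein - ligand.
        graph.addRelation(p1, p2, "interact");
        graph.addRelation(p2, ligand, "interact");

        for (Concept n : graph.neighbours(p2, "interact")) {
            System.out.println(n.conceptClass + " #" + n.id);
        }
    }
}
```

Keeping the concept class and relation type as plain labels on nodes and edges is the simplest way to show the idea; a fuller model would presumably validate them against the ontology of allowed classes and types.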

Page 8: The Ondex Data Integration Framework

ONDEX data integration framework (architecture diagram)

Data Input
Exchange formats: XGMML, RDF, OBO, PSI MI, SBML, FASTA
Databases with OXL support: PHI-base
Flatfile databases: KEGG, TF, TP, BioCyc, DRASTIC, MeSH, Medline...
Input components: Flatfile parser, Format importer

Data Integration
ONDEX core API, ONDEX Metadata, Consistency checks, Ontology based graph structure, Data alignment methods, Query API, Exporter, Webservices, JSP webinterface

Data Analysis
ONDEX Visualisation and Analysis Tool Kit (OVTK), Taverna, Webinterface frontend

Formats shown between the stages: OXL, XGMML, SBML, RDF, FASTA, GraphML, GML
Also shown: ONDEX Metadata Editor

Page 9: The Ondex Data Integration Framework

One example application

Diagram: microarray experiment results are mapped onto 100s of bio-databases for analysis.

Page 10: The Ondex Data Integration Framework

Diagram nodes: Comp, Protein, Gene, Enzyme, EC, Treatment, Reaction, Pathway

Page 11: The Ondex Data Integration Framework

Treatments from DRASTIC

Pathways from KEGG

Page 12: The Ondex Data Integration Framework

How to contribute

Three steps:

1. Try it out and report the bugs you find

2. Suggest or implement your improvements

3. Become a contributor and submit improvements

http://ondex.sourceforge.net

Contributors will be acknowledged on the project website and in publications involving their work. Contributors are welcome to publish their work on ONDEX under their own names.

Page 13: The Ondex Data Integration Framework

What to contribute

Data Input
Exchange formats: XGMML, RDF, OBO, PSI MI, SBML, FASTA
Databases with OXL support: PHI-base
Flatfile databases: KEGG, TF, TP, BioCyc, DRASTIC, MeSH, Medline...
Input components: Flatfile parser, Format importer

Good: Write a flatfile parser for ONDEX

Better: Provide your database in OXL (see Taubert et al. (2007) “Exchange of integrated datasets – the OXL format”, in press, IB2007)

Also welcome: Provide your database in another standard (BioPAX, SBML, XGMML; but this may result in loss of information)

http://ondex.sourceforge.net
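
To give a feel for the "write a flatfile parser" contribution, here is a minimal Java sketch. It does not use any ONDEX interface and does not follow the layout of any real database: the three-column tab-separated layout (accession, concept class, name), the class `FlatfileParserSketch`, its `Entry` type and the `parse` method are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical flat-file parser sketch; the file layout and all names are assumptions. */
public class FlatfileParserSketch {

    /** One parsed entry: accession, concept class and a display name. */
    static final class Entry {
        final String accession;
        final String conceptClass;
        final String name;

        Entry(String accession, String conceptClass, String name) {
            this.accession = accession;
            this.conceptClass = conceptClass;
            this.name = name;
        }
    }

    /**
     * Parses tab-separated lines of the assumed form
     * accession TAB conceptClass TAB name.
     * Blank lines and lines starting with '#' are skipped;
     * any other line without exactly three fields is rejected.
     */
    static List<Entry> parse(String text) {
        List<Entry> entries = new ArrayList<>();
        String[] lines = text.split("\r?\n");
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            if (line.trim().isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] fields = line.split("\t", -1);
            if (fields.length != 3) {
                throw new IllegalArgumentException(
                        "Line " + (i + 1) + ": expected 3 fields, found " + fields.length);
            }
            entries.add(new Entry(fields[0].trim(), fields[1].trim(), fields[2].trim()));
        }
        return entries;
    }

    public static void main(String[] args) {
        // Made-up sample input, not taken from any real database.
        String sample = "# accession\tclass\tname\n"
                + "P1\tProtein\texample protein\n"
                + "\n"
                + "C1\tComp\texample compound\n";
        for (Entry e : parse(sample)) {
            System.out.println(e.accession + " -> " + e.conceptClass + " (" + e.name + ")");
        }
    }
}
```

Failing on malformed lines rather than skipping them silently is a deliberate choice in this sketch: an integration pipeline that drops records without notice is hard to debug later.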

Page 14: The Ondex Data Integration Framework

What to contribute

Data Integration

ONDEX data integration framework (diagram): ONDEX core API, ONDEX Metadata, Consistency checks, Ontology based graph structure, Data alignment methods, Query API, Exporter, Webservices, JSP webinterface
Formats shown: OXL, XGMML, SBML, RDF, FASTA

Algorithms: Needed for the alignment of integrated data

Core: Improve the persistence layer of the ontology based graph

Exporter: Provide your own exchange standard

Webservices: Increase compatibility

http://ondex.sourceforge.net

Page 15: The Ondex Data Integration Framework

What to contribute

Data Analysis

Diagram: ONDEX Visualisation and Analysis Tool Kit (OVTK), Taverna, Webinterface frontend
Formats shown: OXL, XGMML, SBML, GraphML, GML

Support: OXL in your application

Connect: Import from web service or directly from Core API

Algorithms: Graph analysis using the ONDEX Visualisation and Analysis Tool Kit (OVTK)

Feedback & Feature requests: Mailing lists and Sourceforge.net

http://ondex.sourceforge.net

Page 16: The Ondex Data Integration Framework

Sourceforge.net

Jun 06 – Jun 07: 1828 downloads, 15289 page views

Current SF.net rank: 1074

Subversion activity, Jan 07 – Jun 07: 8904 reads, 2072 writes, 4821 file uploads

Developer mailing list: [email protected]

User mailing list: [email protected]

Current release: 0.9alpha1

Page 17: The Ondex Data Integration Framework

Publications

Taubert, J., Winnenburg, R., Hindle, M., Weile, J., Baumbach, J., Philippi, S., Rawlings, C. and Köhler, J. (2007) "Data integration, information filtering and knowledge extraction with ONDEX", paper in preparation

Taubert, J., Sieren, K. P., Hindle, M., Hoekman, B., Winnenburg, R., Philippi, S., Rawlings, C. and Köhler, J. (2007) "Exchange of integrated datasets – the OXL format", submitted paper, 4th Integrative Bioinformatics workshop (IB2007)

Köhler, J., Philippi, S., Specht, M. and Rüegg, A. (2006) "Ontology based text indexing and querying for the semantic web", Knowledge-Based Systems, Volume 19, Issue 8

Köhler, J., Baumbach, J., Taubert, J., Specht, M., Skusa, A., Rüegg, A., Rawlings, C., Verrier, P. and Philippi, S. (2006) "Graph-based analysis and visualization of experimental results with ONDEX", Bioinformatics 22(11)

Skusa, A., Rüegg, A. and Köhler, J. (2005) "Extraction of biological networks from scientific literature", Briefings in Bioinformatics 6(3)

Köhler, J., Rawlings, C., Verrier, P., Mitchell, R., Skusa, A., Rüegg, A. and Philippi, S. (2004) "Linking experimental results, biological networks and sequence analysis methods using Ontologies and Generalized Data Structures", In Silico Biol, Volume 5, Special Issue: Ontology and Genome, Manuscript number in online journal: 0005

Page 18: The Ondex Data Integration Framework

Acknowledgements

Centre for Mathematical and Computational Biology
Department of Biomathematics and Bioinformatics
Rothamsted Research

Dr Jacob Köhler, Principal Investigator

Prof Chris Rawlings, Head of Department

Rothamsted Research is supported by the BBSRC

Travel grants and scholarships by

Page 19: The Ondex Data Integration Framework

4th Integrative Bioinformatics workshop, 10th to 12th September 2007

University of Ghent, Belgium

http://www.rothamsted.bbsrc.ac.uk/bab/conf/ib07/

13th August 2007: Registration deadline

27th August 2007: Poster submission deadline

Invited speakers:
Prof Carole Goble, School of Computer Science, University of Manchester, UK
Prof Søren Brunak, BioCentrum-DTU, Technical University of Denmark, Denmark
Dr David Searls, Senior Vice President, Informatics, GlaxoSmithKline Pharmaceuticals, USA
Dr Luis Serrano, EMBL, Heidelberg, Germany

Organising committee:
Prof Ralf Hofestädt, University of Bielefeld, Germany (Co-chair)
Dr Jacob Koehler, Rothamsted Research, UK (Co-chair)
Prof Martin Kuiper, University of Ghent, Belgium (Local organisation)
Paul Verrier, Rothamsted Research, UK (Local organisation)