Architectural Constraints on Current Bioinformatics Integration Systems Norman Paton Department of Computer Science University of Manchester Manchester, UK <norm>@cs.man.ac.uk


Architectural Constraints on Current Bioinformatics Integration Systems

Norman Paton
Department of Computer Science
University of Manchester
Manchester, UK

<norm>@cs.man.ac.uk

Structure of Presentation

Current integration proposals.
    What they support.
    What they don’t support, and why.
Requirements for integration.
    What could be useful, and why.
Grid opportunities.
    Relevant Grid technologies.
    Absent Grid technologies.

Current Integration Proposals

Classification

Feature             Values
Data Location       In-situ, Replicated, Reorganised
Integration Model   None, Relational, Semi-Structured, Object-Oriented
Architecture        Thin Client, Client-Server, Multi-Tier
Analysis Support    Function Call, Query, Workflow
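
One way to make the classification concrete is to encode its four features as a small data structure. The sketch below is illustrative only: the class names, the `describe` helper, and the `ExampleWarehouse` system are assumptions introduced here, not part of the talk. Data location and analysis support are modelled as sets because a system can offer more than one value for each.

```python
from dataclasses import dataclass
from enum import Enum


class DataLocation(Enum):
    IN_SITU = "In-situ"
    REPLICATED = "Replicated"
    REORGANISED = "Reorganised"


class IntegrationModel(Enum):
    NONE = "None"
    RELATIONAL = "Relational"
    SEMI_STRUCTURED = "Semi-Structured"
    OBJECT_ORIENTED = "Object-Oriented"


class Architecture(Enum):
    THIN_CLIENT = "Thin Client"
    CLIENT_SERVER = "Client-Server"
    MULTI_TIER = "Multi-Tier"


class AnalysisSupport(Enum):
    FUNCTION_CALL = "Function Call"
    QUERY = "Query"
    WORKFLOW = "Workflow"


@dataclass(frozen=True)
class Classification:
    """One system's position on the four features of the scheme."""
    name: str
    data_location: frozenset       # set of DataLocation values
    integration_model: IntegrationModel
    architecture: Architecture
    analysis_support: frozenset    # set of AnalysisSupport values

    def describe(self) -> str:
        # Sort the set-valued features so the output is deterministic.
        locations = ", ".join(sorted(v.value for v in self.data_location))
        analyses = ", ".join(sorted(v.value for v in self.analysis_support))
        return (f"{self.name}: data {locations}; "
                f"model {self.integration_model.value}; "
                f"architecture {self.architecture.value}; "
                f"analysis {analyses}")


# Hypothetical system, used only to exercise the structure.
example = Classification(
    name="ExampleWarehouse",
    data_location=frozenset({DataLocation.REPLICATED}),
    integration_model=IntegrationModel.RELATIONAL,
    architecture=Architecture.CLIENT_SERVER,
    analysis_support=frozenset({AnalysisSupport.QUERY,
                                AnalysisSupport.FUNCTION_CALL}),
)
print(example.describe())
```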

SRS

Sequence Retrieval System
http://srs.ebi.ac.uk/

SRS In Use

[Screenshot callouts: List of Databases; Search Interfaces; Selected Databases]

SRS Results

[Screenshot callout: Links to Result Records]

Classification of SRS

Feature             Values
Data Location       Replicated
Integration Model   None
Architecture        Thin Client
Analysis Support    Function Call, Query

BioNavigator

BioNavigator combines data sources and the tools that act over them.
As tools act on specific kinds of data, the interface makes available only tools that are applicable to the data in hand.
Online trial from: https://www.bionavigator.com/
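
The idea of offering only the tools that apply to the data in hand can be sketched as a registry filtered by data kind. This is not BioNavigator's implementation or API; the `Tool` class, the toy tools, and the kind labels `"dna"` and `"protein"` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tool:
    name: str
    accepts: frozenset             # kinds of data the tool can act on
    run: Callable[[str], str]      # toy analysis over a sequence string


# Hypothetical tool registry.
TOOLS = [
    Tool("reverse", frozenset({"dna"}), lambda s: s[::-1]),
    Tool("length", frozenset({"dna", "protein"}), lambda s: str(len(s))),
    Tool("count_cysteines", frozenset({"protein"}),
         lambda s: str(s.count("C"))),
]


def applicable_tools(data_kind, tools=TOOLS):
    """Return the names of the tools that accept this kind of data."""
    return [tool.name for tool in tools if data_kind in tool.accepts]


print(applicable_tools("dna"))
print(applicable_tools("protein"))
```

An interface built on such a registry would never need to report "tool not applicable", because inapplicable tools are simply not shown.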

Initiating Navigation

Select database

Enter accession number

Viewing Selected Data

Relevant display options

Navigate to related programs

Chaining Analyses in Macros

Chained collections of navigations can be saved as macros and restored for later use.
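
A macro of this kind can be thought of as an ordered list of named steps that is serialised, restored, and replayed. The sketch below assumes a JSON encoding and three toy steps; none of the names reflect how BioNavigator actually stores macros.

```python
import json

# Hypothetical named analysis steps; each maps a string to a string.
STEPS = {
    "uppercase": str.upper,
    "reverse": lambda s: s[::-1],
    "first_ten": lambda s: s[:10],
}


def run_macro(step_names, data):
    """Apply each named step in order, feeding the output to the next."""
    for name in step_names:
        data = STEPS[name](data)
    return data


def save_macro(step_names):
    """Serialise the chain of step names so it can be stored."""
    return json.dumps({"steps": list(step_names)})


def load_macro(text):
    """Restore a chain of step names saved by save_macro."""
    return json.loads(text)["steps"]


macro = ["uppercase", "reverse"]
restored = load_macro(save_macro(macro))
print(run_macro(restored, "acgt"))
```

Storing step names rather than results means the same macro can be replayed later against different data.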

Classification of BioNavigator

Feature             Values
Data Location       Replicated
Integration Model   None
Architecture        Thin Client
Analysis Support    Function Call, Workflow

Current Public Integration Systems

Location: data is replicated – under control.
Integration model: often minimal.
Architecture: The architecture is often two-tier.
Analysis support: Query and analysis access is carefully contained.

Only very careful instantiation of the classification yields sufficiently predictable performance.

GIMS

GIMS – recent experience

Feature             Values
Data Location       Reorganised
Integration Model   Object-Oriented
Architecture        Multi-tier
Analysis Support    Function Call

Example Analysis

Data:
    Yeast genome sequence.
    Protein-protein interaction data.
    350 transcriptome experiments.
    Overall database ~350Mb.
Analysis:
    Correlate transcription of interacting proteins.
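
The analysis named above can be sketched as follows: for each interacting pair, compute the correlation between the two genes' expression profiles across experiments. The talk does not say which correlation measure was used; Pearson correlation is an assumption here, as are the function names and the toy data (`geneA` and so on are placeholders, not real yeast genes).

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length profiles.

    Returns None when either profile has zero variance.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("profiles must have the same length, at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def correlate_interactions(expression, interactions):
    """Correlate expression profiles for each interacting pair.

    Pairs with a partner missing from the expression data are skipped.
    """
    results = {}
    for a, b in interactions:
        if a in expression and b in expression:
            results[(a, b)] = pearson(expression[a], expression[b])
    return results


# Toy data: four experiments per gene (the real data set had 350).
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
interactions = [("geneA", "geneB"), ("geneA", "geneC"), ("geneA", "geneD")]
correlations = correlate_interactions(expression, interactions)
```

Even this simple analysis touches two data sets at once (interactions and transcriptome profiles), which is why it depends on the data having been integrated first.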

Features of Experience

Challenging to conduct single runs of analyses – must break into bits.
These are modest data sets compared with what is coming.
Environment has been designed with analysis in mind.
These analyses will never make it into the public release!
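
Breaking an analysis "into bits" can be sketched as batching: the work items are processed a fixed number at a time, and the partial results are combined. The helper names and the batch size are assumptions; the talk does not describe how GIMS analyses were actually partitioned.

```python
from itertools import islice


def in_batches(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def run_in_batches(items, batch_size, analyse):
    """Run analyse on each batch and concatenate the partial results."""
    results = []
    for batch in in_batches(items, batch_size):
        results.extend(analyse(batch))
    return results


# Toy analysis standing in for an expensive per-item computation.
squares = run_in_batches(range(5), 2, lambda batch: [x * x for x in batch])
print(squares)
```

Batching bounds the memory and time of any single run, at the cost of pushing the bookkeeping onto whoever conducts the analysis.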

Requirements for Integration

Requirements for Integration

Location: replication is transparent.
Integration model: standards.
Architecture: Flexible, multiple tier.
Analysis support: Arbitrary analyses over diverse data sets.

True integration in bioinformatics should not just be data oriented, but involve integration of analyses.

Three Tier Architecture

Clients handle user interaction and presentation.
Application servers perform computation and analysis.
Data servers manage and query databases.

[Figure: Client, Application Server, and Data Server tiers]
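
The division of responsibilities between the three tiers can be sketched with one class per tier, each talking only to the tier below it. The class and method names, and the in-memory dictionary standing in for a database, are assumptions made for illustration.

```python
class DataServer:
    """Data tier: manages and queries stored records."""

    def __init__(self, records):
        self._records = dict(records)

    def query(self, accession):
        return self._records.get(accession)


class ApplicationServer:
    """Middle tier: performs computation over data from the data tier."""

    def __init__(self, data_server):
        self._data_server = data_server

    def sequence_length(self, accession):
        sequence = self._data_server.query(accession)
        return None if sequence is None else len(sequence)


class Client:
    """Client tier: handles user interaction and presentation only."""

    def __init__(self, application_server):
        self._application_server = application_server

    def show_length(self, accession):
        length = self._application_server.sequence_length(accession)
        if length is None:
            return f"{accession}: not found"
        return f"{accession}: {length} residues"


# Hypothetical record, wired together statically.
client = Client(ApplicationServer(DataServer({"P1": "MKT"})))
print(client.show_length("P1"))
```

Because the client never calls `DataServer` directly, the storage layer could be replaced without changing client code, provided `ApplicationServer` keeps the same interface.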

Three Tier Architecture

Scalability:
    Replace/Upgrade components as needed.
    Replace/Upgrade layers independently.
Flexibility:
    Application server layer protects clients from changes in database layer.

Classical three tier architectures are configured statically, and are adapted slowly as needs evolve.

Grid Opportunities

Necessary and Missing

Necessary:
    Directory services.
    Discovery services.
    Co-allocation.
    Data replication.
    Workload management.
    Accounting and payment.

Missing:
    Databases.
    Data models.
    Heterogeneity resolution.
    Personalisation.
    Web services.
    Standards.

Dynamic Multi-Tier

[Figure: one Client, three Application Servers, and two Data Servers]

Resources need to be identified, selected and scheduled dynamically.
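
The three steps named above (identify, select, schedule) can be sketched against a simple in-memory directory. This does not represent any real Grid middleware API; the `Resource` and `Directory` classes, the least-loaded selection policy, and the resource names are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    kind: str                  # "application" or "data"
    capabilities: frozenset    # analyses or data sets the resource offers
    load: int = 0              # number of tasks currently assigned


class Directory:
    """Hypothetical directory service listing the available resources."""

    def __init__(self):
        self._resources = []

    def register(self, resource):
        self._resources.append(resource)

    def identify(self, kind, capability):
        """Find every resource of this kind offering the capability."""
        return [r for r in self._resources
                if r.kind == kind and capability in r.capabilities]


def select(candidates):
    """Pick the least-loaded candidate, breaking ties by name."""
    if not candidates:
        raise LookupError("no suitable resource is registered")
    return min(candidates, key=lambda r: (r.load, r.name))


def schedule(directory, kind, capability):
    """Identify, select, and assign one task; return the chosen name."""
    chosen = select(directory.identify(kind, capability))
    chosen.load += 1
    return chosen.name


directory = Directory()
directory.register(Resource("app1", "application", frozenset({"correlate"})))
directory.register(Resource("app2", "application", frozenset({"correlate"})))
directory.register(Resource("db1", "data", frozenset({"interactions"})))

assignments = [schedule(directory, "application", "correlate")
               for _ in range(3)]
print(assignments)
```

In contrast to a statically wired three-tier system, the client here names only what it needs, and the binding to a particular server is made at request time.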

Grid Classification

Feature             Values
Data Location       In-situ, Replicated
Integration Model   None
Architecture        Multi-Tier
Analysis Support    Function Call, Workflow

The current Grid is not the answer, but the answer subsumes the current facilities of the Grid.

Summary

Current integration facilities in biology:
    Are cunningly restrictive.
    Make the most of limited distributed computational architectures.
The Grid is bringing to the table:
    Resource description facilities.
    Resource scheduling and workflow management facilities.
The Grid does not directly address current needs in biology, but its descendants may.