
Page 1: myGrid: Upper level Grid Services for the Bioinformatician, Prof. Carole Goble, Sun Microsystems BioGrid Symposium, Baltimore, USA

myGrid: Upper level Grid Services for the Bioinformatician

Prof. Carole Goble
http://www.mygrid.org.uk

Sun Microsystems BioGrid Symposium, Baltimore, USA, 4th-5th December 2002

Page 2: UK eScience Programme

• Grid-enabled eScience
• Emphasis on information integration and knowledge management
• The Virtual Organisation view
• $180 million + industrial contributions
• Complete infrastructure of regional eScience centres, support and a UK computational Grid
• Started on Globus, though Unicore used in EuroGrid with great success
• Centres donated equipment – highly heterogeneous
• Core component of the EU Grid FP6 programme

[Map labels: Cambridge, Newcastle, Edinburgh, Oxford, Glasgow, Manchester, Cardiff, Southampton, London, Belfast, DL, RAL, Hinxton]

Page 3: myGrid

• EPSRC UK eScience pilot project
• 01/01/02 - end 30/03/05
• Uses the UK Grid infrastructure

IBM

Lion BioSciences, Millennium Pharmaceuticals & Oracle

Page 4: myGrid

• Not a computational grid project
• Building Grid middleware
• Higher level services: workflow, databases, knowledge management, provenance…
• Service-based: Open Grid Service Architecture early adopter
• Bioinformatics services are published as Web services and Grid Services
• Working with publicly available biological resources: e.g. EMBL-EBI

Page 5: What is the Grid?

• Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations
• On-demand, ubiquitous access to computing, data, and all kinds of services
• New capabilities constructed dynamically and transparently from distributed services
• No central location, no central control, no existing trust relationships, little predetermination
• Uniformity, Pooling & Virtualisation

Page 6: What is the Grid?

• In silico experiments
– Information harvesting & PSE
– Dynamically forming virtual organisations to solve problems.
– Describing, searching for and weaving resources: people, applications, db, content, instruments
– Orchestrating resources
– Support for scientific method: provenance, argumentation, opinion contextualisation etc
• BioUtility & communities of practice

[Diagram labels: Knowledge Grid; Information Grid; Data/Computation Grid; “E-Scientists” Environment]

Page 7: Information Weaving

• Large amounts of different kinds of data & many applications.
• Highly heterogeneous.
– Different types, algorithms, forms, implementations, communities, service providers
• High autonomy.
• Highly complex and inter-related, & volatile.
• Much of it textual narrative

Page 8: Circadian Rhythms

1. Has anyone else studied the effect of neurotransmitters on the circadian rhythms in Drosophila?

2. I’ve got a cluster of proteins from my experiment. How do their functions interrelate? And what are the proteins with a particular function?

3. Is a structure known for my protein? What other proteins have a similar structure?

4. Can I build a homology 3D model?

5. What is known about a homologous protein?

Page 9: e-Science Q & A

• Who else has asked this question & can I use/adapt their approach?
– Workflow.
• What were the results at each stage?
– Dynamic Data Repositories.
• When was P12345 last updated? Which BLAST did I use?
– Provenance.
• Has PDB changed since I last ran this?
– Notification.
• Personalisation.

Page 10:

Courtesy of Mark Wilkinson (BioMOBY)

Page 11: myGrid

• Service based architecture
– Publication, discovery, interoperation, composition, decommissioning of myGrid services
• Resource Interoperation
– Workflow coordination & Database integration.
– Experimental workflows rather than production workflows.
• Experimentation
– Provenance & Change Propagation
– Personalisation & Collaborative working.
• Security & ownership
• Knowledge based using metadata and ontologies

[Image label: RASMOL]

Page 12:

[Layered architecture diagram; labels as they appear on the slide:]

• Metadata; Knowledge (ontologies)
• Low level Grid Common Services (OGSI): Co-scheduling, data shipping, authentication, job execution, resource monitoring, database access
• Middle level Grid Common Services: Database access, distributed query processing, service discovery, workflow enactment, event notification
• Upper level knowledge-based Grid Common Services: Semantic integration, knowledge based querying, workflow composition, visualisation, provenance mgt, semantic service discovery
• Vertical labels: Provenance, Personalisation, Security
• BioMedical Services Library: DAS, workflow sets, integrated databases
• Web Portal; Carp Gene expression analysis; TALISMAN annotation workbench; Workbench

Page 13: Who is myGrid for?

[Diagram labels: myGrid users; biologists; IS specialists; infrequent; problem specific; bioinformaticians; tool builders; service provider; systems administrators; bioinformatics tool builders]

Page 14: myGrid Outcomes

• e-Scientists
– Environment built on toolkits for service access, personalisation & community.
– Talisman – Interpro family of pattern databases annotation
– UTOPIA – visual multiple sequence alignment
– Workbench for gene expression in Carp & Graves disease
• Developers
– Protocols and service descriptions.
– myGrid-in-a-Box developers kit of core services.
– Reference implementation services & applications.
– Bio services.

Page 15: Service based architecture

• Each bio resource is a service
– Database, archive, analysis, tool, person, instrument, a workflow …
• Each myGrid architectural component is a service
– Workflow enactment engine, event notification, registry, scheduler…
• OGSA early adopter.

[Diagram labels: Web services; Grid protocols; Open Grid Service Architecture]

Page 16: Metadata+ontology

• Service registration, discovery, publication, composition, management.
• Data types & ontologies
• Service matchmaking
• Ontology editor, deployment server & reasoner
• Typing inputs and outputs of workflows
• Semantic Database integration
• Portal driving ….

[Diagram labels: Web services; Grid protocols; OGSA; Semantic Web; W3C: RDF, DAML+OIL, OWL]

Page 17:

1. User selects values from a drop down list to create a property based description of their required service. Values are constrained to provide only sensible alternatives.

2. Once the user has entered a partial description they submit it for matching. The results are displayed below.

3. The user adds the operation to the growing workflow.

4. The workflow specification is complete and ready to match against those in the workflow repository.
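Steps 1 and 2 above amount to matching a partial, property-based description against registered service descriptions. The following is a minimal sketch of that idea, assuming exact matching on property values; the slides describe an ontology and reasoner for this, which the sketch does not attempt. The class, the property names and the three registry entries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ServiceDescription:
    """A registered service described by property/value pairs (hypothetical schema)."""
    name: str
    properties: dict[str, str]


def match(partial: dict[str, str], services: list[ServiceDescription]) -> list[ServiceDescription]:
    """Return the services that agree with every property in the partial description."""
    return [
        s for s in services
        if all(s.properties.get(key) == value for key, value in partial.items())
    ]


# Illustrative registry contents; none of these entries come from the slides.
REGISTRY = [
    ServiceDescription("blast_search", {
        "input": "protein_sequence", "output": "similar_sequences", "task": "similarity_search"}),
    ServiceDescription("function_lookup", {
        "input": "protein_id", "output": "protein_function", "task": "annotation_retrieval"}),
    ServiceDescription("structure_lookup", {
        "input": "protein_id", "output": "protein_structure", "task": "annotation_retrieval"}),
]

# The user has filled in only two of the three properties so far.
partial_description = {"input": "protein_id", "task": "annotation_retrieval"}
candidates = [s.name for s in match(partial_description, REGISTRY)]
print(candidates)
```

Adding a further property to `partial_description` narrows the candidate list, which mirrors the incremental refinement described in step 2.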

Page 18: Integration & Coordination

• View-based Information Repository for XML data
• Database integration
– Access XML and RDBMS with OGSA-DAI
– Semantic database integration.
– Distributed query processing.
• Workflow
– Dynamic workflow enactment engine.
– Workflow repository
– User interactivity.
– Workflows linked with results

Page 19: E-Science Support

• Data provenance and resource change management
– Workflow logs.
– Event notification service.
– Incremental view management.
– Workflow and query evolution.
• Personalisation
– Management of views over repositories.
– Personalisation of process flows.
– Annotation of data sets and workflows
– Dynamic creation of personal data sets.

Page 20: Bio-Science services

• Grid-enabled BioServices by the EMBL-European Bioinformatics Institute
– EMBOSS, SRS, Open BQS, BLAST, XEmbl and EmblFetch, Flybase, Gadfly …
• Applications using Gateway API
– TALISMAN (annotation tool used by Interpro)
– UTOPIA (sequence fingerprint analysis)
• Portal
• Workbench application

Page 21:

How do the functions of a cluster of proteins interrelate?

Some proteins in my personal repository

[Diagram labels: Portal; Repository Client; Ontology Client; Workflow Client; Personal Repository; Workflow Repository; Meta Data: Ontology; Meta Data: Service Type Directory]

Page 22:

Find services that take proteins and give their functions, and pick the best match.

[Same diagram labels as Page 21.]

Page 23:

Find another that displays the proteins based on their function. The ontology restricts inputs & outputs.

[Same diagram labels as Page 21.]

Page 24:

Build a workflow of composed services linked together

[Same diagram labels as Page 21.]
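One way to picture how an ontology can restrict which services may be linked is a type check on each connection: an upstream output must be the same concept as, or a more specific concept than, the downstream input. The sketch below assumes a toy single-parent concept hierarchy; the concept names, the service table and the `compose` helper are hypothetical and do not reflect the myGrid ontology or API.

```python
# Toy ontology: child concept -> parent concept (all names hypothetical).
PARENT = {
    "swissprot_id": "protein_id",
    "protein_id": "identifier",
    "protein_function": "annotation",
}


def is_a(concept, ancestor):
    """True if `concept` equals `ancestor` or is one of its descendants."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENT.get(concept)
    return False


# Service name -> (input concept, output concept); illustrative entries only.
SERVICES = {
    "function_lookup": ("protein_id", "protein_function"),
    "function_display": ("annotation", "display"),
    "structure_display": ("protein_structure", "display"),
}


def compose(*names):
    """Link services in order; raise TypeError where an output does not fit the next input."""
    for upstream, downstream in zip(names, names[1:]):
        produced = SERVICES[upstream][1]
        accepted = SERVICES[downstream][0]
        if not is_a(produced, accepted):
            raise TypeError(
                f"{upstream} produces {produced}, but {downstream} expects {accepted}")
    return list(names)


# A protein function is an annotation, so this link is accepted.
workflow = compose("function_lookup", "function_display")
print(workflow)
```

Calling `compose("function_lookup", "structure_display")` raises `TypeError`, because a protein function is not a protein structure in this toy hierarchy.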

Page 25:

See if an appropriate workflow already exists. It could have been made by anyone who will share with you.

[Same diagram labels as Page 21.]

Page 26:

Pick one and enact it.

[Same diagram labels as Page 21.]

Page 27:

While it's running, it picks the best service instance that can run the service at that time.

[Diagram labels: Repos. Client; Workflow Client; Service Selection Client; Personal Repository; Workflow Enactment; Service Directory; Bioinformatic Services; Provenance Data; numbered steps 1 to 4]

Page 28:

While it's running, it picks the best service instance that can run the service at that time.

Or you choose.

[Same diagram labels as Page 27.]
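The selection step can be sketched as a lookup over the instances currently listed for a service, with an optional user override. This is a minimal illustration that assumes "best" means the available instance with the shortest queue; the slides do not say what criterion is used. The `Instance` fields and the example URLs are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """One deployed instance of a service (hypothetical directory entry)."""
    url: str
    available: bool
    queue_length: int


def select_instance(instances, user_choice=None):
    """Return the user's chosen instance if given, else the least loaded available one."""
    live = [i for i in instances if i.available]
    if not live:
        raise RuntimeError("no instance of this service is available right now")
    if user_choice is not None:
        for instance in live:
            if instance.url == user_choice:
                return instance
        raise ValueError(f"{user_choice} is not currently available")
    return min(live, key=lambda i: i.queue_length)


# Illustrative directory contents; the hosts are placeholders.
INSTANCES = [
    Instance("http://site-a.example/blast", available=True, queue_length=7),
    Instance("http://site-b.example/blast", available=True, queue_length=2),
    Instance("http://site-c.example/blast", available=False, queue_length=0),
]

automatic = select_instance(INSTANCES)
manual = select_instance(INSTANCES, user_choice="http://site-a.example/blast")
print(automatic.url, manual.url)
```

Because the choice is made at call time, re-evaluating `select_instance` at each workflow step picks up instances that have come or gone since the workflow started.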

Page 29:

The workflow finishes with the final display service

[Same diagram labels as Page 27.]

Page 30:

Results are put into your personal repository, with a concept from the ontology to tell you and myGrid what they mean.

[Same diagram labels as Page 27.]

Page 31:

And a full provenance record is kept and linked with the results. We could redo or reuse the workflow.

[Same diagram labels as Page 27.]
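Keeping a provenance record linked with the results can be pictured as storing a per-step log next to the final value, so that the run can later be inspected or repeated. The sketch below assumes an in-memory dictionary as the personal repository and plain Python callables as workflow steps; the `enact` function and the record layout are hypothetical, not the myGrid format.

```python
from datetime import datetime, timezone


def enact(workflow_name, steps, value, repository):
    """Run `steps` in order and store the final result together with a per-step log."""
    log = []
    for name, func in steps:
        output = func(value)
        log.append({
            "step": name,
            "input": value,
            "output": output,
            "finished": datetime.now(timezone.utc).isoformat(),
        })
        value = output
    entry_id = len(repository) + 1
    repository[entry_id] = {"result": value, "workflow": workflow_name, "provenance": log}
    return entry_id


# Two stand-in steps; real steps would be calls to bioinformatics services.
steps = [("to_upper", str.upper), ("reverse", lambda s: s[::-1])]
repo = {}
first = enact("demo", steps, "mkvl", repo)

# Redo the workflow from the input recorded in the provenance log.
original_input = repo[first]["provenance"][0]["input"]
second = enact("demo", steps, original_input, repo)
print(repo[first]["result"], repo[second]["result"])
```

Recording the input and output of every step is what makes it possible to answer questions such as "what were the results at each stage?" after the run has finished.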

Page 32: HPC vs Bioinformatics

• Computational Biology vs Bioinformatics => HPC vs Info Grid
– Relationship between them? Shared components? Architectures?
– Information management matters! Accelerating scientific process is not just accelerating compute intensive processes.
• HPC style BioGrid
– Provenance? Personalisation? Metadata? Interactivity? Knowledge? Intermediate results to db; annotated logs…

Page 33: We are not alone

• Other Efforts – we are not alone
– W3C semantic web, BioMOBY, I3C, OMG LSR, active ontology development in the community, DARPA,
• Open Grid Service Architecture
– We believe!! Links with Web Services give many benefits.
– But it’s a moving target …
– GGF is a zoo … over 40 RG and WG, often overlapping.

Page 34: Service Providers

• It's hard to get Service Providers' buy-in
– lower the barriers of entry
– make it reliable.
– security & intellectual property management
– programmatic interfaces
• How do we migrate legacy applications?
– Whole bunch of apps and databases on the web
• Accounting matters
– Who is going to pay for all this?

Page 35: Hotch potch

• Heterogeneity sucks
– Multi-policy of everything – security, access, accounting really matters in EU
– Getting a UK Grid to work is non-trivial
– Huge investment in system admin.
• Doing more than you could do before.
– Not just another predictable BLAST service over a bunch of machines
– Non-predictable analysis.

Page 36: Not a silver bullet! It's just middleware, not magic

• Data quality
• Content management of databases (controlled vocabularies)
• Provenance and versioning policies
• Appropriate use of tools
• Computational inaccessibility of free text annotation
• Database accessibility through means other than point and click web interfaces.

Independent of the Grid!

Page 37: Life Sciences Grid (LSG)

http://people.cs.uchicago.edu/~dangulo/LSG/

Page 38: The sum up

• If you ignore the multi-organisational aspect of Grid

• If you ignore the heterogeneous aspect of Grid

• If you assume it's safe and free and fair

• Then it's not so hard.

Page 39: The myGrid Team

• Carole Goble
• Norman Paton
• Alvaro Fernandes
• Stephen Pettifer
• Luc Moreau
• Dave De Roure
• Chris Greenhalgh
• Tom Rodden
• John Brooke
• Paul Watson
• Alan Robinson
• Rob Gaizauskas
• Robert Stevens
• Neil Wipat
• Matthew Addis
• Nick Sharman
• Rich Cawley
• Simon Harper
• Karon Mee
• Simon Miles
• Vijay Dailani
• Xiaojian Liu
• Tom Oinn
• Martin Senger
• Milena Radenkovic
• Kevin Glover
• Angus Roberts
• Chris Wroe
• Mark Greenwood
• Phil Lord
• Neil Davis
• Darren Marvin
• Justin Ferris
• Peter Li
• Nedim Alpdemir
• Luca Toldo
• Robin McEntire
• Anne Westcott
• Tony Storey
• Bernard Horan
• Paul Smart
• Robert Haynes

Page 40: Spares

Page 41: Knowledge Services

[Diagram labels: Knowledge-based data/computation services; Knowledge-based information services; Data/computation services; Information services; e-Scientist environment; Text mining; Annotation; Base services; Semantic services; Knowledge services; Knowledge applications & networks; Collaboratory; Prediction; Applications; Resources]

Page 42: myGrid Stack

[Stack diagram; labels as they appear on the slide:]

• Web Portal; Gateway API; Workbench Apps Builder (Talisman); Custom Application; Demonstrator Application; UTOPIA; Workbench Demonstrator; Cold Carp Gene Expression; MSD Sequence annotation
• Vertical labels: Provenance, Personalisation, Security
• BioMedical Services Library, e.g. Distributed Annotation Service
• User Agent; Presentation Services; Collaboration Support; Management Tools
• Layer labels: Base Services; Semantic aware services; Fabric
• Semantic Data Integration; Provenance metadata; Versioning; QoS; Distributed Query; Database; Provenance Validation & Assessment; MIR Database Access; Workflow Enactment; Job Execution; Semantic Workflow Design; Third Party; Ontology Service; Event Notification; Semantic Discovery; Syntactic Discovery; ‘White Pages’ & ‘Yellow Pages’ Discovery; Device Access; Information Extraction; Knowledge; Metadata; Annotation; Preferences; Reasoner; Availability; Service matcher

Page 43: myGrid Stack 0.1

[Same stack diagram labels as Page 42.]

Page 44: myGrid Stack 0.2

[Same stack diagram labels as Page 42.]

Page 45: Service based architecture

Find them.

Publication, registration, discovery, matchmaking, deregistration.

Organise them.

Interoperation, composition, substitution.

Run them.

Execution, monitoring, exception handling.
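The find, organise and run stages can be illustrated with a very small in-memory registry: services are published and deregistered, discovered by kind, and run with substitution when one of them fails. This is a sketch under stated assumptions, not the myGrid registry; the `Registry` class, its method names and the two demo services are hypothetical.

```python
class Registry:
    """Minimal in-memory service registry (illustrative only)."""

    def __init__(self):
        self._services = {}  # name -> (kind, callable)

    def publish(self, name, kind, func):
        self._services[name] = (kind, func)

    def deregister(self, name):
        self._services.pop(name, None)

    def discover(self, kind):
        """Names of all services of the given kind, in publication order."""
        return [n for n, (k, _) in self._services.items() if k == kind]

    def run(self, kind, argument):
        """Try each service of `kind` in turn, substituting the next one if a call fails."""
        errors = []
        for name in self.discover(kind):
            try:
                return name, self._services[name][1](argument)
            except Exception as exc:  # exception handling: record and move on
                errors.append((name, exc))
        raise RuntimeError(f"no service of kind {kind!r} succeeded: {errors}")


def flaky_length(sequence):
    """Stand-in for a service whose host is unreachable."""
    raise ConnectionError("service unreachable")


registry = Registry()
registry.publish("length_primary", "sequence_length", flaky_length)
registry.publish("length_backup", "sequence_length", len)

used, result = registry.run("sequence_length", "MKVLAA")
print(used, result)

registry.deregister("length_primary")
remaining = registry.discover("sequence_length")
```

Here the failing primary service is transparently replaced by the backup, which is one simple reading of "substitution" on the slide.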