
Sys-Bio Talk, 24th Feb 2005

Towards Grid-Based System Biology

Dr Richard Sinnott

Technical Director, National e-Science Centre

Deputy Director (Technical), Bioinformatics Research Centre

University of Glasgow

24th February 2005


Grids? E-Science? E-Research?

[Diagram: the Grid linking sensor nets, shared data archives, computers, software, colleagues and instruments]

Methodologies transforming science, engineering, medicine and business

Driven by exponential growth in data and compute demands

Enabling a whole-system approach


NeSC in the UK

[Map of UK sites: Cambridge, Newcastle, Edinburgh, Oxford, Glasgow, Manchester, Cardiff, Southampton, London, Belfast, Daresbury Lab, RAL, Hinxton. Also marked: NeSC, Core National Grid Service, White Rose Grid, HPC(x), CSAR]

Previous work on UK e-Science Grid based on GT2

Demonstrated broad set of applications across it
– Monte Carlo simulations of ionic diffusion through radiation-damaged crystal structures
– Integrated Earth system modelling
– BLAST on the Grid
– Grid Integration Test Script Suite
– …

Transition to WSRF/OGSA under discussion

Two UK OGSA Test Grid projects started in January
– UCL, Imperial College, Universities of Edinburgh and Newcastle
– Universities of Portsmouth, Reading, Manchester, Westminster and CCLRC

There are still issues to be resolved
– OGSA definition and delivery
– Standards (OGSI, WSRF, …) and technologies (GT3, GT4, …)
– Hosting environments and platforms
– Combinations of services supported
– Material and grids to support adopters

Challenges/ Opportunities

?

The next Grid software


Life Sciences

Extensive Research Community
– >1000 per research university

Extensive Applications
– Many people care about them
– Health, Food, Environment

Interacts with virtually every discipline
– Physics, Chemistry, Maths/Stats, Nano-engineering, …

450+ databases relevant to bioinformatics (and growing!)

Heterogeneity, Interdependence, Complexity, Change, …


Systems Biology?

[Diagram: a scale of biological information, labelled in order]
– Nucleotide sequences
– Nucleotide structures
– Gene expressions
– Protein structures
– Protein functions
– Protein-protein interaction (pathways)
– Cell
– Cell signalling
– Tissues
– Organs
– Physiology
– Organisms
– Populations

+ links to plant/crops, environmental, health, … information sources


More genomes …

Arabidopsis thaliana
mouse
rat
Caenorhabditis elegans
Drosophila melanogaster
Mycobacterium leprae
Man
Plasmodium falciparum
Mycobacterium tuberculosis
Neisseria meningitidis Z2491
Helicobacter pylori
Xylella fastidiosa
Borrelia burgdorferi
Rickettsia prowazekii
Bacillus subtilis
Archaeoglobus fulgidus
Campylobacter jejuni
Aquifex aeolicus
Thermotoga maritima
Chlamydia pneumoniae
Pseudomonas aeruginosa
Ureaplasma urealyticum
Buchnera sp. APS
Escherichia coli
Saccharomyces cerevisiae
Yersinia pestis
Salmonella enterica
Thermoplasma acidophilum


Distributed and Heterogeneous data

LPSYVDWRSA GAVVDIKSQG ECGGCWAFSA IATVEGINKI TSGSLISLSE QELIDCGRTQ NTRGCDGGYI TDGFQFIIND GGINTEENYP YTAQDGDCDV

[Figure panels: Sequence, Structure, Function, Gene expression, Morphology]


Database Growth

[Chart: PDB Content Growth]

• DBs growing exponentially!!!
• Bibliographic (MedLine, …)
• Amino Acid Seq (SWISS-PROT, …)
• 3D Molecular Structure (PDB, …)
• Nucleotide Seq (GenBank, EMBL, …)
• Biochemical Pathways (KEGG, WIT, …)
• Molecular Classifications (SCOP, CATH, …)
• Motif Libraries (PROSITE, Blocks, …)


Is Grid the Answer?

Some key problems to be addressed

Tools that simplify access to and usage of data
– Internet hopping is not ideal!

Tools that simplify access to and usage of large scale HPC facilities

qsub [-a date_time] [-A account_string] [-c interval] [-C directive_prefix] [-e path] [-h] [-I] [-j join] [-k keep] [-l resource_list] [-m mail_options] [-M user_list] [-N name] [-o path] [-p priority] [-q destination] [-r c] [-S path_list] [-u user_list] [-v variable_list] [-V] [-W additional_attributes] [-z] [script]

Tools designed to aid understanding of complex data sets and relationships between them

e.g. through visualisation
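
The qsub synopsis on this slide shows how many options a user faces when submitting a batch job directly. As a minimal sketch of the kind of tool the slide asks for, the helper below turns a few plain parameters into a qsub argument list. It uses only flags that appear in the synopsis (-N, -q, -l, -m, -M). The function name, the parameter names and the example resource string are assumptions made for illustration, and valid resource names depend on the local batch system.

```python
from typing import Dict, List, Optional


def build_qsub_command(
    script: str,
    name: str,
    queue: Optional[str] = None,
    resources: Optional[Dict[str, str]] = None,
    mail_to: Optional[str] = None,
) -> List[str]:
    """Build (but do not run) a qsub argument list from friendly parameters.

    Hypothetical helper: only flags listed in the qsub synopsis are used.
    """
    argv = ["qsub", "-N", name]
    if queue:
        argv += ["-q", queue]
    if resources:
        # -l takes a comma-separated resource_list of key=value pairs.
        resource_list = ",".join(f"{key}={value}" for key, value in resources.items())
        argv += ["-l", resource_list]
    if mail_to:
        # -m ae: send mail when the job aborts or ends; -M: recipient list.
        argv += ["-m", "ae", "-M", mail_to]
    argv.append(script)
    return argv


if __name__ == "__main__":
    # Example values are placeholders, not settings from the talk.
    print(" ".join(build_qsub_command(
        "blast.sh", "blast-run", queue="long", resources={"walltime": "02:00:00"}
    )))
```

A portal or Grid client could generate such a command on the user's behalf, so the user never sees the full option list.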


Access to and Usage of Data

Grid technology should make it possible to
– hide heterogeneity
– deal with location transparency
– address security concerns
– …

Data Access and Integration Specification (DAIS) being defined by GGF
– OGSA-DAI and DAIT projects play a key role in shaping these standards

Other commercial solutions
– IBM Information Integrator, …
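
To make "hiding heterogeneity" concrete, the sketch below puts two toy data sources with different native layouts behind one lookup function. It is not the OGSA-DAI or Information Integrator API. The record type, the adapter functions and the data (including the gene name) are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class GeneRecord:
    """Common schema that callers see, whatever the source looks like."""
    source: str
    symbol: str
    chromosome: str


# Two hypothetical sources with different native layouts (toy data).
SOURCE_A = [{"gene_symbol": "geneA", "chr": "10"}, {"gene_symbol": "geneB", "chr": "13"}]
SOURCE_B = [("GENEA", "chr17"), ("GENEC", "chr1")]


def from_source_a(row: dict) -> GeneRecord:
    return GeneRecord("source_a", row["gene_symbol"].upper(), row["chr"])


def from_source_b(row: Tuple[str, str]) -> GeneRecord:
    symbol, chrom = row
    # Source B prefixes chromosome names with "chr"; strip it for the common schema.
    chrom = chrom[3:] if chrom.startswith("chr") else chrom
    return GeneRecord("source_b", symbol.upper(), chrom)


SOURCES: List[Tuple[Sequence, Callable]] = [
    (SOURCE_A, from_source_a),
    (SOURCE_B, from_source_b),
]


def federated_lookup(symbol: str) -> List[GeneRecord]:
    """Query every source through its adapter and merge the results."""
    wanted = symbol.upper()
    results = []
    for rows, adapt in SOURCES:
        for row in rows:
            record = adapt(row)
            if record.symbol == wanted:
                results.append(record)
    return results


if __name__ == "__main__":
    for record in federated_lookup("genea"):
        print(record)
```

The caller asks one question and never needs to know where each source lives or how it stores its rows. A real deployment would also have to address the security concerns the slide mentions, which this sketch leaves out.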


Access to and Usage of HPC facilities

Consider whole genome-genome (2*3*10^9 bp) comparisons between two species  

Current strategy essentially chops up one genome and fires searches for those fragments in the other then re-assembles results  

– messy approximate matching
– re-assembly difficult
– important correlations can be lost
– to make this tractable, so-called junk DNA is ignored
– chopping may introduce artefacts or hide phenomena

Better to put both full genomes in memory and perform a useful complete comparison
– Only possible with very high-end machines (available via grids)
– Should not have to be a script writer/Linux sys-admin to use these facilities
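
The toy example below shows how chopping can hide a match. It compares a fragment-by-fragment search with a comparison that holds both complete sequences in memory. The sequences, the fragment size and the function names are invented for illustration. Exact string matching and a simple longest-common-substring search stand in for the approximate matching that real tools such as BLAST perform.

```python
from typing import List, Tuple


def chop(seq: str, size: int) -> List[Tuple[int, str]]:
    """Cut seq into consecutive fragments of at most `size` bases."""
    return [(i, seq[i:i + size]) for i in range(0, len(seq), size)]


def fragment_hits(query: str, target: str, size: int) -> List[Tuple[int, int, int]]:
    """Search each fragment of query in target; return (query_pos, target_pos, length)."""
    hits = []
    for offset, fragment in chop(query, size):
        position = target.find(fragment)
        if position != -1:
            hits.append((offset, position, len(fragment)))
    return hits


def longest_common_substring(a: str, b: str) -> Tuple[int, int, int]:
    """Whole-sequence comparison: return (start_in_a, start_in_b, length)."""
    best = (0, 0, 0)
    previous = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at a[i-2], b[j-2].
                current[j] = previous[j - 1] + 1
                if current[j] > best[2]:
                    best = (i - current[j], j - current[j], current[j])
        previous = current
    return best


# Both toy sequences contain the same 8-base region, GATTACAT.
QUERY = "CC" + "GATTACAT" + "GG"
TARGET = "TTTT" + "GATTACAT" + "AAAA"

if __name__ == "__main__":
    print(fragment_hits(QUERY, TARGET, 4))          # [(4, 6, 4)]
    print(longest_common_substring(QUERY, TARGET))  # (2, 4, 8)
```

With 4-base fragments only half of the shared region is recovered, because the fragment boundaries fall inside it. The whole-sequence comparison finds all 8 bases. The quadratic search used here is only practical for toy inputs, and at genome scale the whole-sequence approach is what calls for the high-end machines mentioned on the slide.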


Cognitive aspects of Data

Life science data can be “ugly”
– Raw data sets messy
– Requires significant effort to understand
– Schemas/data models evolving…

Tools needed to
– Simplify understanding
– Improve analysis
– Navigate through potentially huge data sets, e.g. to find genes of interest in chromosomes of different species


[Diagram: the scale of biological information (nucleotide sequences, nucleotide structures, gene expressions, protein structures, protein functions, protein-protein interaction (pathways), cell, cell signalling, tissues, organs, physiology, organisms, populations), annotated with the projects BRIDGES, SBRN, VOTES, DyVOSE, GHI and JDSS]


Overview of BRIDGES

Biomedical Research Informatics Delivered by Grid Enabled Services (BRIDGES)

NeSC (Edinburgh and Glasgow) and IBM

Started October 2003

Supporting project for CFG project
– Generating data on hypertension
– Rat, Mouse, Human genome databases

Variety of tools used
– BLAST, BLAT, Gene Prediction, visualisation, …

Variety of data sources and formats
– Microarray data, genome DBs, project partner research data, …

Aim is integrated infrastructure supporting
– Data federation
– Security


Bridges Project

[Architecture diagram labels: CFG Virtual Organisation with sites at Glasgow, Edinburgh, Leicester, Oxford, London and the Netherlands, each with private data; publicly curated data (Ensembl, MGI, HUGO, OMIM, SWISS-PROT, RGD, …); Data Hub; Information Integrator; OGSA-DAI; Synteny Service; Magna Vista Service; blast; VO Authorisation]


JDSS Project

Public data resources openness
– Often cannot query directly
– Often not easy/possible to find schemas

Joint Data Standards Study investigating this

Started on 1st June and involves
– Digital Archiving Consultancy
– Bioinformatics Research Centre (Glasgow)
– NeSC (Edinburgh and Glasgow)

Look at technical, political, social, ethical etc. issues involved in accessing and using public life science resources
– Interview relevant scientists, data curators/providers

8 month project with final report due imminently
– Funded by MRC, BBSRC, Wellcome Trust, JISC, NERC, DTI


Dynamic Virtual Organisations for e-Science Education (DyVOSE) project

Two year project started 1st May 2004, funded by JISC

Exploring advanced authorisation infrastructures for security
– … in Grid Computing Module as part of advanced MSc at Glasgow
– Provide insight into rolling Grid out to the masses!

[Diagram (DyVOSE Project): PERMIS based authorisation; authorisation checks; authorisation decisions; Education VO policies; ScotGrid; GU Condor pool; other (known!) Grid resources]
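
The authorisation checks and decisions in this diagram can be pictured as a role-based policy lookup. The sketch below is not the PERMIS API and does not reproduce the project's policies. The role names, actions, resource names and function name are hypothetical, chosen only to illustrate a VO policy that grants different roles access to different Grid resources.

```python
from typing import Dict, Iterable, Set, Tuple

# role -> set of permitted (action, resource) pairs; all names are hypothetical.
Policy = Dict[str, Set[Tuple[str, str]]]

EDUCATION_VO_POLICY: Policy = {
    "student": {("submit_job", "condor_pool")},
    "lecturer": {("submit_job", "condor_pool"), ("submit_job", "scotgrid")},
}


def is_authorised(policy: Policy, roles: Iterable[str], action: str, resource: str) -> bool:
    """Grant the request if any of the user's roles permits the action on the resource."""
    return any((action, resource) in policy.get(role, set()) for role in roles)


if __name__ == "__main__":
    print(is_authorised(EDUCATION_VO_POLICY, ["student"], "submit_job", "condor_pool"))  # True
    print(is_authorised(EDUCATION_VO_POLICY, ["student"], "submit_job", "scotgrid"))     # False
```

Keeping the policy as data, separate from the resources that enforce it, means a virtual organisation could change who may do what without touching the resources themselves.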


Scottish Bioinformatics Research Network

Four year proposal expected to start imminently

Funded (£2.4M) by Scottish Enterprise, Scottish Higher Education Funding Council, Scottish Executive Environment and Rural Affairs Department

Involves Glasgow, Dundee, Edinburgh, Scottish Bioinformatics Forum

Aim to provide bioinformatics infrastructure for Scottish health, agriculture and industry

Infrastructure support at Dundee, Edinburgh and Glasgow to support first-rate research in bioinformatics at each academic institute

Infrastructure support at three institutes, to support inter-institutional sharing of compute and data resources through application of Grid computing

Outreach and training activities mediated by the Scottish Bioinformatics Forum


VOTES

Virtual Organisations for Trials and Epidemiological Studies

3 year MRC (£2.8M) funded project expected to start imminently

Plans to develop Grid infrastructure to address key components of clinical trial/observational study
– Recruitment of potentially eligible participants
– Data collection during the study
– Study administration and coordination

Involves Glasgow, Oxford, Leicester, Nottingham, Manchester

[Diagram labels: Clinical Virtual Organisation Framework, used to realise CVO-1 (e.g. for data collection) and CVO-2 (e.g. for recruitment); Transfer Grid; IMP; GPs; sites Lei-, Nott, GLA, OX; disease registries; hospital databases; clinical trial data sets]


Genetics and Healthcare Initiative

Five (2+3) year proposal (£4.4M) expected to start imminently

Funded by Health Department and Department for Enterprise and Lifelong Learning

Involves Glasgow, Dundee, Edinburgh, Aberdeen

– focus of genetics as applied to healthcare

– first two years emphasis on providing a platform for research into the genetic basis of common complex diseases in Scotland

» Mental health, cardiovascular, …

» Plan to establish 15,000 family-based intensively-phenotyped cohort recruited from the East and West of Scotland

– basis for neutralising heritable (genetic) risk factors in disease surveillance, treatment optimisation, avoidance of adverse drug events and prediction of response to therapy, health care planning and drug discovery, …


Systems Biology?

Once we have (securely) connected all relevant data sets and simplified access to and usage of HPC resources, wrapped your favourite bioinformatics applications as Grid services...

what questions would you like to ask?

– How does a cell work?

– Why do people who eat less tend to live longer?

– How many people across Scotland who had a heart attack in the last 5 years took drug X, and of those that did, were genes A or B influenced by this drug?

– Who has performed an experiment similar to mine, and were their results similar?

– …
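
The heart-attack question can be written as a two-step query once clinical, prescribing and gene-expression data are linked. The sketch below is illustrative only. It uses an in-memory SQLite database in place of what would really be separate, secured sources, and the table names, column names and rows are all invented.

```python
import sqlite3
from typing import List, Sequence


def make_toy_database() -> sqlite3.Connection:
    """Create an in-memory database with made-up rows (not real data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE heart_attacks (patient_id INTEGER, year INTEGER);
        CREATE TABLE prescriptions (patient_id INTEGER, drug TEXT);
        CREATE TABLE expression_changes (patient_id INTEGER, gene TEXT);
        INSERT INTO heart_attacks VALUES (1, 2003), (2, 2004), (3, 1998), (4, 2002);
        INSERT INTO prescriptions VALUES (1, 'X'), (2, 'Y'), (3, 'X'), (4, 'X');
        INSERT INTO expression_changes VALUES (1, 'A'), (4, 'C');
        """
    )
    return conn


def took_drug_after_heart_attack(conn: sqlite3.Connection, drug: str, since_year: int) -> List[int]:
    """Patients with a heart attack in or after since_year who were prescribed the drug."""
    rows = conn.execute(
        "SELECT DISTINCT h.patient_id FROM heart_attacks h "
        "JOIN prescriptions p ON p.patient_id = h.patient_id "
        "WHERE h.year >= ? AND p.drug = ? ORDER BY h.patient_id",
        (since_year, drug),
    ).fetchall()
    return [row[0] for row in rows]


def with_gene_changes(conn: sqlite3.Connection, patient_ids: Sequence[int], genes: Sequence[str]) -> List[int]:
    """Subset of patient_ids with a recorded expression change in any listed gene."""
    if not patient_ids or not genes:
        return []
    id_marks = ",".join("?" * len(patient_ids))
    gene_marks = ",".join("?" * len(genes))
    rows = conn.execute(
        f"SELECT DISTINCT patient_id FROM expression_changes "
        f"WHERE patient_id IN ({id_marks}) AND gene IN ({gene_marks}) ORDER BY patient_id",
        [*patient_ids, *genes],
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = make_toy_database()
    cohort = took_drug_after_heart_attack(conn, "X", 2000)
    print(len(cohort), with_gene_changes(conn, cohort, ["A", "B"]))
```

The query itself is short. The difficulty the talk points to is in securely connecting the sources that would hold these tables.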


www.nesc.ac.uk



Bridges Portal


MagnaVista



QTL upload



QTL browsing


Grid Blast Client

• Allows ‘genome scale’ blasting

• Uses ScotGrid and idle compute resources of training lab Condor pool
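
The slide does not say how the client spreads work across ScotGrid and the Condor pool. One common approach, assumed here purely for illustration, is to split the query sequences into batches and submit one batch per job. The function names and the toy FASTA input below are hypothetical.

```python
from typing import List, Tuple

Record = Tuple[str, str]  # (header, sequence)


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA-formatted text into (header, sequence) pairs."""
    records: List[Record] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_into_batches(records: List[Record], n_workers: int) -> List[List[Record]]:
    """Distribute records round-robin over at most n_workers batches."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    # Never create more batches than records; keep one (empty) batch if there are none.
    n_batches = min(n_workers, len(records)) or 1
    batches: List[List[Record]] = [[] for _ in range(n_batches)]
    for index, record in enumerate(records):
        batches[index % n_batches].append(record)
    return batches


if __name__ == "__main__":
    toy = ">q1\nACGT\nAC\n>q2\nGGTT\n>q3\nTTAA\n"
    for number, batch in enumerate(split_into_batches(parse_fasta(toy), 2)):
        print(number, [header for header, _ in batch])
```

Each batch could then be written to its own file and searched by a separate job, with the results merged once all jobs have finished.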
