Page 1: Grid Engineering Experience &  Biological Applications Dr Richard Sinnott

Grid Engineering Experience & Biological Applications

Dr Richard Sinnott

Technical Director, National e-Science Centre

Deputy Director Technical, Bioinformatics Research Centre, University of Glasgow

28th May 2004

Page 2

NeSC in the UK

[Map labels: Cambridge, Newcastle, Edinburgh, Oxford, Glasgow, Manchester, Cardiff, Southampton, London, Belfast, Daresbury Lab, RAL, Hinxton]

NeSC
– Prof Malcolm Atkinson (Director)
– Dr Richard Sinnott (Technical Director - Glasgow)

NeSC and UK Grid Engineering
– Background
– Achievements
– Current/future

Life sciences & Grids
– Challenges & Opportunities
– Life science projects involving NeSC Glasgow
» Bridges (security-focused Grid infrastructure for CFG)
» Scottish Bioinformatics Research Network (coming soon)
» JDSS (data sharing for life sciences)
» VOTES…?

[Diagram labels: Core National Grid Service, White Rose Grid, HPC(x), CSAR]

Previous work on UK e-Science Grid based on GT2
Demonstrated broad set of applications across it
– Monte Carlo simulations of ionic diffusion through radiation-damaged crystal structures
– Integrated Earth system modelling
– BLAST on the Grid
– Grid Integration Test Script Suite
– …

Transition to OGSI/OGSA under discussion
Two UK OGSA Test Grid projects started in January
– UCL, Imperial College, Universities of Edinburgh and Newcastle
– Universities of Portsmouth, Reading, Manchester, Westminster and CCLRC

There are still issues to be resolved
– OGSA definition and delivery
» Standards: OGSI, WSRF, …
» …and technologies: GT3, GT4, …
– Hosting environments & platforms
– Combinations of services supported
– Material and grids to support adopters

Page 3

Glasgow e-Science Hub

Externally: Glasgow end of NeSC
– Involved in UK-wide activities
» ETF: in May 2003 became first UK e-Science Centre to run integration tests across every site of the UK (Level 2) Grid. Therefore 100% access to UK Grid resources at this time
– Public visibility of NeSC
» Responsible for NeSC web site

Internally: focal point for e-Science research/activities at Glasgow
Work closely with foundation departments
– Department of Computing Science
– Department of Physics & Astronomy
Also working closely with other groups including
– Bioinformatics Research Centre
– Electronics and Electrical Engineering
– Biostatistics
– …

Page 4

Glasgow e-Science Activities

Consolidating resources
Building around ScotGrid
– Providing shared Grid resource for wide variety of scientists inside/outside Glasgow
» Particle physicists, computer scientists, bioinformaticians, …
» Target shares established
– Focal point for e-Science at Glasgow

Hardware
• 59 IBM X Series 330 dual 1 GHz Pentium III with 2GB memory
• 2 IBM X Series 340 dual 1 GHz Pentium III with 2GB memory
• 3 IBM X Series 340 dual 1 GHz Pentium III with 2GB memory and 100 + 1000 Mbit/s ethernet
• 1TB disk
• LTO/Ultrium Tape Library
• Cisco ethernet switches

New
• IBM X Series 370 PIII Xeon with 32 x 512 MB RAM
• 5TB FastT500 disk, 70 x 73.4 GB IBM FC Hot-Swap HDD
• eDIKT: 28 IBM blades dual 2.4 GHz Xeon with 1.5GB memory
• eDIKT: 6 IBM X Series 335 dual 2.4 GHz Xeon with 1.5GB memory
• CDF: 10 Dell PowerEdge 2650 2.4 GHz Xeon with 1.5GB memory
• CDF: 7.5TB RAID disk

Shared resources: Disk ~15TB, CPU ~330 1GHz

[Chart labels: CDF, LHC, BIO]
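
As a rough check on the disk figures quoted on this slide, the short sketch below tallies the itemised disk entries (the 1TB disk, the 5TB FastT500 array built from 70 x 73.4 GB drives, and the 7.5TB CDF RAID). It assumes decimal units (1 TB = 1000 GB), and the variable names are illustrative only. The slide does not itemise the difference between this tally and the quoted ~15TB.

```python
# Illustrative tally of the disk entries listed on the hardware slide.
# Assumption: decimal units (1 TB = 1000 GB); all names are hypothetical.
GB_PER_TB = 1000

# Raw capacity of the drives behind the "5TB FastT500" entry.
fastt_drives = 70
fastt_drive_gb = 73.4
fastt_tb = fastt_drives * fastt_drive_gb / GB_PER_TB

# Disk entries as quoted on the slide, in TB.
itemised_tb = {
    "original 1TB disk": 1.0,
    "FastT500 (quoted)": 5.0,
    "CDF RAID": 7.5,
}
total_itemised_tb = sum(itemised_tb.values())

print(f"FastT500 raw capacity: {fastt_tb:.2f} TB")
print(f"Itemised disk total: {total_itemised_tb:.1f} TB")
```

The raw drive capacity comes out slightly above the quoted 5TB, which is consistent with the slide rounding the figure.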

Page 5

Grids & Life Sciences

Extensive research community
– >1000 per research university
Extensive applications
– Many people care about them
– Health, Food, Environment, …
Interacts with many disciplines
– Physics, Chemistry, Maths/Statistics, Nano-engineering, …
Huge and expanding number of databases relevant to bioinformatics community
– Heterogeneity, Interdependence, Complexity, Change, Dirty…
– Linking and using them in a co-ordinated, secure manner is full of open issues to be addressed
Compute demands growing as more in-silico research is undertaken

Page 6

Database Growth

[Chart: PDB Content Growth]

DBs growing exponentially!!!
• Bibliographic (MedLine, …)
• Amino Acid Seq (SWISS-PROT, …)
• 3D Molecular Structure (PDB, …)
• Nucleotide Seq (GenBank, EMBL, …)
• Biochemical Pathways (KEGG, WIT, …)
• Molecular Classifications (SCOP, CATH, …)
• Motif Libraries (PROSITE, Blocks, …)

Page 7

More genomes …

Arabidopsis thaliana
mouse
rat
Caenorhabditis elegans
Drosophila melanogaster
Mycobacterium leprae
Vibrio cholerae
Plasmodium falciparum
Mycobacterium tuberculosis
Neisseria meningitidis Z2491
Helicobacter pylori
Xylella fastidiosa
Borrelia burgdorferi
Rickettsia prowazekii
Bacillus subtilis
Archaeoglobus fulgidus
Campylobacter jejuni
Aquifex aeolicus
Thermotoga maritima
Chlamydia pneumoniae
Pseudomonas aeruginosa
Ureaplasma urealyticum
Buchnera sp. APS
Escherichia coli
Saccharomyces cerevisiae
Yersinia pestis
Salmonella enterica
Thermoplasma acidophilum

Page 8

Complexity of Biological Data

[Diagram labels: Nucleotide sequences, Nucleotide structures, Gene expressions, Protein structures, Protein functions, Protein-protein interaction (pathways), Cell, Cell signalling, Tissues, Organs, Physiology, Organisms, Populations]

+ links to plant/crops, environmental, health, … information sources

Fascinating scientific questions

• Why do mice, worms, humans… live longer if they eat less?
• How does the brain work?
• Why do we stop growing?

Page 9

Bioinformatics Grid Needs

Taken from C. Goble myGrid presentation

Workflow / Virtual Organisation Needs

[Diagram annotations]
– OGSA-DAI/DAIT, IBM Information Integrator, …
– Single sign-on authentication, granularity of authorisation
– National Data Curation Centre (GU, EU, UKOLN, CCLRC)
– BioInf community, database schemas, …
– UDDI repositories, BioInf portals, …
– Grid engineering (scheduling, resource reservation, workflow enactment, …)
– WSDL descriptions, Semantic grid, …

Page 10

Bio e-Science Projects

Page 11

Overview of BRIDGES

Biomedical Research Informatics Delivered by Grid Enabled Services (BRIDGES)
NeSC (Edinburgh and Glasgow) and IBM

Supporting project for CFG project
– Generating data on hypertension
– Rat, Mouse, Human genome databases
Variety of tools used
– BLAST, BLAT, Gene Prediction, visualisation, …
Variety of data sources and formats
– Microarray data, genome DBs, project partner research data, medical records, …
Aim is integrated infrastructure supporting
– Data federation
– Security
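
To make the idea of data federation concrete, here is a minimal sketch. It is not the BRIDGES implementation. It uses Python's standard sqlite3 module, with two in-memory databases standing in for a public source and a partner's private source. All table names, column names and rows are invented placeholders, and the security side (who may see the private source) is deliberately left out.

```python
import sqlite3

# One connection acts as the "federation" point; a second in-memory
# database is attached to stand in for a separate, private source.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS private_src")

# Hypothetical public gene annotations (placeholder rows, not real data).
conn.execute(
    "CREATE TABLE main.public_genes (gene_id TEXT PRIMARY KEY, species TEXT)"
)
conn.executemany(
    "INSERT INTO main.public_genes VALUES (?, ?)",
    [("geneA", "rat"), ("geneB", "mouse"), ("geneC", "human")],
)

# Hypothetical private experimental results held by one project partner.
conn.execute("CREATE TABLE private_src.expression (gene_id TEXT, level REAL)")
conn.executemany(
    "INSERT INTO private_src.expression VALUES (?, ?)",
    [("geneA", 2.5), ("geneC", 0.7)],
)


def federated_expression(connection):
    """Join the public and private sources in a single query."""
    rows = connection.execute(
        """
        SELECT g.gene_id, g.species, e.level
        FROM main.public_genes AS g
        JOIN private_src.expression AS e ON e.gene_id = g.gene_id
        ORDER BY g.gene_id
        """
    )
    return rows.fetchall()


results = federated_expression(conn)
for gene_id, species, level in results:
    print(gene_id, species, level)
```

The point of the sketch is only that the caller issues one query and does not need to know which source holds which table. A real deployment would put authorisation checks in front of the private source.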

Page 12

Bridges Project

[Diagram: CFG Virtual Organisation]
– Sites: Glasgow, Edinburgh, Leicester, Oxford, London, Netherlands
– Private data (×6)
– Publicly curated data: Ensembl, MedLine, GenBank, OMIM, SWISS-PROT
– Data hub
– Synteny Grid Service, blast, +

Page 13

DRILL-DOWN FUNCTIONS
– To sequence
– To multiple alignment
– To tabular summaries

Future tools available via Portal

Page 14

Where we are today!

Information Integrator DB repository established and populated
– … with public data sets
– … linking to relevant resources (Ensembl, …)
GT3-based Grid services developed (BLAST, …)
– General usage of ScotGrid (solution being re-engineered with help from eDIKT; will include Condor pool)
Initial portal developed using IBM WebSphere
Genome visualisation browsers
– SyntenyVista: for viewing synteny between local/remote data sets
– MagnaVista: for exploring genetic information across multiple (remote) resources
Gaining experience with security technologies
– Setting up policies with Grid security authorisation software etc.
Initial roll-out to CFG planned for 4th June

Page 15

Lessons learnt

Public data resources openness
– Often cannot query directly
– Often not easy/possible to find schemas
Joint Data Standards Study investigating this
– Starts on 1st June and involves
» Digital Archiving Consultancy
» Bioinformatics Research Centre (Glasgow)
» NeSC (Edinburgh and Glasgow)
– Looks at technical, political, social, ethical etc. issues involved in accessing and using public life science resources
» Will liaise with NDCC
» Interview relevant scientists, data curators/providers
– 8-month project with final report in January
» Funded by MRC, BBSRC, Wellcome Trust, JISC, NERC, DTI

GT3 not without pain! Hopefully GT4 will be better?

Page 16

Scottish Bioinformatics Research Network

Four-year proposal starting imminently

Funded by Scottish Enterprise, Scottish Higher Education Funding Council, Scottish Executive Environment and Rural Affairs Department

Involves Glasgow, Dundee, Edinburgh, Scottish Bioinformatics Forum

Aim to provide bioinformatics infrastructure for Scottish health, agriculture and industry

Infrastructure support at Dundee, Edinburgh and Glasgow to support first-rate research in bioinformatics at each academic institute

Infrastructure support at three institutes, to support inter-institutional sharing of compute and data resources through application of Grid computing

Outreach and training activities mediated by the Scottish Bioinformatics Forum

Page 17

VOTES

Plans to develop Grid infrastructure to address key components of clinical trial/observational study
– Recruitment of potentially eligible participants
– Data collection during the study
– Study administration and coordination
Involves Glasgow, Oxford, Leicester, Nottingham, Manchester
Hopefully to be funded in August 2004 by MRC

[Diagram: Clinical Virtual Organisation Framework, used to realise CVO-1 (e.g. for data collection) and CVO-2 (e.g. for recruitment). Other labels: IMP, GPs, Lei-, Nott, GLA, OX, Disease registries, Hospital databases, Transfer Grid, Clinical trial data sets]

Page 18

Summary

NeSC Glasgow establishing itself as leading centre in
– Grid Security
» Authentication, authorisation, usability
– Data access and integration
» Working closely with NeSC Edinburgh (OGSA-DAI, DAIT, ELDAS)
– Education
» Developing Grid Computing courses in advanced MSc at Glasgow
» DyVOSE project: two-year project started 1st May; Grids & security to the masses!

Life sciences focal point for NeSC Glasgow
– Close liaison with
» Bioinformatics Research Centre (Prof David Gilbert)
» Biostatistics (Prof Ian Ford)
» … others?

Page 19

www.nesc.ac.uk