e-Science Tools For The Genomic Scale Characterisation Of Bacterial Secreted Proteins Tracy Craddock, Phillip Lord, Colin Harwood and Anil Wipat Newcastle University

Post on 01-Jan-2016

Page 1

e-Science Tools For The Genomic Scale Characterisation Of Bacterial Secreted Proteins

Tracy Craddock, Phillip Lord, Colin Harwood and Anil Wipat

Newcastle University

Page 2

Outline

Computational challenges of bioinformatics

Secretion in Bacillus

Classification and analysis workflows

Results and discussion

Page 3

Computational Challenges of Bioinformatics

New requirements from bioinformatics

3 major problems:

Heterogeneity

Distribution

Autonomy

Experiments - series of workflows

Page 4

myGrid and Taverna

Scufl: Simple Conceptual Unified Flow Language

Taverna: Writing, running workflows & examining results

SOAPLAB: Makes applications available

Freefluo: Workflow engine to run workflows

[Diagram labels: Freefluo; SOAPLAB Web Service; Any Application; Web Service, e.g. DDBJ BLAST]
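
The idea of a workflow engine running a series of services can be sketched in a few lines of Python. This is only an illustration of the concept, not Scufl, Taverna or Freefluo code: `run_workflow`, the step names and the stand-in services are all hypothetical, and real services would be remote web service calls rather than local functions.

```python
from typing import Callable, Dict, List

# A "service" here is any callable taking text and returning text (an assumption).
Service = Callable[[str], str]


def run_workflow(steps: List[str], services: Dict[str, Service], data: str) -> str:
    """Run the named steps in order, feeding each output into the next step."""
    for name in steps:
        data = services[name](data)
    return data


# Stand-in services with toy data; real ones would wrap remote applications.
services: Dict[str, Service] = {
    "fetch": lambda accession: f">{accession}\nMKKLLPETGA",
    "strip_header": lambda fasta: "".join(fasta.splitlines()[1:]),
    "length": lambda sequence: str(len(sequence)),
}

result = run_workflow(["fetch", "strip_header", "length"], services, "demo_protein")
print(result)
```

Because every step shares one signature, a locally wrapped application and an external web service can be swapped without changing the workflow itself.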

Page 5

Microbase

Grid-based system for microbial genome comparison and analysis

Information repository (and execution environment)

Pre-computed data

Page 6

Outline

Computational challenges of bioinformatics

Secretion in Bacillus

Classification and analysis workflows

Results and discussion

Page 7

Secretion in Bacillus

Predict characteristics & behaviour of bacteria

Identify secreted proteins

Bacillus species: diverse behaviour

Soil inhabitants

Harmful bacteria

Page 8

Importance of Secretion

Mechanism of interaction with environment

Reveal capabilities of an organism

Pathogens are of great interest

Page 9

Secretory Proteins

[Diagram labels: cell compartments (Cytoplasm, Membrane, Cell Wall, Medium) and secretory protein classes (Signal Peptide, Lipoprotein, Cell wall binding, Transmembrane, LPXTG)]

Page 10

Outline

Computational challenges of bioinformatics

Secretion in Bacillus

Classification and analysis workflows

Results and discussion

Page 11

Bioinformatic Tools

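[Diagram labels: cell compartments (Cytoplasm, Membrane, Cell Wall, Medium) and secretory protein classes (Signal Peptide, Lipoprotein, Cell wall binding, Transmembrane, LPXTG), annotated with the tools listed here]

SignalP

TMHMM

tmap

MEMSAT

LipoP

ps_scan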
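
As a rough illustration of how the outputs of such tools could be combined into the protein classes named on this slide, here is a minimal Python sketch. The decision order, the `Predictions` fields and the class labels are assumptions made for the example, not the rules used in the classification workflow. The only fixed fact is the LPXTG motif itself (Leu-Pro-any residue-Thr-Gly), which is searched for anywhere in the sequence as a simplification.

```python
import re
from dataclasses import dataclass

# LPXTG sorting motif: Leu-Pro-any residue-Thr-Gly.
LPXTG = re.compile(r"LP.TG")


@dataclass
class Predictions:
    """Per-protein tool results after parsing (hypothetical fields)."""
    signal_peptide: bool     # e.g. from SignalP
    lipoprotein: bool        # e.g. from LipoP
    tm_helices: int          # e.g. agreed by TMHMM / tmap / MEMSAT
    cell_wall_binding: bool  # e.g. from a ps_scan pattern hit


def classify(sequence: str, p: Predictions) -> str:
    """Assign one class label using an assumed, simplified rule order."""
    if not p.signal_peptide:
        return "cytoplasm" if p.tm_helices == 0 else "transmembrane"
    if p.lipoprotein:
        return "lipoprotein"
    if p.tm_helices > 0:
        return "transmembrane"
    if LPXTG.search(sequence):
        return "LPXTG"
    if p.cell_wall_binding:
        return "cell wall binding"
    return "secreted to medium"


# Toy sequence, not real data.
print(classify("MKKLPETGAA", Predictions(True, False, 0, False)))
```

Keeping the rules in one small function makes the order of precedence explicit, which matters when several tools give conflicting answers for the same protein.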

Page 12

Classification Workflow

Page 13

Process of Analysis

[Chart: one entry per genome, labelled by accession number: CP000001, AE017355, AE017225, AE017334, AE016879, AE017194, AE016877, AL009126, CP000002, AE017333, AP006627, BA000004]

Putative secreted proteins

Protein families

Functional classification

Relations
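
The step from putative secreted proteins to protein families and their relations across genomes can be pictured with a small grouping sketch. Everything here is hypothetical: the `(genome, protein, family)` tuples, the identifiers and the function name are invented for illustration, and the family assignment itself is assumed to come from an earlier clustering step that is not shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# (genome id, protein id, family id): an assumed record layout.
Assignment = Tuple[str, str, str]


def genomes_per_family(assignments: Iterable[Assignment]) -> Dict[str, Set[str]]:
    """Map each family id to the set of genomes with at least one member."""
    spans: Dict[str, Set[str]] = defaultdict(set)
    for genome, _protein, family in assignments:
        spans[family].add(genome)
    return dict(spans)


# Toy input, not real results.
toy = [
    ("genomeA", "p1", "fam1"),
    ("genomeB", "p2", "fam1"),
    ("genomeA", "p3", "fam2"),
]
spans = genomes_per_family(toy)

# Families found in every genome of the toy set.
all_genomes = {genome for genome, _protein, _family in toy}
shared = sorted(family for family, found in spans.items() if found == all_genomes)
print(shared)
```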

Page 14

Analysis Workflow

Page 15

Architecture

Custom-designed database

Provenance tracking

Analysis - computationally intensive

Architecture differs from other systems
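
Provenance tracking can be pictured as storing, next to each result, which tool produced it, from which input, and when. The sketch below is a minimal illustration under that assumption. The record fields, `run_with_provenance` and the toy tool are hypothetical and do not describe the schema of the custom-designed database.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable


@dataclass(frozen=True)
class ProvenanceRecord:
    """What produced a result, from what input, and when (assumed fields)."""
    tool: str
    tool_version: str
    input_sha256: str
    result: str
    created_at: str


def run_with_provenance(tool: str, tool_version: str,
                        func: Callable[[str], str], input_text: str) -> ProvenanceRecord:
    """Run a tool and wrap its result in a provenance record."""
    result = func(input_text)
    digest = hashlib.sha256(input_text.encode("utf-8")).hexdigest()
    return ProvenanceRecord(
        tool=tool,
        tool_version=tool_version,
        input_sha256=digest,
        result=result,
        created_at=datetime.now(timezone.utc).isoformat(),
    )


# Toy tool and toy input, for illustration only.
record = run_with_provenance("toy_motif_scan", "0.1",
                             lambda seq: str("LP" in seq), "MKKLPETG")
print(record.tool, record.result)
```

Hashing the input rather than storing it keeps the record small while still letting a later run check whether it used the same data.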

Page 16

Web Portal

Page 17

Outline

Computational challenges of bioinformatics

Secretion in Bacillus

Classification and analysis workflows

Results and discussion

Page 18

Classification Results

Page 19

Functions of the Clusters

[Bar chart: number of families per functional category. Categories: Similar to unknown proteins; Transport/binding proteins and lipoproteins; Cell wall; Membrane bioenergetics; Germination; Protein secretion; Sporulation; Metabolism of carbohydrates and related molecules; Specific pathways; Transformation/competence; Metabolism of lipids; Metabolism of phosphate; Transcription regulation; Metabolism of amino acids and related molecules. Axis: Number of families]

Page 20

Biologist’s Outlook

Results available for subsequent analysis

Data and results are of great interest

Page 21

eScientist’s Outlook

Microbase simplified data analysis

But …

Autonomy - most services provided originally by external parties

Licensing - limits exposure of services

Distribution - difficulty came from the relatively large datasets

Page 22

Future Enhancements

Use notification to automatically analyse recently annotated genomes

Migrate workflows to a remote enclosed environment?
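
The notification idea above can be sketched as a simple publish/subscribe arrangement: an analysis handler registers interest in newly annotated genomes and runs whenever one is announced. This is a minimal in-process illustration only. `NotificationBus`, the topic name and the accession string are hypothetical and are not part of Microbase or myGrid.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[str], None]


class NotificationBus:
    """Tiny in-process publish/subscribe hub (hypothetical)."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a handler to be called for every message on a topic."""
        self._handlers[topic].append(handler)

    def publish(self, topic: str, payload: str) -> None:
        """Deliver the payload to every handler subscribed to the topic."""
        for handler in list(self._handlers[topic]):
            handler(payload)


analysed: List[str] = []


def analyse_genome(accession: str) -> None:
    """Stand-in for launching the classification workflow on one genome."""
    analysed.append(accession)


bus = NotificationBus()
bus.subscribe("genome.annotated", analyse_genome)
bus.publish("genome.annotated", "EXAMPLE_ACCESSION")
print(analysed)
```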

Page 23

Acknowledgments

Phillip Lord, Colin Harwood, Anil Wipat

myGrid: Carole Goble, Tom Oinn

… and the rest of the myGrid team

Microbase: Yudong Sun, Anil Wipat, Matthew Pocock, Pete A. Lee, Paul Watson, Keith Flanagan, James T. Worthington