4th Annual EPSRC e-science meeting
The need for e-Science
An industrial perspective
Stephen Calvert – VP Cheminformatics, GSK
Yike Guo – Imperial College
What is the “industrial” world like?
• Historically
  – Low volume
    • 30-50 cmpds/yr/chemist; 10,000s of assay wells/yr
  – Low information diversity
    • scientists generally dealt with limited types of data
  – Reductionist approach
    • limited information per experiment
  – Interpretation critical for next step
    • scientists required:
      – simple systems to assist in information monitoring
      – decision making resides with the scientist
What is the “industrial” world like?
• What happened in the last 5 years?
  – “industrialisation” - application of “principles of industrialisation” to drug discovery
    • high volume
      – 10,000 cmpds/yr/chemist; 100+ million wells/yr
  – biology revolution
    • Human genome
      – “systems biology” – holistic view and interpretation
      – high content data - images
      – multiple result types from each experiment – bio-markers, pathways
  – knowledge integration
    • scientific discipline integration
  – scientists required:
    • complex systems, algorithms, statistics…
    • decision making shared between systems and scientists
    • “Informatics” essential – partnership not service
How have we (IT) tackled the transition?
• Business as usual
  – problem centric view
    • build applications
    • integrate applications
• Educate scientists in the realms of IT
  – “Now I need to be an IT expert alongside chemistry, biology, genetics, robotics, engineering ……”
  – interesting time scale - generations
• Technology is our saviour!
  – client server, web services, Java, C#, CORBA, OO programming, extreme programming, grid computing, …..
What are the results?
• “islands” of process & data
  – complex integration problem
• “spaghetti” joins our worlds - unsustainable - cost
• control with “IT”
  – mismatch in cycle time to change
  – engineered out serendipity
  – service role reversed
[Figure: islands labelled chemistry, samples, screening, “library” design, data and infrastructure (minicomputers). The samples island is drawn as a system diagram with these components: client order, availability, submission and sample history components; Booking-in, Processing, Dispatch, Process Control (PinPoint), Tube Store, Solid Store, Manual Store and ALS (RTS) managers; HayStack stores, manual store, balances and weighing/dissolve/sort hardware; sample holding areas, processing queue, job queue and works orders; Stock Record Database, Discovery Stock Warehouse and Discovery Sample History Warehouse. Clients: scientist, remote cmpd bank. Legend: GSK applications, user interface component, database, physical queue, electronic queue, automation hardware.]
How could we do it differently?
• result in:
  – handing control of science back to the scientist
  – match cycle times to change
  – simplify
• how can we merge the 2 worlds?
  – physical, information
Doodling in knowledge and experiment space
• no predefined steps
• capture what was done; don’t restrict what can be done
• don’t restrict the non-obvious
[Figure labels: Information Resources; Target List & Status; Target Leads; IC50 Assay; Exclusion Lists; Structure Validation; Other Assay...]
Q: are these results real?
Q: what do I know about these compounds?
Q: what other data can I acquire?
this is workflow – isn’t it?
physical & information worlds merge
Doodling in knowledge & experiment space
• Need access to world-class scientific algorithms and tools
• Need access to disparate data sources from multiple locations
• Intuitive & flexible GUI design/analysis
• Framework needs to be very generic
• Ability to construct a “just-in-time” application
• Need to serve the requirements of a varied user community
  – both in terms of scientific and technical know-how
• Capture and dissemination of “best practice” within a creative environment to enhance efficiency company-wide
Discovery Net Overview
• Funding:
  – One of the eight UK National e-Science Projects (£2.4 M)
• Key Features:
  – Allow scientists to construct, share and execute complex knowledge discovery processes & services
  – Allow institutions to manage and utilise the compositional services as their intellectual property
• Applications:
  – Life Science
  – Environmental Modelling
  – Geo-hazard Prediction
• Achievement:
  – For the first time, Discovery Net realises the dynamic construction of compositional services on the Grid for real-time knowledge discovery and decision making
• Goal: constructing the world’s first infrastructure for global knowledge discovery on the grid of web services
[Figure: Using GRID Resources. Labels: Scientific Information (Literature, Databases, Operational Data, Images, Instrument Data); Scientific Discovery in Real Time; Real Time Data Integration; Dynamic Application Integration; Discovery Services; Process Knowledge Management.]
Workflow = Compositional Service
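The idea that a workflow is itself a compositional service can be sketched in a few lines. This is a minimal illustration, not Discovery Net code: the `compose` helper, the three toy services and the 1000 nM activity threshold are all assumptions made up for the example. The point it shows is that chaining services yields something with the same call shape as a service, so a workflow can be reused inside another workflow.

```python
from functools import reduce
from typing import Any, Callable

# A "service" here is simply any callable taking one input and returning one output.
Service = Callable[[Any], Any]


def compose(*services: Service) -> Service:
    """Chain services left to right; the resulting workflow is itself a service."""
    def workflow(data: Any) -> Any:
        return reduce(lambda acc, svc: svc(acc), services, data)
    return workflow


# Hypothetical toy services (names and threshold are illustrative only).
def keep_measured(records):
    """Drop records that have no IC50 value."""
    return [r for r in records if "ic50_nm" in r]


def flag_actives(records, threshold_nm=1000.0):
    """Mark a record active when its IC50 is below the threshold."""
    return [dict(r, active=r["ic50_nm"] < threshold_nm) for r in records]


def count_actives(records):
    """Count records flagged as active."""
    return sum(r["active"] for r in records)


# A workflow built from two services...
screen = compose(keep_measured, flag_actives)
# ...is reused as a single service inside a larger workflow.
report = compose(screen, count_actives)

# Invented toy records, not real assay data.
records = [
    {"id": "A", "ic50_nm": 50.0},
    {"id": "B", "ic50_nm": 5000.0},
    {"id": "C"},
]
print(report(records))  # prints 1
```

Because `screen` and `report` share the calling convention of the individual services, either could be published or nested further without the caller knowing it is a composition.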
Enterprise Wide Integrative Scientific Decision Making Platform with Discovery Net Workflow
• Constructing a ubiquitous workflow: by scientists
  – Integrate information resources/software applications cross-domain
  – Support innovation and capture the best practice of your scientific research
• Warehousing workflows: for scientists
  – Manage discovery processes within an organisation
  – Construct an enterprise process knowledge bank
• Deployment workflow: to scientists
  – Turn workflows into reusable applications/services
  – Turn every scientist into a solution builder
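The construct, warehouse and deploy cycle above can be sketched as follows. This is a hedged illustration only: `WorkflowWarehouse`, the `STEPS` library and the step names are hypothetical and do not reflect how Discovery Net stores workflows. It assumes a workflow can be described as an ordered list of named steps, which makes it storable as plain data and deployable later as a callable application.

```python
import json
from typing import Callable, Dict, List

# Hypothetical step library: each name maps to a plain function on a list.
STEPS: Dict[str, Callable[[list], list]] = {
    "drop_missing": lambda xs: [x for x in xs if x is not None],
    "sort": sorted,
    "top3": lambda xs: xs[-3:],
}


class WorkflowWarehouse:
    """Stores workflow definitions as JSON so they can be shared and redeployed."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def save(self, name: str, step_names: List[str], author: str) -> None:
        """Warehouse a workflow, rejecting steps that are not in the library."""
        unknown = [s for s in step_names if s not in STEPS]
        if unknown:
            raise KeyError(f"unknown steps: {unknown}")
        self._store[name] = json.dumps({"author": author, "steps": step_names})

    def deploy(self, name: str) -> Callable[[list], list]:
        """Turn a stored definition into a reusable application."""
        spec = json.loads(self._store[name])

        def app(data: list) -> list:
            for step in spec["steps"]:
                data = STEPS[step](data)
            return data

        return app


warehouse = WorkflowWarehouse()
warehouse.save("best_three", ["drop_missing", "sort", "top3"], author="chemist_a")

best_three = warehouse.deploy("best_three")
print(best_three([5, None, 1, 9, 3]))  # prints [3, 5, 9]
```

Keeping the definition as data rather than code is one possible design choice: it lets an organisation list, audit and version what scientists built, while `deploy` hands other scientists a ready-made tool.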
An Integrative Analysis Example: Interactive Scientific Discovery with Workflow
[Figure: screenshots of workflow components. Panel titles: Relational data mining; Text mining; Spectrum data mining; Chemical sequence data model; Chemical data model; Visualizing relational data clusters; Visualizing multidimensional data; Visualizing sequence data; Visualizing pathway data; Text mining visualization; Visualizing cluster statistics; Visualizing serial/spectrum data; Decision tree model of metabonomic profile; Chemical structure visualization.]
Discovery Net Commercialisation
Research: Discovery Net Research
• CS: Workflow for Informatics on SOA
• Sensor: Sensor Data Processing and Mining
• Application: Life, Environmental and Geo-physical Sciences
Commercialisation (Imperial College spin-out companies):
• Workflow technology: KDE Informatics Platform
• HT sensor processing (DeltaDot): Label Free HT bioSensors
Target market: Life Science Industry
Library design - GSK
• Process of selecting the molecules I want to make from the universe of molecules
• Toolbox: scientific models, chemical handling, chemical properties, data access, statistics, data visualisation, ….
• Scientists can doodle in chemical space
  – Capture how scientists made decisions
• New algorithms, data sources added in < 1 hour
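A toy sketch of the library-design points above: a pluggable toolbox, selection from a pool of candidate molecules, and a record of how each decision narrowed the pool. Everything here is an assumption made for illustration, including the `Molecule` fields, the `register` decorator, the filter names, the cut-off values and the candidate data; none of it is GSK's actual tooling.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Molecule:
    name: str
    mol_weight: float
    logp: float


Filter = Callable[[Molecule], bool]

# The toolbox is a registry, so a new algorithm is one decorated function.
TOOLBOX: Dict[str, Filter] = {}


def register(name: str) -> Callable[[Filter], Filter]:
    """Add a filter to the toolbox under the given name."""
    def decorator(fn: Filter) -> Filter:
        TOOLBOX[name] = fn
        return fn
    return decorator


@register("mw_under_500")
def mw_under_500(m: Molecule) -> bool:
    return m.mol_weight < 500  # illustrative cut-off


@register("logp_under_5")
def logp_under_5(m: Molecule) -> bool:
    return m.logp < 5  # illustrative cut-off


def design_library(
    candidates: List[Molecule], filter_names: List[str]
) -> Tuple[List[Molecule], List[dict]]:
    """Apply the chosen filters in order and log what each one did."""
    selected = list(candidates)
    decisions = []
    for fname in filter_names:
        kept = [m for m in selected if TOOLBOX[fname](m)]
        decisions.append({"filter": fname, "before": len(selected), "after": len(kept)})
        selected = kept
    return selected, decisions


# Invented toy candidates, not real compounds.
candidates = [
    Molecule("m1", 320.0, 2.1),
    Molecule("m2", 610.0, 3.0),
    Molecule("m3", 450.0, 6.2),
]
selected, decisions = design_library(candidates, ["mw_under_500", "logp_under_5"])
print([m.name for m in selected])  # prints ['m1']
```

The `decisions` list is the captured trail: it records which filter was applied and how many molecules survived it, so another scientist can see how the selection was reached and replay or vary it.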
The 2003 SARS outbreak
KDE Example 2: SARS Genome Annotation
[Figure: workflow diagram centred on “D-Net: Integration, interpretation, and discovery”. Node labels: Relationship between SARS and other viruses; Mutual regions identification; Homology search against viral genome DB; Homology search against protein DB; Homology search against motif DB; Annotation using Artemis and GenSense; Gene prediction; Predicted genes; Exon prediction; Splice site prediction; Phylogenetic analysis; Multiple sequence alignment; Immunogenetics; Microarray analysis; Epidemiological analysis; SARS patients diagnosis; Protein localization site prediction; Protein interaction prediction; Relationship between SARS virus and human receptors prediction; Classification and secondary structure prediction; Bibliographic databases; Key word search; GeneSense Ontology; Genbank.]
China SARS Virtual Lab based on Discovery Net
Achievement: Dynamic Construction of Compositional Services:
Rapid construction of applications via composition of existing web services using workflow.
Instant deployment of analytical workflows as new web services with resource mapping.
Integrated workflow, provenance and service management
Collaborative construction of workflows by large numbers of researchers
Requirements:
Rapidly constructing and sharing mission-critical discovery services
Integration of diverse bioinformatics applications
Support collaborative research between geographically distributed researchers
Deploying services as easy to use tools for real time decision making
Compositional Services for SARS Mutation Analysis
• 50 data resources
• > 200 software applications and services
• Designed on top of the web service environment
• Used by more than 200 scientists
• Result published in Science
Future Challenge: GSK - InforSense & IC e-Science Collaboration
• Workflow Fusion: applying advanced performance programming technology for dynamic optimization of workflow execution
• Workflow Abstraction: investigating abstraction mechanisms for building workflow hierarchies and higher-order composition forms
• Dynamic Service Composition: investigating service ontology for dynamically composing services with workflow
• Workflow Metadata Model: building up a generic metadata model for scientific workflow management and workflow warehousing
• Man – machine interface – free scientists from IT speak
How can you help?
• encourage focused research on the key issues SCIENTISTS face in industry
• catalyse joint work in these focused fields between academics, industry and commercial software vendors
• facilitate solution-oriented communication between computer scientists and domain scientists in both academia and industry
e-Science
• A politician's view:
  ‘[The e-Science platform] intends to make access to computing power, scientific data repositories and experimental facilities as easy as the Web makes access to information.’
  Tony Blair
• A Scientist’s View:
  [The e-Science platform] should help me to do my scientific research free from the complexity of IT