e-Infrastructure landscape:
A Researcher’s Viewpoint
Professor Carole Goble FREng, FBCS CITP
University of Manchester, UK
Open Middleware Infrastructure Institute
Software Sustainability Institute UK
RUGIT (Russell Universities Group IT Directors)
Imperial College, London, 28 March 2012
Why me?
• University-based e-Infrastructure researcher, developer and service provider: workflow management, catalogues, data/model/workflow repositories, VREs, data collection, ontologies, publishing
• Mixed team of CS researchers, scientific informaticians, software engineers and IT Services staff: one seconded from ITS, two working with us.
• e-Infrastructure provider to large-scale projects: biodiversity, astrophysics, astronomy, chemistry, genomics, digital document preservation, systems biology, social science …
• e-Infrastructure provider to long tail researchers: biology, genomics, chemistry, social science, archaeology, music, …
Why me?
• Open Middleware Infrastructure Institute UK and Software Sustainability Institute, ESNW e-Science Centre with ITS@Manchester.
• Chaired expert panel for BIS “Delivering the UK’s e-Infrastructure for research and innovation”, 2010
• Leading a Think Tank Tea Club for academic-led consultation on a strategy for e-Infrastructure for Research
• Chair, UK User Forum for ELIXIR (ESFRI)
What is e-Infrastructure?
The integration of digitally-based technology, resources, facilities, and services combined with people and organizational structures needed to support modern, collaborative research (and teaching).
1. Data and Storage
2. Software (and Algorithms)
3. Hardware (Compute)
4. Networks
5. Security and authentication (BIS Report)
6. People (Collaboration, Skills, Capacity)
7. The Digital Library
e-Infrastructure Tiers
• Researcher and Community Specific Applications (Specific)
– Visualisation tools, Simulation tools, Research Tools etc.
• Pan-Application and Possible Pan-Community Specific Infrastructure (Community Specific)
– Often community specific. Maybe even developed by your researchers.
– R and Matlab, Specialist Data Management, Workflow Management, CDK Toolkit for Chemistry, LIMS, ELN, Catalogues and Repositories
• Core/Base (Commodity, maybe Customised)
– Stuff we want to be there and buy into
– Compute, Storage, Networks, Backup, Service Hosting, Library Services, File and (Don’t Forget) Data Store, Institutional Repository, Security Services
Fundamental Expectations of Institutional IT Support
• Lots and lots of storage.
• Backup and fast recovery.
• Fast and reliable network.
• Wireless everywhere.
• Compute of different kinds that works with my tools, with few hoops to jump through, if any.
• Reliable Service hosting.
• Support any device and any operating system.
• Availability of skilled and helpful staff when you need them. Preferably people I know.
• Ability to use my community’s/lab’s e-infrastructure
Political Landscape
Landscape of UK e-Infrastructure Reviews
• BIS, RCUK, Delivering the UK’s e-Infrastructure for research and innovation, 2010
• RCUK & RS, Review of e-Science, 2009
• OSI, Developing the UK’s e-Infrastructure for science and innovation, 2007
• BIS, RCUK, HEFCE, DEL, SFC, Report of the e-Infrastructure Advisory Group, 2011
• BIS, A Strategic Vision for UK e-Infrastructure, 2011
• Strategy for the UK Research Computing Ecosystem, 2011
• BIS, Technology and Innovation Futures Foresight Report, 2010
• JISC Review, 2010
Coordination
Capacity
Data
• Coordination across research councils.
• Changing behaviours to reward and enable reuse.
• Overcoming fragmentation.
http://www.rcuk.ac.uk/documents/research/esci/e-Infrastructurereviewreport.pdf
UK Research Computing Ecosystem
• Campus/Regional HPC, e.g. HPC-Wales
• Special-purpose HPC, e.g. DiRAC
• National HPC, e.g. HECToR
• Public data analysis cloud
• Thematic petabyte data store
• Specialist databases
• Software development: HPC
• Enhance current DTCs
• Software development: e-Science
• Open and accessible JANET
• Cyber-security
e-Infrastructure Leadership Council
• The additional £145M of capital investment in e-infrastructure recently announced by BIS builds on the initial outputs from the “A Strategic Vision for UK e-Infrastructure” report.
• Co-chaired by Dominic Tildesley and David Willetts
• A 10 year roadmap for investment for networks, data and storage, compute, software and algorithms, people and skills and security and authentication.
• BIS-dominated: e-Infrastructure for business.
• HPC and Hartree oriented. Strong commercial interest. Concerns about representation and vested interests.
• Keen on specialist centres
• “Today to out-Compute is Out-Compete”
Rear-guard Actions
• RCUK e-Infrastructure subgroup
– Funding councils, championed by Doug Kell (BBSRC)
• e-Infrastructure academic user forum
– Academic-led; Chairs: Prof David De Roure (Oxford) and Prof Peter Coveney (UCL)
– Community thought leadership group
– Concerns about the constitution of the e-Infrastructure Leadership Council
e-Infrastructure Academic User Forum
• e-Infrastructure academic user forum
– Concerns about the constitution of the e-Infrastructure Leadership Council
– Builds on earlier e-Science Forum and HPC-SIG
– A whole-community exercise
• New Digital Research conference, St Catherine's College, Oxford, 10-12 September.
– Showcasing successful digital research, tools and methods, and building community especially around big data, open data, open science and the next generation of digital researchers.
RCUK consultation for a Capital Investment Roadmap
• E-infrastructure investment
– Fast turnaround on university clusters: opportunistic in-year spend from other parts of government.
– The speed with which the community mobilised was seen very favourably.
– RUGIT now needs to articulate future capital needs through the RCUK consultation on capital for RCUK and HEIs, so as to be ready if more funding is found.
– Is RUGIT lined up with your HEIs?
– http://www.rcuk.ac.uk/research/Infrastructure/capconst/Pages/home.aspx
How can RUGIT Help?
• Participate in the Academic User Forum
• Lobby the e-Infrastructure Leadership Council
• Remember it’s not all flippin’ HPC, and Russell Group institutions are where much of the research is done.
• Respond to the CIR (Capital Investment Roadmap consultation)
• Be prepared to respond to funding councils
EPSRC Software Strategy
• Software as an Infrastructure – Survey, response, action plan
– http://www.epsrc.ac.uk/SiteCollectionDocuments/other/SoftwareAsAnInfrastructure.pdf
• Areas
– Identification of new areas and grand challenges
– Enabling and promoting collaboration
– Research and Development
– Training
– Career Path Support
– Joint funding models
– Supporting Innovation
– User Support
– Quality of Code
– Sustainability of Code
European e-Infrastructure
The mood is reuse and cooperation.
• e-IRG e-Infrastructure Reflection Group http://www.e-irg.eu
• Siena Standards and Interoperability for eInfrastructure and Implementation Initiative http://www.sienainitiative.eu
• Horizon 2020 http://ec.europa.eu/research/horizon2020
• Still keen on Grid and HPC, though they have finally embraced Cloud
• GEANT: very successful network-level e-infrastructure.
• EGI: now a well-established European Grid e-infrastructure
– EGI faces challenges due to a lack of standardisation, a focus on a small number of application domains, and financing.
– Grid stuff is messy: National infrastructures, numerous middleware types, disconnect between EMI and EGI.
– Recent spin-out for data, e.g. EUDAT (eudat.eu), though a bit archive-focused
• PRACE: HPC infrastructure for Europe
– Very successful community, but struggling to transition from a resource contribution model to a cash contribution model.
European e-Infrastructure
• Usual obsession with Scale
– Exascale Challenge
• CRESTA, DEEP, Mont-Blanc
• combined funding of 25 million
• different aspects of the exascale challenge “using
a co-design model spanning hardware,
systemware and software applications”.
• Centres of Excellence
– European Software Centres of Excellence
Community-focused e-Infrastructure
• Own services, catalogues, data management environments, datasets, tools, metadata standards, policies, licensing conventions….
• Large number of EU community-specific e-Infrastructures:
– BioVeL for BioDiversity
– SCAPE for Digital Libraries
– ScalaLife (Comp Bio), MAPPER, VPH… blah blah….
• Flagships, ESFRI projects, Innovative Medicine Initiatives, e.g. OpenPHACTS
• National e.g. UKDA
• International community: e.g. BioStar, Open Bioinformatics Federation, BioSharing, COMBINE, PubMed….
European Strategy Forum on Research Infrastructures
ELIXIR, http://www.elixir-europe.org/
• Data Access, Curation and Integration – managing the data deluge
– integrating the data to reduce fragmentation of effort and research
– incorporating and exploiting new types of data
– maintaining open access to biological data in order to enhance competitiveness and innovation.
– radically enhancing Europe’s data infrastructure and making it more accessible.
– presenting users with a single, transparent interface to a world of resources
Hub and Spoke Model
European Bioinformatics Institute
1500 databases in public use.
> 2000 web services
Ecology and BioDiversity
• DataONE Investigators toolkit: Dryad data repository, Workflow tools, Excel tools, R and MatLab libraries and tools, Specialist Data Management tools, Bibliography tools, remote file management tools, data management planning tools, software tools catalogue
• BioVeL infrastructure: workflows, data management, web services, portals, catalogues, repositories, Excel tools
• CAMERA 2.0: portal, workflows, services, metadata management, data management… http://camera.calit2.net
http://www.dataone.org/investigator-toolkit
http://www.biovel.org
• Find, exchange and interlink, preserve, publish data, models, publications, SOPs & analyses.
• User access control.
• Mix local and central stores. Post-project archiving & publication.
• Gateway to public tools and resources.
• Standards compliant.
• Launch and validate models and analyses: JWS Online
• Find experts, colleagues and peers
• Group, Project, Consortium, Community Sharing
• Personal Storage
http://www.sysmo-db.org
Facilitating/leveraging Institutional e-Infrastructure and know-how
• Many research teams have e-Infrastructure and technically skilled people. How do you leverage, enable and exploit them? And vice versa?
Commercial or Open Cloud-type e-Infrastructure
• PAYG Compute
– Amazon Cloud and Azure
• Cluster computing – Condor
• Software repositories – GitHub, BitBucket
• Data Management – Dryad, DataVerse
• Data Commons – FigShare
• Specialist repositories & Catalogues
– nanoHUB, myExperiment, OpenWetWare, BioModels
• Communication and Publishing – LinkedIn, YouTube, SlideShare, Twitter, Wikipedia, GoogleDocs, DropBox, GitHub, Wikis and blogs galore!
• Research management – VivoWeb
• Bibliographic tools – Mendeley, ReadCube, CiteULike, Google Scholar Citations
• Review and conference Management
– EasyChair
• Specific social networks – ResearchGate, BioMedExpert, SciLink, OurSpace, BioCrowd
• LIMS and Lab Management – YourLabData, LabGuru
• Workflow management – Taverna, Triana, Kepler, Pegasus
• Research Cloud Services – e.g. Digital Science
DataONE Software Catalogue lists about 300 tools
http://www.digital-science.com/
Disintermediation
E.g. Amazon Cloud
Embrace: oh good, how do we help researchers use it?
Reject: we had better build our own cloud so we remain intermediaries.
http://blogs.nature.com/eresearch/2011/06/03/antidisintermediationarianism-and-the-cloud
Researcher’s e-infra dimensions, obligations & opportunities
• Personal, Group
• Project, Consortium
• Community
• Institutional
• National, International
• Cloud Commodity
• Open source
Assemble our own e-infrastructure.
Flexibility: cherished above all things for institutional e-Infrastructure.
One Size Does Not Fit All.
What researchers want is what actually fits how they work.
Where I stopped, due to lots and lots of discussion.
Reflections on Research and its impact on e-Infrastructure
Pan-Institutional, International Teams
Collective Intelligence
• Experimental scientists, Theoretical scientists, Scientific informaticians, Computational Scientists, Modellers, Specialist Tool developers, Service & resource providers, Infrastructure developers, System Administrators…
• Groups/Centres of different sizes
• “Long Tail” Postgrads/Citizens
• Planned/Emergent, Evolving
• Collective intelligence and crowdsourcing
• Community datasets, services
• Curation and collective campaigns
[Figure: Agile Science with Data. Research steps shown: Make Prediction, Log, Identify Problem, Generate Hypothesis, Investigate Prior Knowledge, Make Observations, Test Hypothesis, Perform Experiments, Organise Data, Analyse Data, Devise New Experiments, Draw Conclusions, Communicate, Validate & Compare Results, Pool Results.]
Research is often not beautifully planned.
Data
• Exascale data is being handled: SKA, LHC…
• Biggish local data (e.g. Next Gen Seq)
• Small local data (e.g. spreadsheets & wikis)
• Scruffy data – text, semi-structured, emergent…
• Online data sets - public or consortium
• Moving lots of data around. Securely.
• Network traffic. Local compute & caching. Licensing. Standard formats.
• Bottlenecks: integration & analysis, community specific curation & stewardship, preservation if needed
• Open Data
• Data Journals
http://www.elixir-europe.org
Analytics Platforms
• Research as a Service
• Virtualisation & Automation
• Software is e-Infrastructure
– EPSRC Software Strategy report
– Software Sustainability Institute UK
– Software Carpentry
– Improving software practices
• The. Cloud.
– Community and UK Cloud
– Putting datasets / services on The Cloud
Open innovation platforms/architectures
• Open data, Open APIs, Open Licensing, Open Source, Open Standards
• Enabling: others innovate
• Researcher-centric
• Adoption ramps.
DropandCompute
Open (maybe even reproducible) research
• Open Data, Open Access, Open Source Software
– Elsevier climb-down, Royal Society review
– RCUK Public consultation, Research Council Mandates
– Open Access: who pays?
– Scientists publish. They do not generally share outside trusted collaborations.
• Transparency and provenance
– Possible re-computation on VM farms
• Stewardship
– Stewardship mandates and policies vs Practical Stewardship vs Community Stewardship
– Curation burden for reusability
Trend: New Publishing
• All scientific commodities
– Software, models, data, methods, know-how, articles, algorithms, services
• Integrated publishing
– Data+discussion+method
– Data journals, Software Journals…
• Credit! Software, Data, People
– People: ORCID, OpenID
– Software and Data journals
– Software citation? DataCite
– Data and Software management policies and practices. Altmetrics
People
• Income generators: academics and students. Their productivity should be the highest priority, IMHO.
• Capability and Relationships
– Training for young and mid-career researchers/research technologists.
– Enable mixed-skill research teams to include research technologists and IT staff
– Bind ITS with research groups
– Comms channels – dog food
• Value and reward highly skilled research technologists within HE institutions with a career structure.
Google Tech Stop
How can RUGIT Help?
• Researchers’ needs
– Top down (alone) is just not going to wash
– Partnering with researchers
– Lots of exaflop machines won’t help most of us….
• Use, leverage and share what is already used
– Infrastructure already in your institutions and used by your researchers. Or even developed by your students and researchers.
– Researchers & students are using many external services. And will continue to do so.
– Community and commodity solutions.
• Getting out of the way….
– Enable, don’t hamper, esp. domain-specific e-infrastructure.
How can RUGIT Help?
• Joining up
– Greater provision by institutions
– Sustainability through collectivism
– Inventing is costly
• There is no one size fits all
– Avoid reinventing wheels, but there are many kinds of wheel: data and metadata.
• Rapid responsiveness
– Research opportunities often have narrow time windows. A perfect solution too late is no good. A good-enough solution in time is perfect.
How can RUGIT Help?
• Be Open, Be Permeable, Be Flexible
– Presume that you will not be the developers of much of the “upper” infrastructure and the applications your users need.
– But you will need to be flexible enough to adapt to them.
– Future-proof through openness.
• Research, Teaching, Life bleed together.
– Project and research management: joined-up grant submission, paper writing, financial management, HR
– The mismatch between JeS, ROS and local systems is an example