e-Infrastructure landscape: A Researcher’s Viewpoint. Professor Carole Goble FREng, FBCS CITP, University of Manchester, UK; Open Middleware Infrastructure Institute; Software Sustainability Institute UK. RUGIT (Russell Universities Group IT Directors), Imperial College, London, 28 March 2012

e-Infrastructure landscape:

A Researcher’s Viewpoint

Professor Carole Goble FREng, FBCS CITP

University of Manchester, UK

Open Middleware Infrastructure Institute

Software Sustainability Institute UK

RUGIT (Russell Universities Group IT Directors),

Imperial College, London, 28 March 2012

Why me?

• University-based e-Infrastructure researcher, developer and service provider: workflow management, catalogues, data/model/workflow repositories, VREs, data collection, ontologies, publishing

• Mixed team of CS researchers, scientific informaticians, software engineers and IT Services: 1 seconded from ITS, 2 working with us.

• e-Infrastructure provider to large-scale projects: biodiversity, astrophysics, astronomy, chemistry, genomics, digital document preservation, systems biology, social science …

• e-Infrastructure provider to long tail researchers: biology, genomics, chemistry, social science, archaeology, music, …

Why me?

• Open Middleware Infrastructure Institute UK and Software Sustainability Institute, ESNW e-Science Centre with ITS@Manchester.

• Chaired expert panel for BIS “Delivering the UK’s e-Infrastructure for research and innovation”, 2010

• Leading a Think Tank Tea Club for academic-led consultation on a strategy for e-Infrastructure for Research

• Chair UK User Forum for ELIXIR ESFRI

What is e-Infrastructure?

The integration of digitally-based technology, resources, facilities, and services combined with people and organizational structures needed to support modern, collaborative research (and teaching).

1. Data and Storage

2. Software (and Algorithms)

3. Hardware (Compute)

4. Networks

5. Security and authentication (BIS Report)

6. People (Collaboration, Skills, Capacity)

7. The Digital Library

e-Infrastructure Tiers

Researcher and Community Specific Applications

Visualisation tools, Simulation tools, Research Tools etc

Stuff we want to be there and buy into

Compute, Storage, Networks, Backup, Service Hosting

Library Services, File and Don’t Forget Data Store,

Institutional Repository, Security Services

Tier labels: Core Base; Specific; Community Specific; Commodity, maybe Customised

Pan-Application and Possible Pan-Community Specific Infrastructure

Often community specific.

Maybe even developed by your researchers

R and Matlab, Specialist Data Management,

Workflow Management, CDK Toolkit for Chemistry

LIMS, ELN, Catalogues and Repositories

Fundamental Expectations of Institutional IT Support

• Lots and lots of storage.

• Backup and fast recovery.

• Fast and reliable network.

• Wireless everywhere.

• Compute of different kinds that works with my tools and not too many hoops, if any.

• Reliable Service hosting.

• Support any device and any operating system.

• Availability of skilled and helpful staff when you need them. Preferably known.

• Ability to use my community’s/lab’s e-infrastructure

Political Landscape

Landscape of UK

e-infrastructure Reviews

BIS, RCUK Delivering the UK’s e-Infrastructure for research and innovation, 2010

RCUK & RS Review of e-Science 2009

OSI Developing the UK’s e-Infrastructure for science and innovation, 2007

BIS, RCUK, HEFCE, DEL, SFC, Report of the e-Infrastructure Advisory Group, 2011

BIS A Strategic Vision for UK e-Infrastructure, 2011

Strategy for the UK Research Computing Ecosystem, 2011

BIS Technology and Innovation Futures Foresight Report, 2010

JISC Review 2010

Coordination

Capacity

Data

• Coordination across research councils.

• Changing behaviours to reward and enable reuse.

• Overcoming fragmentation.

http://www.rcuk.ac.uk/documents/research/esci/e-Infrastructurereviewreport.pdf

UK Research Computing Ecosystem

• Campus/Regional HPC, e.g. HPC-Wales

• Special purpose HPC, e.g. DiRAC

• National HPC, e.g. HECToR

• Public data analysis cloud

• Thematic petabyte data store

• Specialist databases

• Software development: HPC

• Enhance current DTCs

• Software development: E-science

• Open and accessible

• JANET

• Cyber-security

e-Infrastructure Leadership Council

• The additional £145M of capital investment in e-infrastructure recently announced by BIS builds on the initial outputs from the “A Strategic Vision for UK e-Infrastructure” report.

• Co-chaired by Dominic Tildesley and David Willetts

• A 10 year roadmap for investment for networks, data and storage, compute, software and algorithms, people and skills and security and authentication.

• BIS dominated. e-Infrastructure for business

• HPC and Hartree oriented. Strong commercial interest. Concerns about representation and vested interests.

• Keen on specialist centres

• “Today, to out-compute is to out-compete”

Rear-guard Actions

• RCUK e-Infrastructure subgroup

– Funding councils, championed by Doug Kell (BBSRC)

• e-Infrastructure academic user forum

– Academic led. Chairs: Prof David De Roure (Oxford) and Prof Peter Coveney (UCL)

– Community thought leadership group

– Concerns about the constitution of the e-Infrastructure Leadership Council

e-Infrastructure Academic User Forum

• e-Infrastructure academic user forum

– Concerns about the constitution of the e-Infrastructure Leadership Council

– Builds on earlier e-Science Forum and HPC-SIG

– A whole-community exercise

• New Digital Research conference: St Catherine's College, Oxford, 10-12 September.

– Showcasing successful digital research, tools and methods, and building community especially around big data, open data, open science and the next generation of digital researchers.

RCUK consultation for a Capital Investment Roadmap

• E-infrastructure investment

– Fast turn-round on university clusters: opportunistic in-year spend from other parts of government

– The speed with which the community mobilised was seen very favourably.

– RUGIT now needs to articulate future capital needs through the RCUK consultation on capital for RCUK and HEIs, to be ready if more funding is found.

– Is RUGIT lined up with your HEIs?

– http://www.rcuk.ac.uk/research/Infrastructure/capconst/Pages/home.aspx

RUGIT Help?

• Participate in the Academic User Forum

• Lobby e-Leadership Council

• Remember it’s not all flippin’ HPC, and RUG institutions are where much of the research is done.

• Respond to the CIR

• Be prepared to respond to funding councils

EPSRC Software Strategy

• Software as an Infrastructure

– Survey, response, action plan

– http://www.epsrc.ac.uk/SiteCollectionDocuments/other/SoftwareAsAnInfrastructure.pdf

• Areas

– Identification of new areas and grand challenges

– Enabling and promoting collaboration

– Research and Development

– Training

– Career Path Support

– Joint funding models

– Supporting Innovation

– User Support

– Quality of Code

– Sustainability of Code

European e-Infrastructure

The mood is reuse and cooperation.

• e-IRG e-Infrastructure Reflection Group http://www.e-irg.eu

• Siena Standards and Interoperability for eInfrastructure and Implementation Initiative http://www.sienainitiative.eu

• Horizon 2020 http://ec.europa.eu/research/horizon2020

• Still keen on Grid and HPC though have finally embraced Cloud

• GEANT: very successful network-level e-infrastructure.

• EGI: now well-established European Grid e-infrastructure

– EGI faces challenges due to lack of standardisation, a focus on a small number of application domains and financing.

– Grid stuff is messy: national infrastructures, numerous middleware types, disconnect between EMI and EGI.

– Recent spin-out for data, e.g. EUDAT (eudat.eu), though a bit archive focused

• PRACE: HPC infrastructure for Europe

– Very successful community but struggling to transition from a resource contribution model to a cash contribution model.

European e-Infrastructure

• Usual obsession with Scale

– Exascale Challenge

• CRESTA, DEEP, Mont-Blanc

• combined funding of 25 million

• different aspects of the exascale challenge “using a co-design model spanning hardware, systemware and software applications”.

• Centres of Excellence

– European Software Centres of Excellence

Community focused e-Infrastructure

• Own services, catalogues, data management environments, datasets, tools, metadata standards, policies, licensing conventions….

• Large number of EU community specific e-Infra:

– BioVeL for BioDiversity

– SCAPE for Digital Libraries

– ScalaLife (Comp Bio), MAPPER, VPH…blah blah….

• Flagships, ESFRI projects, Innovative Medicines Initiative projects, e.g. OpenPHACTS

• National e.g. UKDA

• International community: e.g. BioStar, Open Bioinformatics Foundation, BioSharing, COMBINE, PubMed….

European Strategy Forum on Research Infrastructures

ELIXIR, http://www.elixir-europe.org/

• Data Access, Curation and Integration

– managing the data deluge

– integrating the data to reduce fragmentation of effort and research

– incorporating and exploiting new types of data

– maintaining open access to biological data in order to enhance competitiveness and innovation.

– radically enhancing Europe’s data infrastructure and making it more accessible.

– presenting users with a single, transparent interface to a world of resources

Hub and Spoke Model

European Bioinformatics Institute

1500 databases in public use.

> 2000 web services

Ecology and BioDiversity

• DataONE Investigators toolkit: Dryad data repository, Workflow tools, Excel tools, R and MatLab libraries and tools, Specialist Data Management tools, Bibliography tools, remote file management tools, data management planning tools, software tools catalogue

• BioVeL infrastructure: workflows, data management, web services, portals, catalogues, repositories, Excel tools

• CAMERA 2.0: portal, workflows, services, metadata management, data management…

http://camera.calit2.net

http://www.dataone.org/investigator-toolkit

http://www.biovel.org

Find, exchange and interlink, preserve, publish data, models, publications, SOPs & analyses.

User access control. Mix local and central stores. Post-project archiving & publication.

Gateway to public tools and resources

Standards compliant.

Launch and validate models and analyses: JWS Online

Find experts, colleagues and peers

Group, Project, Consortium, Community Sharing

Personal Storage

http://www.sysmo-db.org

Facilitating/leveraging Institutional e-Infrastructure and know-how

• Many research teams have e-Infrastructure and technically skilled people. How do you leverage, enable and exploit them? And vice versa?

Commercial or Open Cloud-type e-Infrastructure

• PAYG Compute – Amazon Cloud and Azure

• Cluster computing – Condor

• Software repositories – GitHub, BitBucket

• Data Management – Dryad, DataVerse

• Data Commons – FigShare

• Specialist repositories & Catalogues – nanoHUB, myExperiment, OpenWetWare, BioModels

• Communication and Publishing – LinkedIn, YouTube, SlideShare, Twitter, Wikipedia, GoogleDocs, DropBox, GitHub, Wikis and blogs galore!

• Research management – VivoWeb

• Bibliographic tools – Mendeley, ReadCube, CiteULike, Google Citations

• Review and conference Management – EasyChair

• Specific social networks – ResearchGate, BioMedExpert, SciLink, OurSpace, BioCrowd

• LIMS and Lab Management – YourLabData, LabGuru

• Workflow management – Taverna, Triana, Kepler, Pegasus

• Research Cloud Services – e.g. Digital Science

DataONE Software Catalogue lists about 300 tools

http://www.digital-science.com/

Researcher’s e-infra dimensions, obligations & opportunities

• Personal, Group

• Project, Consortium

• Community

• Institutional

• National, International

• Cloud Commodity

• Open source

Assemble our own e-infrastructure

Flexibility Cherished above all things for Institutional e-Infrastructure.

One Size Does Not Fit All.

What the researchers want that actually fits how they work.

Where I Stopped

Due to lots and lots of discussion

Reflections on Research

and its impact on e-Infrastructure

Pan-Institutional, International Teams

Collective Intelligence

• Experimental scientists, Theoretical scientists, Scientific informaticians, Computational Scientists, Modellers, Specialist Tool developers, Service & resource providers, Infrastructure developers, System Administrators…

• Groups/Centres of different sizes

• “Long Tail” Postgrads/Citizens

• Planned/Emergent, Evolving

• Collective intelligence and crowd-sourcing

• Community datasets, services

• Curation and collective campaigns

[Figure: Agile Science with Data. Research is often not beautifully planned. Research cycle diagram whose steps include: Identify Problem; Investigate Prior Knowledge; Generate Hypothesis; Make Prediction; Make Observations; Perform Experiments; Organise Data; Analyse Data; Test Hypothesis; Validate & Compare Results; Pool Results; Draw Conclusions; Devise New Experiments; Communicate; Log.]

Data

• Exascale data is being handled: SKA, LHC…

• Biggish local data (e.g. Next Gen Seq)

• Small local data (e.g. spreadsheets & wikis)

• Scruffy data – text, semi-structured, emergent…

• Online data sets - public or consortium

• Moving lots of data around. Securely.

• Network traffic. Local compute & caching. Licensing. Standard formats.

• Bottlenecks: integration & analysis, community specific curation & stewardship, preservation if needed

• Open Data

• Data Journals

http://www.elixir-europe.org

Analytics Platforms

• Research as a Service

• Virtualisation & Automation

• Software is e-Infrastructure

– EPSRC Software Strategy report

– Software Sustainability Institute UK

– Software Carpentry

– Improving software practices

• The. Cloud.

– Community and UK Cloud

– Putting datasets / services on The Cloud

Open innovation platforms/architectures

• Open data, Open APIs, Open Licensing, Open Source, Open Standards

• Enabling: others innovate

• Researcher-centric

• Adoption ramps.

DropandCompute

Open (maybe even reproducible research)

• Open Data, Open Access, Open Source Software

– Elsevier climb-down, Royal Society review

– RCUK Public consultation, Research Council Mandates

– Open Access – who pays?

– Scientists publish. They do not generally share outside trusted collaborations.

• Transparency and provenance

– Possible re-computation on VM farms

• Stewardship

– Stewardship mandates and policies vs Practical Stewardship vs Community Stewardship

– Curation burden for reusability

Trend: New Publishing

• All scientific commodities

– Software, models, data, methods, know-how, articles, algorithms, services

• Integrated publishing

– Data+discussion+method

– Data journals, Software Journals…

• Credit! Software, Data, People

– People: ORCID, OpenID

– Software and Data journals

– Software citation? DataCite

– Data and Software management policies and practices. Altmetrics

People

• Income generators: academics and students

– their productivity should be highest priority IMHO

• Capability and Relationships

– Training for young and mid-career researchers/research technologists.

– Enable mixed skilled research teams to include research technologists and IT staff

– Bind ITS with research groups

– Comms channels – dog food

• Value and reward highly skilled research technologists within HE institutions with a career structure.

Google Tech Stop

How can RUGIT Help?

• Researchers’ needs

– Top down (alone) is just not going to wash

– Partnering with researchers

– Lots of exaflop machines won’t help most of us….

• Use, leverage and share what is already used

– Infrastructure already in your institutes used by your researchers. Or even developed by your students and researchers

– Researchers & students are using many external services. And will continue to do so.

– Community and commodity solutions.

• Getting out of the way….

– Enable, don’t hamper, esp. domain specific e-infrastructure.

How can RUGIT Help?

• Joining up

– Greater provision by institutions

– Sustainability through collectivism

– Inventing is costly

• There is no one size fits all

– Avoid reinventing wheels but there are many kinds of wheel - data and metadata.

• Rapid responsiveness

– Research opportunities often have narrow time windows. A perfect solution too late is no good. A good enough solution in time is perfect.

How can RUGIT Help?

• Be Open, Be Permeable. Be Flexible

– Presume that you will not be the developers of much of the “upper” infrastructure and the applications your users need.

– But you will need to be flexible enough to adapt into them.

– Future proof through openness.

• Research, Teaching, Life bleed together.

– Project and research management. Joined up grant submitting, paper writing, financial management, HR

– JeS / ROS / Local systems mismatch is an example