Managing research data at Bristol

Advanced Computing Research Centre http://www.bris.ac.uk/acrc Managing research data at Bristol 25 February 2013 Dr Ian Stewart, Director of Advanced Computing Simon Price, Assistant Director (R&D/ILRT)

Upload: simon-price

Post on 21-Mar-2017

TRANSCRIPT

Page 1: Managing research data at Bristol

Advanced Computing Research Centre

http://www.bris.ac.uk/acrc

Managing research data at Bristol

25 February 2013

Dr Ian Stewart, Director of Advanced Computing

Simon Price, Assistant Director (R&D/ILRT)

Page 2: Managing research data at Bristol

BlueCrystal Facility

• Phase 1: 384 AMD Opteron cores, 15TB storage (GPFS)
– 2 TFlops

• Phase 2: 3500 Intel Harpertown cores, GPGPU accelerators, 100TB storage attached + 300TB near line (GPFS)
– 40 TFlops

• Phase 3: 5400+ Intel SandyBridge cores, GPGPU accelerators, 400+TB storage attached (Panasas)
– 200 TFlops

• Users: < 100 in Phase 1, > 600 now

• More compute => much more data

Page 3: Managing research data at Bristol

UoB Research Data

• 2010 UoB survey indicated storage requirements of 1PB and growing
– large data holdings spread throughout University – not just HPC
– Arts & Humanities, Social Science, Social Medicine,…

• Projected requirements ~16PB by 2018 (Moore’s law)
– But growing evidence that assets are increasing at faster rate

• Research grants 3 years
– storage of data > 7 or 10 years minimum
– EPSRC policy – implies “forever”
– drug design: 30 years; airframe: 50 - 100 years

• Q: Who pays?
– HEIs? RC? By access?

• Q: Can we afford to keep it all? Should we keep it all?

• Not just storage - also need to manage our data assets
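
The ~16PB projection is consistent with the 2010 baseline of 1PB doubling every two years. The slide only cites Moore's law, so the two-year doubling period and the function name below are assumptions made for illustration. A minimal sketch of that arithmetic:

```python
def projected_storage_pb(year, base_year=2010, base_pb=1.0, doubling_years=2.0):
    """Project storage demand in PB, assuming it doubles every `doubling_years`.

    base_year and base_pb come from the 2010 survey figure on the slide;
    doubling_years=2.0 is an assumed reading of "Moore's law".
    """
    return base_pb * 2 ** ((year - base_year) / doubling_years)


if __name__ == "__main__":
    for year in (2010, 2014, 2018):
        print(year, projected_storage_pb(year), "PB")
```

A shorter doubling period (the "faster rate" the slide mentions) can be explored by passing a smaller `doubling_years`.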

Page 4: Managing research data at Bristol

Research Data Storage Facility (RDSF)

Page 5: Managing research data at Bristol

HPC & RDSF Management

• Board: Pro Vice-Chancellors (Research, Learning & Teaching), Deans of Science & Engineering, Deputy Finance Director, Director ITS, Chair and permanent members of HPC Exec

• HPC Executive: 2 permanent members (Director ACRC and senior CS/HPC academic), 5 rotating members from User Group, representative of Estates + non-exec member, Research Facilitator

• User Groups: HPC stakeholders, storage users

• Technical Advisory Board: HPC SysAdmins, nominated HPC stakeholders, permanent members of Exec, external HPC experts

• ACRC: Director, 2 x HPC SysAdmins, 1.5 x Storage SysAdmin, Research Facilitator

• Research Data Storage and Management Board: Directors ITS and ACRC, Research Facilitator, representatives from HPC Exec, 6 Faculties, Library and RED

Page 6: Managing research data at Bristol

RDSF Design

• BluePeta – Research Data Storage Facility

• Project thinking began in 2007/8

• Resilient, expandable, secure, enterprise-grade

• Three machine rooms
– Two disk mirrors (DDN9900) in separate purpose built HPC rooms
– Tape backup (IBM TS4500) in UoB corporate machine room
– Tape archive in planning (offsite? security? access?)
– GPFS/TSM? (considering LTFS, Arkivum & Filetek)

• Cross Site SAN
– Redundant routes
– Three filesystems (single copy - 2 block sizes, mirrored)

• Desktop and Departmental exports
– CIFS, NFS, GPFS

Page 7: Managing research data at Bristol

RDSF Policy documents

• Involved IT Services, RED, Academics, Data Security, Secretary’s Office, Finance, and others.

• Policy for the use of the RDSF
– Scope of RDSF
– Responsibilities
– Processes

• Terms of Use
– Detailed companion to policy
– Covers FOI, Ethical, Legal, Costs, Ownership
– Technical Issues

• FAQ

• External User Agreement
– Incorporates all the above in a contract

Page 8: Managing research data at Bristol

Using the RDSF

• Data Steward (usually PI) applies - has responsibility for the data; can then register one or more projects.

• On-line application form asks all relevant questions upfront:
– Provide DMP if available (ties in with funder policy.)
– Personal data? FOI exemption? Security level?

• Academic review for new applications.

• 800TB allocated across all faculties in 20 months.

• Usage currently around 30% of allocation.

• Storage policy – subject to on-going revision.

• Annual asset holding review planned from 2013.

Page 9: Managing research data at Bristol

Costs of using the RDSF

• Previous model: £400 per TB per annum on disk. But with funders requiring long-term retention of data (e.g. 10+ years), how do researchers fund data storage after the end of the project?

• New Pay Once, Store Forever (POSF) model addresses this. It applies Moore’s Law to storage costs through a multiplier.

• Encouraging researchers to include costs in grant applications (line being added to fEC tool).

• Q: How to cost long-term data curation in POSF?

• Q: Alternatives?
– Cloud? Expensive. Data access costs. How to mine cloud data?
– HE sector facility/facilities? EPSRC regional consortia?
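
One way to read "applies Moore's Law to storage costs through multiplier": if the yearly cost of holding a TB keeps falling at a steady rate, the cost of storing it forever is a convergent geometric series, so a single up-front charge can cover it. The slide does not give the actual multiplier or the rate used at Bristol; the two-year halving period and the function names below are assumptions for illustration only.

```python
def posf_multiplier(halving_years=2.0):
    """Multiplier that turns a first-year cost into a pay-once charge.

    Assumes the yearly storage cost shrinks by a constant ratio r each year,
    where r is chosen so the cost halves every `halving_years`.
    The total over all future years is 1 + r + r^2 + ... = 1 / (1 - r).
    """
    r = 0.5 ** (1.0 / halving_years)
    return 1.0 / (1.0 - r)


def posf_charge(annual_cost_per_tb, terabytes, halving_years=2.0):
    """One-off charge for storing `terabytes` indefinitely (hypothetical model)."""
    return annual_cost_per_tb * terabytes * posf_multiplier(halving_years)


if __name__ == "__main__":
    # £400 per TB per annum is the previous-model figure from the slide.
    print(round(posf_multiplier(), 3))
    print(round(posf_charge(400.0, 1.0), 2))
```

Under these assumptions the multiplier is about 3.4, so a TB that cost £400 per annum would carry a one-off charge of roughly £1,366. The sketch covers raw storage only; it does not answer the slide's open question of how to cost curation.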

Page 10: Managing research data at Bristol

data.bris – research data service

• Building on storage expertise.

• JISC funded pilot in Arts to March 2013.
– Developing portal to make subsets of RDSF data accessible.
– Researcher training and advice plus website.
– Metadata guidance.
– RDM principles.
– Exploring ways of sharing data (e.g. Project Moonshot).

• Business case to develop a research data service.
– “Minimal” service from April 2013 and then incrementally developing a wider service by 2016.
– Process will be led by Library with support from IT Services and RED (research services).
– Integration of Pure (RIS) and data.bris will be explored.
– Curation cost recovery models to be considered.

Page 11: Managing research data at Bristol

data.bris – systems environment

Page 12: Managing research data at Bristol

data.bris – types of data access

• Research project space
– Read/Write access by members of research project
– Mounted drive

• Research data publication
– Read-only access to published data
– DOI + data discovery

• Research-active data sharing
– Read-only access to unpublished data: "Web Sharing"
– Read/Write access to research-active data: "Collaborative Sharing"

Page 13: Managing research data at Bristol

Web Sharing

• Create links to files in your project space that can then be forwarded to collaborators.
– Suitable for sharing with collaborators who only need read access.
– Low security: traffic is not encrypted, links are not secured.

Page 14: Managing research data at Bristol

Collaborative Sharing

• Create project file space that can be shared securely with your collaborators.
– Supports both read-only and read/write access for collaborators via SFTP.
– High security: traffic is encrypted, only accessible by registered collaborators.
– Supports resumption of interrupted file transfers.
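
To illustrate what resuming an interrupted transfer involves, the sketch below continues a copy from the number of bytes already present at the destination. It is a generic illustration on local files using only the Python standard library; it is not the data.bris implementation and does not speak SFTP, and the function name is hypothetical.

```python
import os


def resume_copy(src_path, dst_path, chunk_size=64 * 1024):
    """Copy src to dst, continuing after the bytes dst already holds.

    Assumes the existing part of dst is a correct prefix of src,
    which is the usual premise of a resumed transfer.
    Returns the number of bytes copied in this call.
    """
    offset = os.path.getsize(dst_path) if os.path.exists(dst_path) else 0
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "ab") as dst:
        src.seek(offset)  # skip what was transferred before the interruption
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            copied += len(chunk)
    return copied
```

Calling `resume_copy` again after a complete transfer copies nothing, so it is safe to retry. An SFTP client that supports resumption applies the same idea over the network by requesting the remote file from a byte offset.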

Page 15: Managing research data at Bristol

Further Information

ACRC – http://www.bris.ac.uk/acrc

data.bris – http://data.bris.ac.uk