Services for Sensitive Research Data Gard Thomassen, PhD Head of Research Support Services Group Leader of the ”Services for Sensitive Data” project University Center for Information Technology (USIT) University of Oslo

Post on 11-Jan-2016


Page 1: Services for Sensitive Research Data Gard Thomassen, PhD Head of Research Support Services Group Leader of the ”Services for Sensitive Data” project University

Services for Sensitive Research Data

Gard Thomassen, PhD

Head of Research Support Services Group

Leader of the ”Services for Sensitive Data” project

University Center for Information Technology (USIT)

University of Oslo


Outline

• What is sensitive data?
• Who has sensitive data?
• Project background
• Collaborators and reference group
• System requirements
• System outline
• Technical and security details
• Maintenance
• Advantages and current status
• International collaborations

Gard Thomassen,TSD 2.0


Who has sensitive data?

• Faculty of Medicine / Oslo University Hospital
• Faculty of Theology
• Faculty of Educational Sciences
• Faculty of Social Sciences
• And the list continues, also outside UiO


Project background

• UiO has an open network structure, but still with a high level of security
• Most UiO data is open
• Various UiO/OUS researchers approached USIT asking for an eInfrastructure for sensitive data (mostly MR images and NGS data)
• The pilot project TSD 1.0 was run


Lessons learned

• The need for our services far exceeded the scalability of our system
• Too much hands-on maintenance and manual setup of new projects and new users
• There is a need for a High Performance Computing (HPC) resource within a secure environment
• Not very user-friendly (at both ends)


Main collaborators on TSD 2.0

Collaborators
• Norwegian Storage Infrastructure (NorStore)
• Norwegian Genetics Analysis Platform (GenAP)
• Norwegian Dietary Registry (Faculty of Medicine)
• Institute of Psychology (Faculty of Social Sciences)
• Norwegian Cancer Sequencing Consortium (NCGC)

Reference group

Oslo University Hospital, NorStore, Regional Ethical Committee, National Institute of Public Health, Norwegian Cancer Registry, Research Network at OUS, Elixir Norway, NCGC, GenAP and Institute of Psychology, UiO.


System requirements

• Security, isolation and access control as given by law
• Large storage capacity
• Multiple users
• High performance computing resource
• High bandwidth
• Easy to maintain
• Easy to use (including audio and video)
• Some freedom within user space
• Accessible from anywhere through authentication
• A variety of software and public DBs must be available
• Windows and Linux support (OS X if possible)
• Data collection service
• Data sharing service
• National scope (so far)


Solution outline


System outline

[Figure: system diagram. Labels: Internet, Gateway, VM-server, HPC (Colossus), Storage, secure encrypted network to special high-volume data production sites. Cardinality labels: 1 (project), 1 (storage area), n, 1.]


Using TSD 2.0 for analysis

[Figure: analysis workflow inside TSD 2.0 for project P1. Labels: User B1 P1, User B2 P1, GW, VM B1 P1, VM B2 P1, P1 DB, TSD disk (P1), Colossus front end, Colossus, Colossus disk.]


Data import and export using TSD 2.0

[Figure: import/export flow in TSD 2.0, steps 1 to 4. Labels: “Sluice-server”, virtual “sluice-server”, virtual project-server, “Sluice HD”, Project HD, NFS mount. Annotation: data copied here by ssh+scp or web-drive (2-factor authentication); data is encrypted if sensitive.]


Data collection using TSD 2.0

[Figure: data collection flow. Labels: minID, “Nettskjema”, encrypted XML (PGP), import mechanism, Project VM, Project disk, TSD 2.0.]


Data-import for NGS-centers and other large scale data producers

[Figure: data import from a sequencing site. Labels: HiSEQ, TSD-controlled box on-site, /tmp/storage, encrypted connection, GW, Project VM, Project disk, TSD 2.0.]


Technical outline

Closed network at USIT

Admin services
- Provisioning system
- AD
- Surveillance
- Software repo
- Cfengine
- Vcenter
- Backup
- Antivirus
- Log service

Storage / DBs
- PostgreSQL
- Archiving
- Compartmentalized disk

HPC-resource

Management
- Mgmt of storage
- Mgmt of network
- Mgmt of hardware
- Mgmt of VMs

Clients (2-factor login)
- Remote desktop clients
- Thin-clients on dedicated network
- Special network for large-scale data production centers

Publicly available network segment through “minID”

Web questionnaire

Web portal

Electronic consent

Clinical health data projects

Other sensitive data projects

Access network
- National Health network
- Terminal servers
- Thin client servers
- VPN


Technical details

• KVM for virtualization (Red Hat Linux)
• Cerebrum as provisioning system (a USIT application)
• AD system administration guided by the provisioning system (duplicated)
• FreeBSD firewall and gateway (duplicated)
• Integration with IDporten (Norwegian governmental eID system) for www-enquiries and applications
• Storage with separation between projects (Hitachi disk system and encrypted backup to tape)
• IPv6 on the inside (… and private IPv4)


HPC resource – Colossus

• At present about 500 cores
• No project users are to log in on any nodes
• One global job daemon to control data integrity (to ensure project data separation)
• /tmp/ and /work/ will be per project and cleaned after the job finishes
• As similar to Abel as possible
• Separate disk and more nodes will come soon
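
The per-project scratch rule can be illustrated with a small sketch. It is not the Colossus job daemon: the function name `project_scratch`, the directory layout `<root>/<project>/<job id>` and the permission mode are assumptions made only for illustration. The point is that scratch space is created for one project's job and removed when the job ends, whether it succeeds or fails.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def project_scratch(root: Path, project: str, job_id: str):
    """Yield a scratch directory for one job of one project, then delete it.

    Hypothetical layout: <root>/<project>/<job_id>. The mode 0o700 applies to
    the job directory itself; parent directories get the default mode.
    """
    scratch = root / project / job_id
    scratch.mkdir(parents=True, mode=0o700)
    try:
        yield scratch
    finally:
        # Cleanup runs even if the job raised an exception.
        shutil.rmtree(scratch, ignore_errors=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        with project_scratch(Path(root), "p1", "job-42") as scratch:
            (scratch / "intermediate.dat").write_text("temporary result")
            print("during job:", scratch.exists())
        print("after job:", scratch.exists())
```

Keeping the cleanup in a `finally` clause means a crashed job cannot leave one project's intermediate files behind for the next job on the same node.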


Security details

• OATH TOTP 2-factor authentication
  – Smart phones or programmable hardware tokens
• Special roles for those allowed to export data
• Import/export is under strict control
• No open connection to the internet
• Strong separation between projects (VLAN)
• Special security measures with remote desktops
• Extremely hardened FreeBSD gateway and firewall
• Encrypted backup, one key per project
• Sys admins log in as individual users (traceability)
• Sys admins have to use the same authentication process
• Most hardware is physically separated from other UiO hardware
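
For readers unfamiliar with OATH TOTP, the following is a minimal sketch of how a time-based one-time password is computed according to RFC 6238 (built on HOTP, RFC 4226). It is not TSD's implementation. The 30-second time step, 6 digits, HMAC-SHA-1 and the one-step verification window are common defaults assumed here, and the secret in the usage lines is the RFC test secret.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HOTP (RFC 4226): HMAC-SHA-1 over an 8-byte big-endian counter."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation: low nibble of the last byte
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, now: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """TOTP (RFC 6238): HOTP with the counter derived from Unix time."""
    if now is None:
        now = time.time()
    return hotp(secret, int(now // step), digits)


def verify(secret: bytes, candidate: str, now: Optional[float] = None,
           step: int = 30, window: int = 1) -> bool:
    """Accept codes from the current time step and `window` steps either side."""
    if now is None:
        now = time.time()
    counter = int(now // step)
    for drift in range(-window, window + 1):
        if counter + drift < 0:
            continue
        expected = hotp(secret, counter + drift, len(candidate))
        if hmac.compare_digest(expected, candidate):
            return True
    return False


if __name__ == "__main__":
    secret = b"12345678901234567890"  # RFC 6238 test secret, not a real key
    print(totp(secret, now=59, digits=8))  # RFC 6238 test vector: 94287082
    print(verify(secret, totp(secret)))
```

The small verification window tolerates clock drift between the server and a phone or hardware token at the cost of a slightly longer validity period for each code.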


Maintenance

• Reuse as much as possible from the USIT eInfrastructure
• Virtualize as much as possible
• Management/surveillance data can be pushed, but not pulled (Nagios, Collectd)
• Surveillance based on existing systems
• Sys admins have different access levels
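
The push-but-not-pull rule can be sketched as follows. This is an illustration only: Nagios and Collectd have their own protocols, and the JSON-over-UDP message, the function name `push_metric` and the field names are hypothetical. What the sketch shows is the direction of traffic: the monitored host opens an outbound socket and sends, and it never listens for requests from the monitoring side.

```python
import json
import socket
import time
from typing import Tuple


def push_metric(collector: Tuple[str, int], host: str,
                name: str, value: float) -> bytes:
    """Send one measurement to the collector and return the bytes sent.

    The sender only creates an outbound datagram socket; nothing on the
    monitored host accepts incoming connections.
    """
    payload = json.dumps({
        "host": host,
        "metric": name,
        "value": value,
        "time": int(time.time()),
    }).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, collector)
    return payload


if __name__ == "__main__":
    # Stand-in collector on the loopback interface, for demonstration only.
    collector = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    collector.bind(("127.0.0.1", 0))
    collector.settimeout(5)
    push_metric(collector.getsockname(), "vm-p1", "disk_used_pct", 71.5)
    data, _ = collector.recvfrom(4096)
    collector.close()
    print(json.loads(data))
```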


Opportunities enabled by TSD 2.0

• NGS research on humans is possible
• Large-scale imaging studies are possible
• “HUNT-like” studies online for the respondents and the scientists
• Off-site analysis of sensitive data
• Secure storage for verification of published research
• Electronic consent
• Possible work area for making exams?
• TSD to host all human NGS research data from UiO/OUS?


Nordic collaboration opportunities

• Laws are fairly similar (Norway very strict)
• Difficult to exchange data for research
• One should learn from each other, as these systems demand very special IT knowledge
• System development and system administration are non-sensitive and may be shared
• Building TSD addresses many novel security questions in a university setting, which others can learn from
• Large DBs of health data may enable very interesting research in the future (NeGI)
• NeIC has shown interest in TSD 2.0
• TSD collaborates with CSC in Finland and with BILS / Elixir Sweden. BBMRI is interested


Current status

• Pilot project data is being transferred now
• System is being finalized for setting up new projects and going into production
• Storage is up
• Secure Nettskjema is up
• Working on risk evaluation
• Project registration when risk evaluation is finished
• HPC-resource 4th quarter 2013
• Video and sound will be the main target during further work
• System Whitepaper (v1.0) written


People involved

Project group / developers
• Dag-Erling Smørgrav
• Petter Reinholtsen
• Elisabeth Ytterdal
• Tor Fuglerud
• DBA (PostgreSQL team)
• Cerebrum team
• Morten Werner Forsbring
• Espen Grøndahl
• HPC – Colossus team
• Gard Thomassen

Administration / associated
• IT-dir Lars Oftedal
• Hans A. Eide
• Märtha Felton


Cost per project

• First year establishment price (per project)
• Regular yearly project fee
• License cost (licensed software usage)
• Storage cost for storage exceeding basic allocation
• Cost of DB administration (if DB needed)
• Cost of CPU hours on Colossus
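
How the cost components above might combine into a yearly amount can be sketched as below. The slides give no prices or formulas, so every number, the `PriceList` fields and the rule that the first year carries both the establishment price and the regular yearly fee are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceList:
    """Hypothetical prices; the presentation states no actual amounts."""
    establishment: float     # one-time price, charged in the first year
    yearly_fee: float        # regular yearly project fee
    basic_storage_tb: float  # storage included in the basic allocation
    storage_per_tb: float    # price per TB above the basic allocation
    db_admin: float          # yearly DB administration, if a DB is needed
    cpu_hour: float          # price per CPU hour on Colossus


def yearly_cost(prices: PriceList, *, first_year: bool, license_cost: float,
                storage_tb: float, needs_db: bool, cpu_hours: float) -> float:
    """Sum the cost components for one project and one year."""
    extra_tb = max(0.0, storage_tb - prices.basic_storage_tb)
    total = (prices.yearly_fee
             + license_cost
             + extra_tb * prices.storage_per_tb
             + cpu_hours * prices.cpu_hour)
    if first_year:
        total += prices.establishment
    if needs_db:
        total += prices.db_admin
    return total


if __name__ == "__main__":
    prices = PriceList(establishment=1000, yearly_fee=500, basic_storage_tb=1,
                       storage_per_tb=100, db_admin=200, cpu_hour=0.5)
    print(yearly_cost(prices, first_year=True, license_cost=50,
                      storage_tb=3, needs_db=True, cpu_hours=1000))
```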


Project administration in TSD 2.0 - technical

• Application through the National ID-portal + Nettskjema
• The project is created in Cerebrum with role-categories
• The project is connected to resources (VM + disk + VLAN + DB + HPC)
• Users are created and given their roles
• Username, password and one-time passwords are distributed
• Accounts are kept of storage, HPC CPU time and additional VMs to enable control and bookkeeping
• NorStore may offer “free” storage within TSD (there might be a small security mgmt overhead cost)
• In the future there will be some level of self-service through a web portal within TSD
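
The project record implied by these steps can be sketched as a small data model. This is not Cerebrum's schema: the class names, role names and field names are hypothetical. It shows a project tied to its resources (VM, disk, VLAN, DB, HPC), users holding roles, a role check in the spirit of the special export roles on the security slide, and running usage totals for bookkeeping.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional, Set


class Role(Enum):
    """Hypothetical role categories; the slides do not name the actual ones."""
    MEMBER = "member"
    EXPORTER = "exporter"  # allowed to export data
    ADMIN = "admin"


@dataclass
class Resources:
    vm: str
    disk: str
    vlan: int
    database: Optional[str] = None  # only if the project needs a DB
    hpc: bool = False               # access to the HPC resource


@dataclass
class Project:
    name: str
    resources: Resources
    roles: Dict[str, Set[Role]] = field(default_factory=dict)
    usage: Dict[str, float] = field(default_factory=dict)

    def add_user(self, username: str, *roles: Role) -> None:
        """Create the user in the project if needed and grant the roles."""
        self.roles.setdefault(username, set()).update(roles)

    def may_export(self, username: str) -> bool:
        """Only users holding the export role may take data out."""
        return Role.EXPORTER in self.roles.get(username, set())

    def record_usage(self, kind: str, amount: float) -> None:
        """Accumulate usage, e.g. 'storage_tb' or 'cpu_hours'."""
        self.usage[kind] = self.usage.get(kind, 0.0) + amount


if __name__ == "__main__":
    p1 = Project("p1", Resources(vm="vm-p1", disk="/tsd/p1", vlan=101, hpc=True))
    p1.add_user("p1-alice", Role.MEMBER, Role.EXPORTER)
    p1.add_user("p1-bob", Role.MEMBER)
    p1.record_usage("cpu_hours", 120)
    p1.record_usage("cpu_hours", 30)
    print(p1.may_export("p1-alice"), p1.may_export("p1-bob"), p1.usage)
```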


Conclusion

• It is very hard to make something secure and user-friendly at the same time
  – Researchers want the freedom of using the internet while doing research on sensitive data…

• A thorough risk assessment must be made during and after the planning and implementation phase to make the best choices

• What you cannot prevent should at least be detected by some surveillance mechanism

• More (inter)national / local cooperation wanted


Pilot project (TSD 1.0)

• Secure storage for large amounts of NGS data and MR-images (>100TB)

• Secure Windows “research server” enabling usage of MS Office, STATA, SPSS etc. on sensitive data

• Research server is based on an isolated system using VMware ESX

• Two-factor login system

• Encrypted backup


“The ultimate goal is … to be able to provide the same services that are available for researchers working with non-sensitive data, with the necessary security, with minimum impact on the user experience, and minimum extra overhead and cost.”

Hans Eide, 2012 (my boss)
