services for sensitive data and ebiobanks at university of oslo · 2015-05-06 · services for...

Services for Sensitive Data and eBiobanks at University of Oslo Gard Thomassen, PhD Head of Research Support Services Group Leader of the TSD project University Center for Information Technology (USIT) University of Oslo


Page 1: Services for Sensitive Data and eBiobanks at University of Oslo · 2015-05-06 · Services for Sensitive Data and eBiobanks at University of Oslo Gard Thomassen, PhD Head of Research

Services for Sensitive Data and eBiobanks at University of Oslo Gard Thomassen, PhD Head of Research Support Services Group Leader of the TSD project University Center for Information Technology (USIT) University of Oslo

Page 2

Outline
•  TSD promo
•  What is sensitive data
•  Laws and regulations
•  TSD overview
•  TSD nice-to-know
•  TSD services
•  TSD opportunities
•  Risk

Gard Thomassen,TSD 2.0

Page 3

Computerworld 16/5-14

Page 4

Norwegian Cancer Genomics Consortium (Norsk KreftGenom Konsortium)
"Compared with the hardware we used until the move to TSD, which can probably be described as a moderately usable server with 64 cores, TSD gives us a theoretical speed-up of 30X. On top of that, we have optimised our analysis pipeline by parallelising several steps. Previously, a sequencing analysis of 48 tumour/normal pairs would have taken at least two to three months to run. This week we ran the same analysis on TSD in two days and a few hours. In other words, a dramatic improvement, to put it mildly."
Prof. Eivind Hovig, NCGC
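As a rough check on the figures in the quote, the observed run times can be turned into a speed-up factor. This is a back-of-the-envelope sketch only: the quote gives "two to three months" and "two days and a few hours", so the day counts below (60 and 90 days for the old runs, 2.25 days for the TSD run) are assumptions, and `speedup` is a hypothetical helper name.

```python
def speedup(old_days: float, new_days: float) -> float:
    """Ratio of the old wall-clock time to the new wall-clock time."""
    if new_days <= 0:
        raise ValueError("new_days must be positive")
    return old_days / new_days


# Assumption: "two days and a few hours" is taken as 2.25 days (2 days + 6 hours).
NEW_RUN_DAYS = 2.25

# Assumption: a month is counted as 30 days.
for label, old_days in (("two months", 60), ("three months", 90)):
    factor = speedup(old_days, NEW_RUN_DAYS)
    print(f"{label}: about {factor:.0f}x faster")
```

Under these assumptions the observed improvement falls in the range of roughly 27x to 40x, which brackets the theoretical 30X quoted for the hardware alone. The quote attributes part of the gain to pipeline parallelisation as well.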

Page 5

Teknisk ukeblad & e24, 5/5-14

Page 6

Uniforum

Page 7

What is sensitive data?

•  Personal Data Act §2, point 8
–  race/ethnic data, political opinion, philosophical and religious beliefs, the fact that a person has been suspected of, charged with, indicted for or convicted of a criminal act, health, sex life and trade-union membership
•  Biotechnology Act
•  Health Registry Act
•  And so on

Page 8

System requirements
•  Security, isolation and access control as given by law
•  Large storage capacity
•  Multi-tenant (multiple users)
•  High performance computing (HPC) resource
•  High bandwidth
•  Easy to maintain and operate
•  Easy to use and “practical” (also for audio and video)
•  Some freedom within confined user space
•  Accessible from anywhere through proper mechanisms
•  A variety of software and public data sources must be available
•  Windows and Linux support (server/host-side)
•  Data collection services
•  Data sharing services

Page 9

Tough requirements, tough project

Services for Sensitive Data – TSD (Norwegian: Tjenester for Sensitive Data)

Initial work started with a pilot in 2009.
Full-fledged services went into production in spring 2014.

Page 10

System outline

[Diagram. Components: Internet, Gateway, VM-server, HPC (Colossus), Storage, and a secure encrypted network to special high-volume data production sites. Relationship labels: 1 (project), 1 (storage area), n, 1.]

Page 11

Using TSD

[Diagram. Labels: User1 Study1, User2 Study1, GW (gateway), VM U1 S1, VM U2 S1, S1, TSD disk, TSD S1 DB, Front end Colossus, Colossus, Colossus disk.]

Page 12

Data import and export using TSD

[Diagram with numbered steps 1 to 4. Labels: “Sluice-server”, virtual “sluice-server”, virtual project-server, “Sluice HD”, Project HD, NFS mount, TSD.]

Annotation: data is copied to the sluice by ssh+scp or web-drive (2-factor authentication); the data must be encrypted if sensitive.

Page 13

Data collection using TSD

[Diagram. Labels: “Nettskjema-minID”, Nettskjema home page, minID, Encrypted XML (PGP), Import mechanism, Project VM, Project disk, TSD.]

Page 14

What TSD offers at present

•  Secure storage
•  Secure data analysis
•  Linux or Windows hosts
•  Secure import and export
•  Web-based data harvesting
•  HPC cluster
•  Postgres DBs

Page 15

HPC resource – Colossus
•  At present about 1500 cores (~30 TFLOPs)
•  No project users are to log in on any nodes
•  One global job daemon to control data integrity (to ensure project data separation)
•  $SCRATCH will be on a per-project basis and cleaned after each job finishes
•  As similar to Abel (the non-sensitive HPC resource in Oslo) as possible
•  Separate disk system for parallel file system
•  Huge-mem nodes and InfiniBand interconnect
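The per-project `$SCRATCH` behaviour described above (a scratch area that belongs to one project and is cleaned after each job) can be illustrated with a small sketch. This is not how Colossus implements it; the slides give no implementation details. The function name `project_scratch`, the directory layout `<base>/<project>/job-<id>-*`, and the permission mode are all assumptions made for illustration.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def project_scratch(base_dir: str, project: str, job_id: str):
    """Create a scratch directory for one job of one project, export it as
    SCRATCH, and remove it when the job finishes (even if the job fails)."""
    # Hypothetical layout: one subtree per project, so jobs from different
    # projects never share a scratch directory.
    project_dir = os.path.join(base_dir, project)
    os.makedirs(project_dir, mode=0o700, exist_ok=True)
    job_dir = tempfile.mkdtemp(prefix=f"job-{job_id}-", dir=project_dir)

    previous = os.environ.get("SCRATCH")
    os.environ["SCRATCH"] = job_dir
    try:
        yield job_dir
    finally:
        # Clean up after the job, then restore the caller's environment.
        shutil.rmtree(job_dir, ignore_errors=True)
        if previous is None:
            os.environ.pop("SCRATCH", None)
        else:
            os.environ["SCRATCH"] = previous


if __name__ == "__main__":
    base = tempfile.mkdtemp()  # stand-in for the real scratch file system
    with project_scratch(base, "p01", "42") as scratch:
        with open(os.path.join(scratch, "intermediate.dat"), "w") as fh:
            fh.write("temporary job data\n")
        print("job ran with SCRATCH =", os.environ["SCRATCH"])
    print("scratch still exists after job:", os.path.exists(scratch))
    shutil.rmtree(base)
```

Tying the cleanup to the end of the job, rather than to a periodic sweep, means intermediate files from one job are not left behind for a later job to find.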

Page 16

Practical things to remember

•  How to get onboard
•  Login
•  Where is my data
•  What is backed up
•  What needs to be encrypted
•  Where can I access TSD from
•  How to get HPC access
•  What does it cost
•  How to use Nettskjema
•  Where do I send my questions:
–  [email protected]
–  [email protected]

Page 17

Technical details
•  KVM for virtualization (Red Hat Linux)
•  Cerebrum for provisioning (a USIT application)
•  AD system administration guided by the provisioning system (duplicated)
•  FreeBSD firewall and gateway (duplicated)
•  Integration with IDporten (Norwegian governmental eID system) for www-enquiries and applications
•  Storage with separation between projects (Hitachi disk system and encrypted backup to tape)
•  IPv6 on the inside (… and private IPv4)
•  FreeRADIUS for 2-factor authentication
•  Separate console server (physical)

Page 18

Security details

•  OATH TOTP 2-factor authentication
–  Smart phones or programmable hardware tokens
•  Import/export is under strict control
•  No open connection to the internet
•  Strong separation between projects (VLAN)
•  Hardened FreeBSD gateway and firewall
•  Encrypted backup, one key per project
•  Sys-admins are single users (traceability)
•  Sys-admins have to use the same authentication process
•  Hardware is physically separated from other UiO hardware
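For readers unfamiliar with OATH TOTP, the one-time codes generated by a phone app or hardware token can be sketched as follows. This is a minimal illustration of the public standard (RFC 6238, built on HOTP from RFC 4226), not TSD's own implementation. The function names, the 30-second time step, the 6-digit default and the one-step verification window are assumptions; the slides do not state TSD's parameters.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HOTP (RFC 4226): HMAC-SHA1 over the 8-byte big-endian counter,
    followed by dynamic truncation to a short decimal code."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # low 4 bits of the last byte pick the window
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, unix_time: float, step: int = 30, digits: int = 6) -> str:
    """TOTP (RFC 6238): HOTP where the counter is the number of time steps
    elapsed since the Unix epoch."""
    return hotp(secret, int(unix_time // step), digits)


def verify(secret: bytes, candidate: str, unix_time: float,
           window: int = 1, step: int = 30) -> bool:
    """Accept a code from the current time step or `window` steps on either
    side, to tolerate clock drift between token and server (an assumption)."""
    counter = int(unix_time // step)
    return any(
        hmac.compare_digest(hotp(secret, counter + drift, len(candidate)), candidate)
        for drift in range(-window, window + 1)
        if counter + drift >= 0
    )


if __name__ == "__main__":
    # Shared secret from the RFC 6238 test vectors, not a real credential.
    secret = b"12345678901234567890"
    print(totp(secret, 59, digits=8))
    print(verify(secret, totp(secret, 59), 59))
```

The server and the token share only the secret and a clock, so no code ever needs to travel from server to user. The comparison uses `hmac.compare_digest` to avoid leaking information through timing.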

Page 19

Future of TSD - main topics
•  How to handle video and sound
–  harvesting
–  management
–  metadata
–  analysis
•  Journal system for psychologists (Univ. of Umeå collaboration)
•  Biobanks
•  VMware and VDI infrastructure (BLAST or ThinLinc for Linux, PCoIP for Windows)
•  Galaxy inside TSD in full scale
•  Elixir helpdesk connected to TSD
•  Running Docker containers
•  Hosting of user-defined VMs -> no! At least not now

Page 20

Risk-analysis

•  The system has been discussed with Datatilsynet (the Norwegian Data Protection Authority): no major worries
•  A risk analysis has been performed by USIT, and no serious issues had been detected as of February 2015
•  OUS, AHUS, VVHF and several others are on board as users
•  We have an advisory board for all changes
•  Backup has been

Page 21

Main collaborators on TSD

Collaborators
•  Norwegian Storage Infrastructure (NorStore)
•  Norwegian Genetics Analysis Platform (GenAp)
•  Norwegian Dietary Registry (Medical Faculty)
•  Institute of Psychology (Faculty of Social Sciences)
•  Norwegian Cancer Sequencing Consortium (NCGC)

Reference group
Oslo University Hospital, NorStore, Regional Ethical Committee, National Institute of Public Health, Norwegian Cancer Registry, Research Network at OUS, Elixir Norway, NCGC, GenAP, Institute of Psychology,

Page 22

Capabilities enabled by TSD

•  Large scale NGS research on human genomes
•  Large scale medical imaging studies
•  Large scale population studies with web-based data collection
•  Off-site analysis of sensitive data
•  Secure storage for verification of published research
•  eBiobank hosting
•  Electronic consent

Page 23

Nordic collaboration opportunities
•  Laws are fairly similar (Norway very strict)
•  Difficult to exchange sensitive data for research
•  We should learn from each other, as these systems demand very specialised IT knowledge
•  Service development and system administration know-how is non-sensitive and may be shared
•  Building TSD addressed many novel security questions in a university setting that others can learn from
•  Large DBs/registries of health data may enable very interesting research in the future
•  TSD is involved in the NeIC-based Tryggve project
•  We are happy to collaborate!

Page 24

People involved

•  tsd-core@usit
•  virt-core@usit
•  storage-core@usit
•  postgres-core@usit
•  network-core@usit
•  hpc-core@usit
•  windows-core@usit
•  unix-core@usit
•  IT-security@usit

Project group / developers
•  IT-dir Lars Oftedal
•  Hans A. Eide
•  Märtha Felton

Administration / associated
