© BBMRI-ERIC
BBMRI-ERIC and AAI
Assoc. Prof. RNDr. Petr Holub, Ph.D.
IT & Data Protection Manager @ BBMRI-ERIC, CIO of BBMRI-ERIC CS IT
CORBEL and AARC/AARC2 AAI Workshop, Paris, 2016-05-31
What is BBMRI-ERIC
[Figure: BBMRI-ERIC stakeholder diagram. Actors: citizen (research participant, donor, patient), clinician, biobank (cryo), researcher (academic, industrial), industry (pharma, ...), healthcare registries, public health policy maker. Flow labels: provides samples/data; provides sample/data/expertise/services; retrieves samples/data; retrieves data (/samples); eligible data; returns data; research results; commercialization; clinical trials; treatment; drugs; healthcare policies.]
Holub P.⋅ BBMRI-ERIC⋅ CORBEL/AARC/AARC2 AAI 2 / 21
What is BBMRI-ERIC
▶ An infrastructure that provides/facilitates secure and privacy-protecting access to key resources in order to support biomedical research and healthcare advancement:
  biosamples from biobanks,
  related data: clinical, omics, phenotypes, etc.,
  expertise and other services (e.g., sample & data hosting),
  biomolecular resources.
biobanks := samples + data + expertise + services;
What is BBMRI-ERIC
▶ Hierarchical distributed architecture ≡ “hub-and-spokes architecture”
  ⟹ federated IT architecture
▶ Subject to regulatory frameworks: privacy protection, health, …
  e.g., the upcoming General Data Protection Regulation.
IT Architecture of BBMRI-ERIC
[Figure: trust and privacy diagram. Labels: TRUST; PRIVACY; participant; broad/narrow informed consent; samples; data; compliant?; access (MTA/DTA); project; researcher; generates; reviews; ethical board; metadata repository; publish; custodian; computing; data storage; 3rd party computing & storage; aggregated or anonymized data; information loss!; (private clouds).]
IT Architecture of BBMRI-ERIC
[Figure: layered IT architecture. Layer labels: underlying network/computing/storage infrastructure; middleware (both BBMRI-ERIC & external); BBMRI-ERIC applications.
Components shown: distributed/federated authentication; distributed/federated authorization; networking, including VPNs and interfaces to the biobank/hospital systems; logging & auditing; privacy, pseudonymization, and anonymization tools; user interfaces; machine-readable interfaces; databases with support for semantics and federations; core computer infrastructure; cloud infrastructures with support for private clouds & moving computation to data; Directory; Sample Broker; Sample Locator; Sensitive Data Processing Platform; clinical records extraction; collaborative systems; translation of ontologies; reference tools for biobanks; …
Legend: orange … BBMRI-ERIC own components; blue … components expected from other infrastructures.]
BBMRI-ERIC CS IT
▶ Most of the IT services should be implemented via the BBMRI-ERIC Common Service IT
  formal way to organize the member countries contributing to the IT,
  official start November 1, 2015 (effective January 1, 2016),
  ADOPT BBMRI-ERIC acts as a booster to the CS IT core budget,
  acts as a coherent development ecosystem:
    ● consistent set of tools implementing the whole workflow of BBMRI-ERIC IT services,
    ● (running Scrum of scrums together :-) ).
Status of AAI (1)
▶ What do we need from AAI:
▶ Authentication:
  identity verification (vetting)
    ● LoA 1–2 depending on service
  authentication instances
    ● LoA 1–3 depending on service
  federated architecture
▶ Authorization
  matching of informed consent & project as a part of the initial authorization decision
  person+project identity for authorization decisions
▶ EGI-Engage M6.2: BBMRI-ERIC Security & Privacy Requirements
Status of AAI (2)
▶ Summary of minimum requirements:
Table 8: Minimum requirements for basic data types. Non-personal data is used to denote data that does not contain any traces of privacy-sensitive data (e.g., data about operation of the biobank storage systems).

                                                    raw                         pseudonymous   practically          non-personal
                                                    (non-deidentified)                         anonymous
Authentication and authorization
  Identity verification                             LoA ≥ 2                     LoA ≥ 2        LoA ≥ 0              open
  Authentication instance                           LoA ≥ 3                     LoA ≥ 2        LoA ≥ 0              open
  Assessing project & informed consent compliance   not available for research  MANDATORY      RECOMMENDED          –
  Restricted access                                 high security               high security  medium-low security  open
  DTA/MTA                                           REQUIRED                    REQUIRED       RECOMMENDED          open
Authentication and authorization
  Access log archive since last access              ≥ 10 years                  ≥ 10 years     ≥ 3 years            –
Data transfers and storage
  Encrypted storage                                 REQUIRED                    REQUIRED
  Encrypted transfers                               REQUIRED                    REQUIRED
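The authentication and authorization rows of the table can be read as a lookup that an access-control component consults before releasing data. The following is only a minimal sketch of that reading, not the BBMRI-ERIC implementation: the names `MinimumRequirement`, `REQUIREMENTS`, and `may_access` are hypothetical, "open" is encoded as `None`, and the "not available for research" cell for raw data is interpreted as a refusal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MinimumRequirement:
    """One column of Table 8 (hypothetical encoding; None means "open")."""
    identity_loa: Optional[int]       # minimum LoA of identity verification
    auth_instance_loa: Optional[int]  # minimum LoA of the authentication instance
    consent_check: str                # MANDATORY / RECOMMENDED / - / NOT_AVAILABLE
    dta_mta: str                      # REQUIRED / RECOMMENDED / open


# Values copied from the table; the dictionary keys are illustrative labels.
REQUIREMENTS = {
    "raw": MinimumRequirement(2, 3, "NOT_AVAILABLE", "REQUIRED"),
    "pseudonymous": MinimumRequirement(2, 2, "MANDATORY", "REQUIRED"),
    "practically_anonymous": MinimumRequirement(0, 0, "RECOMMENDED", "RECOMMENDED"),
    "non_personal": MinimumRequirement(None, None, "-", "open"),
}


def may_access(data_type: str, identity_loa: int, auth_loa: int,
               consent_checked: bool, agreement_signed: bool) -> bool:
    """Return True if a research request meets the minimum requirements."""
    req = REQUIREMENTS[data_type]
    if req.consent_check == "NOT_AVAILABLE":
        return False  # assumption: raw data is not released for research
    if req.identity_loa is not None and identity_loa < req.identity_loa:
        return False
    if req.auth_instance_loa is not None and auth_loa < req.auth_instance_loa:
        return False
    if req.consent_check == "MANDATORY" and not consent_checked:
        return False
    if req.dta_mta == "REQUIRED" and not agreement_signed:
        return False
    return True
```

For example, `may_access("pseudonymous", 2, 2, True, True)` passes, while the same request with an identity verification of LoA 1 is refused. RECOMMENDED items are deliberately not enforced in this sketch.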
Req-6 The BBMRI-ERIC policies MUST be compatible with the GÉANT Data Protection Code of Conduct⁵⁴ [91].
5.2 Requirements on Accountability and Archiving
Req-7 Acceptance of a DTA or an MTA MUST be stored in a non-repudiable way by both parties of the agreement. The document MUST contain the agreed starting date and lifespan of the contract.
  A possible implementation is PDF documents signed electronically by both parties using a visible signature stamp, so that they can also be printed for archival purposes.
Req-8 Release of any samples or any data containing person-level information (i.e., including anonymous and pseudonymous data) MUST be stored in a non-repudiable way by the biobank.
Req-9 A link MUST be maintained between the DTA/MTA and the samples and data sent to the requesting party.
Req-10 Access logs to any data that involves information on the level of individuals (e.g., sample-level data including practically anonymous data) MUST be kept for a minimum of 3 years.
  Note that this is a minimum which may be increased for specific cases, such as Requirement Req-11.
Req-11 Access logs to any non-deidentified data or pseudonymized data MUST be kept at least for the same time as medical records in the following countries: the country of the participant (donor or patient), the country of the data custodian, and the country of the data processing institution. The RECOMMENDED minimum value is 10 years. Access logs MUST be kept for each BBMRI-ERIC Identity at least on the level of (a) date/time of beginning of access (signing DTA/MTA), (b) last date/time of access.
⁵⁴ http://www.geant.net/uri/dataprotection-code-of-conduct/Pages/default.aspx
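Req-10 and Req-11 together define a minimum retention period for access logs. As a rough sketch under stated assumptions: the function name `min_log_retention_years` and the data-type labels are hypothetical, the per-country medical record retention periods must be supplied by the caller (none are given in the slides), and the RECOMMENDED 10-year value is treated here as a floor.

```python
from typing import Iterable, Optional


def min_log_retention_years(data_type: str,
                            medical_record_years: Iterable[int] = (),
                            recommended_floor: int = 10) -> Optional[int]:
    """Minimum number of years to keep access logs, following Req-10/Req-11.

    medical_record_years: retention periods of medical records in the
    country of the participant, of the data custodian, and of the data
    processing institution (caller-supplied values).
    """
    if data_type in ("raw", "pseudonymous"):
        # Req-11: at least as long as medical records in every involved
        # country; the RECOMMENDED minimum of 10 years is applied as a floor.
        return max([recommended_floor, *medical_record_years])
    if data_type == "practically_anonymous":
        # Req-10: person-level data, minimum of 3 years.
        return 3
    # Non-personal data: the table lists no retention requirement.
    return None
```

With illustrative inputs, `min_log_retention_years("pseudonymous", [5, 30, 10])` yields 30, because the longest medical record retention among the three countries dominates.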
Status of AAI (3)
▶ Current plan for implementation:
  hookup BBMRI-ERIC into eduGAIN – done
    ● pilot per se – international organization headquartered in one country
  develop BBMRI-ERIC Identity
    ● piloting within AARC and GÉANT VOPaaS
    ● implemented by a Proxy IdP with various backends
    ● identity linking/merging
    ● use of BBMRI-ERIC National Nodes for registration of “homeless” or “effectively homeless” (insufficient LoA @ home) users
  become one of the pilot applications for AARC2
  close collaboration within CORBEL WP5 – Access
    ● collect, analyze, implement needs of BMS infrastructures participating in CORBEL
    ● collaboration with ELIXIR
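The handling of "homeless" and "effectively homeless" users can be illustrated as a simple routing decision. This is a hypothetical sketch, not the Proxy IdP logic used in the pilot: the function name `registration_route` and the returned labels are invented for illustration, and a missing home IdP is represented as `None`.

```python
from typing import Optional


def registration_route(home_idp_loa: Optional[int], required_loa: int) -> str:
    """Decide where a user's identity should be registered or vetted.

    home_idp_loa: LoA offered by the user's home IdP, or None if the user
    has no home IdP at all ("homeless").
    """
    if home_idp_loa is None:
        return "national_node"  # homeless user
    if home_idp_loa < required_loa:
        return "national_node"  # effectively homeless: insufficient LoA at home
    return "home_idp"           # home IdP is sufficient
```

A user whose home IdP offers LoA 1 for a service requiring LoA 2 would be sent to a National Node, just like a user with no home IdP.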
Status of AAI (4)
  watch closely for STORK successor(s)
    ● government-backed identity verification (vetting) is an important feature
    ● let’s hope for eIDAS
Status of AAI (5)
▶ REMS (BBMRI/ELIXIR FI)
  partial support for sample/data access negotiation,
  experiences with pilot deployment in THL Biobank (FI),
  explored now as a part of initial work on BBMRI-ERIC Negotiator.
▶ Organizational aspects – interdependencies of infrastructures
  SLA/SLD problem: what if an infra (to be used) is not dependable enough for the other?
  what if an infra changes its policy?
  what if there is an issue of conflicting business models?
    ● e.g., services are free only for members, and the members of two infras are not 100% identical – this may happen even during runtime
Challenges for AAI
▶ Consistent information about LoA (identity verification | auth instance) from federated AAI
  legal review of LoAs – have to withstand hearing at court
▶ People ↔ country mapping
  members of ERICs are countries
▶ Dealing with less than 100% geographical infrastructure overlap
  especially if money transfers are expected by either/both infrastructures, and the consumer infrastructure is expected to provide services for their members for free
Challenges for AAI
▶ Full members: Austria, Belgium, Czech Republic, Estonia, Finland, France, Germany, Greece, Italy, Malta, Netherlands, Norway, Sweden, United Kingdom
▶ Observers: Poland, Switzerland, Turkey, IARC
Map based on the BBMRI-ERIC Directory 2.0 as of Dec 23, 2015.
Challenges for AAI
▶ Collaboration with industry
  BBMRI-ERIC infrastructure has industrial users: commercial research brings drugs to market
▶ Dealing with “homeless users”
  big institutions (e.g., Pfizer :) ) will not deploy a full-scale IdP just because of a few users procuring samples/data
▶ Collaboration with non-European countries
  collaboration with Asian countries
  collaboration with Africa (e.g., B3AFRICA project)
▶ Affiliation of people to projects
  issue of bootstrapping a project in a trustworthy way
▶ IdP ↔ SP attribute access negotiation simplification
Challenges for AAI
▶ Flexibility of AAI to react to changing needs of users
  infrastructure (customer) induced
  induced by regulatory frameworks
Time Line for AAI
06/2016 BBMRI-ERIC Identity with LoA 2/2 for Negotiator
  ▶ ongoing implementation with AARC/VOPaaS
09/2016 Security toolset release for BBMRI-ERIC (EGI-Engage D6.11)
  ▶ includes working AAI
  ▶ integration of federated AAI into BiobankCloud
06/2017 BBMRI-ERIC Identity with LoA 3/2 for Locator
  ▶ ideally with eIDAS backend, if available at that time
Privacy & Security Requirements of BBMRI-ERIC IT services
▶ Initial version of privacy & security requirements published as EGI-Engage Milestone M6.2 document: https://documents.egi.eu/document/2677
  requirements are expected to be kept updated as our understanding evolves, regulatory frameworks are updated, and technologies become available
  expected update: October 2016 – part of Security & Privacy Architecture
Thank you for your attention!
Q?/A!
http://www.bbmri-eric.eu/
[email protected]
What is BBMRI-ERIC
▶ European Research Infrastructure Consortium to facilitate access to high-quality biobanks and biomolecular resources
  legal entity on European level,
  est. 3 December 2013.
BBMRI-ERIC is today the largest health-oriented ERIC ever launched in Europe.
IT Architecture of BBMRI-ERIC
▶ Modular architecture with components interconnected by well-defined interfaces
  replaceable and reusable, well-defined (small) components,
  standardized in the ideal case, well-defined at least
    ● this is critical as some components may need to be implemented by commercial companies (e.g., components of hospital information systems).
Collaboration with ELIXIR
▶ AAI
  collection of needs of all the BMS infras
    ● using CORBEL WP5 as a framework
  pilots to AARC2 with ELIXIR
▶ Harmonization of ontologies
  focus of BBMRI-ERIC on biobank-related ontologies: phenotyping, clinical, biobanks, …
  using CORBEL WP6 as a framework
▶ Software development best practices
▶ GA4GH-ELIXIR Beacons
Examples of BBMRI-ERIC Use Cases
▶ Aggregate view of the infrastructure
  Q – bio/med researcher: “What biobank could have samples relevant for my research?”
  Q – bio/med researcher: “What biobank is capable of hosting my samples?”
  Q – biobanker: “What biobanks are similar to ours?”
⟹ BBMRI-ERIC Directory
  currently in non-public beta version, covering more than 500 biorepositories,
    ● currently largest repository contains 30,000,000+ samples,
    ● includes even a few smaller non-human sample collections (but health focus),
  Directory 2.0 – released December 2015
Examples of BBMRI-ERIC Use Cases
▶ Facilitate access to the samples and data
  Q: “I need n samples with … specifications”
  researchers do not know what exactly they need
    ● in terms of the material type and sample quality for a given experiment
    ● multi-round negotiation between researchers and biobankers (resource providers in general)
    ● … while having hundreds or thousands of biobanks
  biobankers are overloaded with fuzzy requests
  biobankers are willing to release samples only for certain purposes
    ● Q: “I would like to have these 20 samples from this great cohort of 100,000 participants, please.”
    ● A: “NO!!!”
⟹ BBMRI-ERIC Sample/Data Negotiator
Examples of BBMRI-ERIC Use Cases
▶ Access to sample-level information: browsing, searching
  Q: “I need to see what sample types are available in my research field in order to develop new research projects.”
  BBMRI-ERIC is committed to ensuring privacy
    ● differential privacy approach
    ● famous attacks on privacy: attack on Massachusetts Group Insurance Commission by Dr. Sweeney, attack on Netflix user DB by Narayanan and Shmatikov
    ● k-anonymity: each record is indistinguishable from at least k − 1 other records ⟹ dimensionality curse,¹ datasets are sparse in reality
    ● k-anonymity can still leak information ⟹ l-diversity, t-closeness
¹ AGGARWAL, Charu C. On k-anonymity and the curse of dimensionality. In: Proceedings of the 31st International Conference on Very Large Data Bases. VLDB Endowment, 2005. p. 901-909.
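The k-anonymity definition above can be checked mechanically by grouping records on their quasi-identifiers. The snippet below is a minimal sketch with made-up records and attribute names (`age_band`, `region`, `sample_type` are hypothetical); it only tests the property and does not perform any generalization or suppression.

```python
from collections import Counter
from typing import Mapping, Sequence


def is_k_anonymous(records: Sequence[Mapping[str, str]],
                   quasi_identifiers: Sequence[str],
                   k: int) -> bool:
    """True if every combination of quasi-identifier values occurs in at
    least k records, i.e. each record has at least k - 1 look-alikes."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return all(count >= k for count in groups.values())


# Made-up sample-level records for illustration only.
RECORDS = [
    {"age_band": "40-49", "region": "north", "sample_type": "serum"},
    {"age_band": "40-49", "region": "north", "sample_type": "plasma"},
    {"age_band": "50-59", "region": "south", "sample_type": "serum"},
    {"age_band": "50-59", "region": "south", "sample_type": "serum"},
]

two_attrs = is_k_anonymous(RECORDS, ["age_band", "region"], k=2)
three_attrs = is_k_anonymous(RECORDS, ["age_band", "region", "sample_type"], k=2)
```

Here `two_attrs` is true while `three_attrs` is false: adding one more quasi-identifier already splits a group into singletons, which is a small-scale view of the dimensionality curse on sparse datasets.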
Examples of BBMRI-ERIC Use Cases
▶ Access to sample-level information: browsing, searching
  Q: “I need to see what sample types are available in my research field in order to develop new research projects.”
  disclosure filters
    ● not only privacy protection,
    ● also protection of resources based on biobankers’ policies,
  specific support needed for rare diseases
    ● amplified problem of patient identification,
    ● need for cross-biobank patient identification.
⟹ BBMRI-ERIC Sample/Data Locator
Examples of BBMRI-ERIC Use Cases
▶ Access to data only
  Q – bioinformatics: “I need access to the clinical/omics data for my research.”
⟹ BBMRI-ERIC Sample/Data Locator
⟹ BBMRI-ERIC Platform for Sensitive Data Processing
  BiobankCloud, Mosler/TSD, etc.
▶ Measuring impact of bioresources
  Q – biobanker, funding organizations: “We need to know the impact of a bioresource.”
⟹ BRIF now adopted by BBMRI-ERIC
  BioResource Impact Factor
Principal Components
▶ BBMRI-ERIC Directory
  aggregate information about available resources: biobanks & collections,
  even achieving agreement on such a minimum data structure has not been simple :) – ongoing updates to MIABIS 2.0 standard,
  beta version of BBMRI-ERIC Directory already used by pilot users as of May 2015.
▶ BBMRI-ERIC Sample/Data Negotiator
  brokering of samples between researchers and biobankers,
  efficient M : N communication tool for large M and N.
▶ BBMRI-ERIC Sample/Data Locator
  federated architecture with distributed queries,
  privacy and security by design to avoid vulnerability to privacy attacks.
Principal Components
▶ Tools to support national-level and local-level infrastructures
  reference tools for biobanks and national nodes to connect to the European infrastructure,
  registry of BBMRI-ERIC endorsed tools.
▶ Data harmonization service + metadata registries
  ontologies registry, translation/harmonization recipes.
Principal Components
▶ Extraction of structured data from unstructured clinical records
  this is one of the major problems which limits performance of biobanks at the moment,
  involves complex natural-language processing and machine learning,
  language and region specifics ⟹ generating data in different ontologies and different structures
    ● accompanying data often comes from health care systems.
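Collaboration with Other Infrastructures
[Figure: BBMRI-ERIC shown together with eInfrastructures (GÉANT, EGI, EUDAT, …) and the research infrastructures INSTRUCT, INFRAFRONTIER, EATRIS, ECRIN along a pipeline with stages Target Id, Target Val, Hit, Lead, Lead Opt, Preclinic, Phase I, Phase II, Phase III, grouped as Research, Discovery, Development.]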
Status of Cloud Computing (1)
Default is private clouds in biobanks: but can we go beyond that?
▶ Private clouds piloted by BiobankCloud
  focus on solving the multi-tenancy problem (person+project)
  prototyped with Apache jclouds® interfaces
  support for distributed encryption to store data beyond biobanks
Status of Cloud Computing (2)
▶ BiobankCloud architecture
Status of Cloud Computing (3)
▶ BiobankCloud architecture
Status of Cloud Computing (4)
▶ BBMRI Competence Center in EGI-Engage
  basic scenario: private cloud based on EGI-Engage platform for BiobankCloud processing genomics data,
  later phase: explore what is possible beyond that.
▶ Trusted/secure data sharing platforms
  collaboration with TSD, MOSLER/TSD 2.0, and others,
  known to work in some legislative frameworks (e.g., Nordic countries).
Status of Cloud Computing (5)
▶ Use of 3rd party providers
  extending the notion of “private clouds” to ingest contracted clouds: under what conditions?
  impact of GDPR – responsibility is now both with data owner and data processor
  what is the impact in various legal frameworks – GDPR actually does not harmonize it
  what level of certification will be required, if acceptable at all?
    ● now looking into ISO 27001/27018 certifications
  exploring also as a part of PhenoMeNal
  input for European Open Science Cloud
    ● … if it will also become a cloud in the technical sense :)
Time Line for Clouds
09/2016 Security toolset release for BBMRI-ERIC (EGI-Engage D6.11)
  ▶ integration of federated AAI into BiobankCloud
08/2017 Evaluated cloud environment and demonstrator of analysis workflow for biobank studies
  ▶ demonstrator
  ▶ minimum: private cloud using EGI cloud stack inside BBMRI.{cz,nl,se} biobanks
A Few Further Notes on Clouds
▶ And some more general notes… if there is a future cloud marketplace (e.g., as a part of EOSC)
  research institutions must balance CAPEX/OPEX,
  research institutions or downstream research infrastructures must be given access to a plurality of services (incl. brokering services),
  cloud brokering/marketplace initiatives need to
    ● remain neutral and lightweight,
    ● be non-competing with upstream providers and downstream users,
    ● be standard-compliant and thus also subject to competition.
  clarify role of academic vs. commercial cloud providers