D6.2 – Deployment of Project Informatics Platform

PU Page 1 Version 1.2

“This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Grant Agreement No 675451“

Grant agreement no. 675451

CompBioMed

Research and Innovation Action
H2020-EINFRA-2015-1

Topic: Centres of Excellence for Computing Applications

D6.2 - Deployment of Project Informatics Platform

Work Package: 6
Due date of deliverable: Month 12
Actual submission date: 29 / September / 2017
Start date of project: October, 01 2016
Duration: 36 months
Lead beneficiary for this deliverable: UCL
Contributors: UCL

Project co-funded by the European Commission within the H2020 Programme (2014-2020)

Dissemination Level
PU  Public  YES
CO  Confidential, only for members of the consortium (including the Commission Services)
CI  Classified, as referred to in Commission Decision 2001/844/EC

Disclaimer

The content of this deliverable does not reflect the official opinion of the European Union. Responsibility for the information and views expressed herein lies entirely with the author(s).

Table of Contents

1 Version Log
2 Contributors
3 Definition and Acronyms
4 Introduction
5 Requirements
6 Platform Description
  6.1 EUDAT Services
  6.2 Locally Deployed Services
7 Deployment
  7.1 Searching for Data
  7.2 Community
  7.3 Data Upload
  7.4 REST API
8 Conclusions

List of Figures

Figure 1. (A) Percentage of people who want to be able to share data outside of the project and (B) percentage of people who require a long term preservation service
Figure 2. Percentage of projects that associate (A) metadata and (B) persistent identifiers with their data
Figure 3. Percentage of projects that (A) have a coherent data architecture and (B) replicate their metadata
Figure 4. Percentage of projects that (A) have sensitive data and (B) can make their data available via open access
Figure 5. Architecture of the EUDAT system. The B2SHARE service is deployed on CompBioMed resources, and is integrated and interacts with other services in the wider EUDAT infrastructure.

Figure 6. Each data object is assigned its own page, which displays its details and allows the user to download the associated files. The search process allows users to quickly find objects.
Figure 7. The home page of the web interface, showing the available communities
Figure 8. The first stage of the data upload process – entering a title and selecting a community
Figure 9. The CompBioMed community metadata capture form in B2SHARE

1 Version Log

Version  Date        Released by    Nature of Change
V1.1     04/09/2017  Stefan Zasada  First Draft
V1.2     18/09/2017  Stefan Zasada  After comments from reviewers

2 Contributors

Name             Institution  Role
Stefan Zasada    UCL          Author
Narges Zarrabi   SURFsara     Reviewer
David Wright     UCL          Reviewer
Peter V Coveney  UCL          Reviewer
Emily Lumley     UCL          Reviewer

3 Definition and Acronyms

Acronyms   Definitions
B2ACCESS   The EUDAT user management service
B2DROP     The EUDAT collaborative data workspace
B2FIND     The EUDAT metadata search service
B2HANDLE   The EUDAT PID assignment service
B2SAFE     The EUDAT data replication service
B2SHARE    The EUDAT data archiving and searching interface
B2STAGE    The EUDAT service to move data to HPC resources
CDI        Collaborative Data Infrastructure
EUDAT      An EU funded collaborative data infrastructure
PID        Persistent Identifier

4 Introduction

The work described in this deliverable is a result of CompBioMed task 6.4 - Develop and Deploy an Informatics Platform which will store all the data collected and processed within CompBioMed. Originally, the task was planned to lead to the deployment of an instance of the p-medicine informatics platform to allow biomedical researchers to analyse their data, perform queries and extract content to initiate their modelling and simulation activities. All data within the data warehouse at the core of this informatics platform are fully anonymised.

However, this activity was subject to change based on the user needs analysis conducted by work packages 2, 5 and 6 to assess the data requirements of the project researchers and associate partners. The results of this analysis, presented in the next section, have greatly influenced the decisions regarding the deployment of this platform, described in the final sections of this document. The system described in this deliverable is also designed to meet the detailed requirements outlined in deliverable D1.3 – Data Management Plan, to which the reader is referred.

5 Requirements

CompBioMed has four separate tasks/deliverables that depend on understanding the project’s data requirements. To understand these requirements, a data working group was formed within the project to survey the consortium. This survey was delivered via SurveyMonkey and comprised fifty questions split into four categories:

- Background project questions
- Non-simulation data (e.g. used to build models)
- Simulation data
- General data questions

The survey received 27 responses from core project partner workflows and also associate partners, and this information has been used to inform the deployment of the data management platform. Twelve responses were received from core workflow partners, and 15 from associate partners who have joined the consortium. Most respondents were working with file based data rather than structured databases, and identified an opportunity for CompBioMed to provide storage and collaboration services for data sharing. Typical datasets ranged from 100 MB to 10 GB, and the total volume of data project partners want to store is around 20-25 TB, with significant growth expected.

Many respondents already had their own data management arrangements in place, but 25% of respondents did not want to continue using their existing data storage systems and were looking for CompBioMed to provide a service. Additionally, by adopting a robust best practice system, CompBioMed is likely to be able to assume leadership within the community in terms of data storage and archiving. The basic data requirements in terms of preservation and sharing are summarized in Figure 1.

Figure 1. (A) Percentage of people who want to be able to share data outside of the project and (B) percentage of people who require a long term preservation service

Our survey also asked users about the current state of the art regarding their data management processes and procedures. We found that most respondents did not currently adopt tools such as persistent identifiers (PIDs) in order to reference data objects. The details can be found in Figures 2 to 4.

Figure 2. Percentage of projects that associate (A) metadata and (B) persistent identifiers with their data

Figure 3. Percentage of projects that (A) have a coherent data architecture and (B) replicate their metadata

Figure 4. Percentage of projects that (A) have sensitive data and (B) can make their data available via open access

Because very few of the research communities represented by CompBioMed are making use of sensitive data with strong privacy requirements, we took a design decision to leave the management of consent, anonymisation and privacy issues to the data owners, and have not attempted to implement a data management platform that can enforce such policies as mandatory.

6 Platform Description

The requirements outlined in the previous section point to a system capable of storing arbitrary data in many different file formats, which can be shared with other users and preserved for the long term. It is beyond the scope of the CompBioMed project to develop such a system from scratch, but fortunately it is also unnecessary.

The EUDAT project¹, started in 2011, aims to provide Europe’s scientific and research communities with a sustainable pan-European infrastructure for improved access to scientific data. Burgeoning volumes of valuable and complex data create new challenges related to data management, access and preservation. EUDAT aims to address these challenges and exploit the opportunities using its vision of a Collaborative Data Infrastructure. EUDAT, therefore, exists to work with projects such as CompBioMed, and provides an attractive means for such projects to meet their data management and preservation obligations.

EUDAT comprises a set of services that can be composed in different ways to meet the objectives of the project. Some of these services are run centrally by EUDAT, whereas others exist as software that can be downloaded and deployed by projects that use EUDAT. This means that a project can adopt a multi-service ecosystem where some services are deployed within the project and linked to other services in the wider EUDAT infrastructure. That is the model that we have adopted for the CompBioMed data management platform. We describe the main EUDAT services we use in this section, and the local deployment of the B2SHARE service in the next. Figure 5 shows how these various services interact.

1 http://www.eudat.eu

Figure 5. Architecture of the EUDAT system. The B2SHARE service is deployed on CompBioMed resources, and is integrated and interacts with other services in the wider EUDAT infrastructure.

6.1 EUDAT Services

Assigning a unique dataset reference makes it possible to refer to data in citations and enhances discoverability. It also provides the user with clear versioning capabilities for datasets. We use the centralised B2HANDLE service provided by EUDAT to assign persistent identifiers to all of the datasets that are archived in our project.

CompBioMed also makes use of the B2DROP service provided by EUDAT for sharing live data internally in the project, which will ease the transition to making data openly available in future. B2DROP is a tool to store and exchange data with collaborators and to keep data synchronized and up-to-date. CompBioMed takes advantage of the free storage space provided for research data within the B2DROP framework.

All data hosted within the EUDAT CDI is advertised through the central B2FIND catalogue and assigned a persistent identifier. The B2FIND service is a web portal allowing researchers to easily find and access collections of scientific data using a web browser. CompBioMed is in the process of developing a community metadata schema to describe the datasets generated by the project.
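To illustrate how a persistent identifier can be turned into a citable link, the following is a minimal sketch. It assumes that the PIDs issued through B2HANDLE are Handle System identifiers of the form prefix/suffix and that they are resolved through the public proxy at hdl.handle.net. The function name and the example handle are hypothetical and are not taken from the CompBioMed deployment.

```python
from urllib.parse import quote

# Public proxy of the Handle System (assumption: B2HANDLE PIDs resolve here).
HANDLE_PROXY = "https://hdl.handle.net"


def handle_url(pid: str) -> str:
    """Return a resolvable URL for a handle-style PID such as 'prefix/suffix'."""
    pid = pid.strip()
    # Accept the common "hdl:" scheme prefix as well as a bare handle.
    if pid.lower().startswith("hdl:"):
        pid = pid[4:]
    # Split on the first slash only: the suffix may itself contain slashes.
    prefix, sep, suffix = pid.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a handle of the form prefix/suffix: {pid!r}")
    # Percent-encode unsafe characters, keeping slashes inside the suffix.
    return f"{HANDLE_PROXY}/{quote(prefix, safe='')}/{quote(suffix, safe='/')}"


if __name__ == "__main__":
    # Hypothetical handle, used for illustration only.
    print(handle_url("hdl:12345/abc-def"))
```

A link built this way can be placed in a citation, so that the reference keeps working even if the data object is later moved to a different storage location.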

6.2 Locally Deployed Services

The EUDAT B2SHARE service allows data to be shared openly or kept private. Regardless of whether deposited data are made open or kept private, metadata records submitted as part of a data deposit are made freely available for harvest via the OAI-PMH protocol. Accessible data are made available directly to users of EUDAT CDI services through graphical user interfaces and application programming interfaces. We have deployed our own instance of the B2SHARE software, linked to the wider EUDAT infrastructure through the B2HANDLE and B2FIND services, to publish data for third-party use.
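As an illustration of metadata harvesting, the sketch below builds an OAI-PMH ListRecords request URL. The verb, metadataPrefix and resumptionToken arguments come from the OAI-PMH 2.0 specification, and oai_dc is the metadata format every OAI-PMH repository is required to support. The endpoint path used in the example is an assumption and has not been checked against the CompBioMed instance.

```python
from typing import Optional
from urllib.parse import urlencode


def oai_list_records_url(
    endpoint: str,
    metadata_prefix: str = "oai_dc",
    resumption_token: Optional[str] = None,
) -> str:
    """Build an OAI-PMH ListRecords request URL for the given endpoint."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument: when it is
        # present, no other argument besides the verb may be sent.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{endpoint}?{urlencode(params)}"


if __name__ == "__main__":
    # ASSUMPTION: the OAI-PMH endpoint path below is hypothetical.
    print(oai_list_records_url("http://b2share.compbiomed.eu/api/oai2d"))
```

A harvester would fetch the first URL, read the resumptionToken element from the response if one is present, and call the function again with that token until the repository returns no further token.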

7 Deployment

An instance of the B2SHARE service has been deployed on resources at CompBioMed partner UCL, with 100 TB of storage allocated to the project from UCL’s resources. B2SHARE is a graphical, web-based tool designed to be easy to use, and it also exposes an HTTP REST API. The basic operations of the interface are described below. The service is available via http://b2share.compbiomed.eu.

The service is secured via EUDAT’s OpenID provider (via B2ACCESS), as CompBioMed at present does not run its own identity provider service. This means, at present, that CompBioMed users must register for an account with EUDAT.

7.1 Searching for Data

Both registered and unregistered users can search for data. The text entered can be part of a title, keyword, abstract or any other metadata. Unregistered users can only search for datasets that are publicly accessible. Advanced searches can be performed by clicking the Search button, then entering the additional search criteria on the page that is shown.

Once a record has been found using the search facility, the user can click on its name to display the data’s page. This page shows the metadata and files associated with the data object. For each file, the file size, checksum and PID are shown (see Figure 6). The owner of a record is able to edit the metadata by clicking on the Edit record button.

Figure 6. Each data object is assigned its own page, which displays its details and allows the user to download the associated files. The search process allows users to quickly find objects.

A user can also click on their username or email address on the front page and select Profile to go to the profile view of their account. From here they can find links to an overview of all their published or draft data records. At the end of the page, current API tokens are listed and new tokens can be generated on request.
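The checksum displayed for each file on a record page can be used to confirm that a download is complete and uncorrupted. The following is a minimal sketch of such a check. It assumes the checksum is published either as a bare hexadecimal digest or in the form algorithm:digest, and that MD5 is the default algorithm. Both the format and the default are assumptions, and the function names are hypothetical.

```python
import hashlib


def file_checksum(path: str, algorithm: str = "md5", chunk_size: int = 1024 * 1024) -> str:
    """Compute the hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path: str, published: str) -> bool:
    """Compare a local file against a checksum copied from the record page.

    ASSUMPTION: 'published' is either 'algorithm:hexdigest' or a bare hex
    digest, in which case MD5 is assumed.
    """
    algorithm, sep, expected = published.partition(":")
    if not sep:
        algorithm, expected = "md5", published
    actual = file_checksum(path, algorithm.strip().lower())
    return actual == expected.strip().lower()
```

Reading the file in fixed-size chunks keeps memory use constant, which matters for the larger datasets of around 10 GB reported in the requirements survey.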

7.2 Community

Data uploaded to B2SHARE is organised by community, where a community represents a specific metadata schema used to annotate the data objects stored within it. A button is generated on the landing page of the interface for each community (see Figure 7), which allows users who are logged in to browse all data objects that have been uploaded to that community.

Figure 7. The home page of the web interface, showing the available communities

7.3 Data Upload

Only registered users are permitted to upload new records to B2SHARE. Clicking the Upload link in the main interface opens the first stage of a three-stage process required to upload data. The steps are as follows:

1. Enter the title for the data object that is being uploaded, then select the Community that the data belongs to. In the case of CompBioMed, users should select the CompBioMed option (see Figure 8). Click on Create Draft Record.

2. Next, the user can either drag and drop files from their local machine to the web interface, or import files stored in their B2DROP account.

3. Finally, the user has to fill in the basic metadata fields. These fields depend on the community chosen. In the case of the CompBioMed community, basic information such as title, description (and type) and whether the data is open access are mandatory fields, while other information such as the creator, licence, URL, embargo date, keywords and study ID are optional. Hovering the mouse pointer over a text field will show a description of the purpose of the field. The licence can also be selected through a built-in wizard. See Figure 9 for details.

Figure 8. The first stage of the data upload process – entering a title and selecting a community

Figure 9. The CompBioMed community metadata capture form in B2SHARE

Ticking Submit draft for publication will ensure that the uploaded data is assigned a persistent identifier (PID). This can then be used to refer to the data in future.
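The split between mandatory and optional metadata fields described in step 3 can be checked before a record is submitted, for example by a workflow that prepares records automatically. The sketch below is illustrative only: the key names are hypothetical stand-ins for the form fields listed above and are not taken from the actual CompBioMed community schema.

```python
# ASSUMPTION: hypothetical key names mirroring the form fields in the text.
MANDATORY_FIELDS = ("title", "description", "open_access")
OPTIONAL_FIELDS = ("creator", "licence", "url", "embargo_date", "keywords", "study_id")


def missing_mandatory_fields(metadata: dict) -> list:
    """Return the mandatory fields that are absent or blank, in schema order."""
    missing = []
    for name in MANDATORY_FIELDS:
        value = metadata.get(name)
        # False is a valid answer for open_access, so only None and blank
        # strings count as missing.
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(name)
    return missing


def unknown_fields(metadata: dict) -> list:
    """Return keys that are neither mandatory nor optional, sorted by name."""
    known = set(MANDATORY_FIELDS) | set(OPTIONAL_FIELDS)
    return sorted(set(metadata) - known)


if __name__ == "__main__":
    draft = {"title": "Simulation run 1", "description": " ", "open_access": False}
    print(missing_mandatory_fields(draft))
    print(unknown_fields({"title": "x", "colour": "red"}))
```

Running such a check locally gives earlier feedback than waiting for the web form or the service to reject an incomplete record.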

7.4 REST API

The B2SHARE HTTP REST API can be used to interact with B2SHARE via external services or applications, for example for integrating with other websites (research community portals) or for uploading or downloading large datasets that are not easily handled via a web browser. This API can also be used for metadata harvesting.

This is particularly useful in the context of CompBioMed workflows, since it means that workflows can automatically ingest data into the CompBioMed B2SHARE service as it is created, thus alleviating the burden on the user.

Only authenticated users can use the API. Each HTTP request to the server must pass an access_token parameter that identifies the user. The access_token is an opaque string which can be created in the user profile section of the B2SHARE web user interface. B2SHARE’s access tokens follow the OAuth 2.0 standard, and allow API calls to be made on a user’s behalf.

To get an access token, a user must log in to the B2SHARE web interface and click on his or her username. On the User Profile page, go to the API Tokens section, enter a token identification name (e.g. my-workflow) and click New Token. This will create an access_token, which can then be used when making API calls.
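To show how the access_token parameter is attached to a request, the following sketch prepares (but does not send) an HTTP request that a workflow might use to create a draft record. The base URL is the service address given in Section 7. The /api/records/ path, the JSON body and the function name are assumptions made for illustration and should be checked against the API documentation of the deployed B2SHARE version.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://b2share.compbiomed.eu"  # service address given in the text
RECORDS_PATH = "/api/records/"  # ASSUMPTION: hypothetical path for record creation


def build_create_draft_request(access_token: str, metadata: dict, base_url: str = BASE_URL) -> Request:
    """Prepare a POST request carrying the access_token as a query parameter."""
    # urlencode escapes the token so that it is safe to place in the URL.
    query = urlencode({"access_token": access_token})
    url = f"{base_url.rstrip('/')}{RECORDS_PATH}?{query}"
    body = json.dumps(metadata).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # "TOKEN" is a placeholder for a token created on the User Profile page.
    request = build_create_draft_request("TOKEN", {"title": "Simulation run 1"})
    print(request.get_method(), request.full_url)
```

The prepared request could then be passed to urllib.request.urlopen to send it. Because the token identifies the user, it is best read from an environment variable or a protected file rather than being written into workflow scripts.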

8 Conclusions

The deployment of B2SHARE meets a specific requirement of the CompBioMed project to be able to archive and share research data. The benefits are twofold. First, the service is effectively run by CompBioMed, on CompBioMed managed resources. Second, it integrates with the wider EUDAT infrastructure and can leverage EUDAT services such as B2HANDLE and B2FIND, greatly improving the sustainability of the project’s efforts to publish and preserve its data outputs.

We are in the process of developing a community metadata standard to describe the data produced by the diverse research strands of CompBioMed. We also plan to engage in further work to automate the ingestion of data into B2SHARE directly from project workflows, via the B2SHARE API.