D6.2 – Deployment of Project Informatics Platform
PU Page 1 Version 1.2
“This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Grant Agreement No 675451“
Grant agreement no. 675451
CompBioMed
Research and Innovation Action H2020-EINFRA-2015-1
Topic: Centres of Excellence for Computing Applications
D6.2 – Deployment of Project Informatics Platform

Work Package: 6
Due date of deliverable: Month 12
Actual submission date: 29 September 2017
Start date of project: 1 October 2016
Duration: 36 months
Lead beneficiary for this deliverable: UCL
Contributors: UCL

Project co-funded by the European Commission within the H2020 Programme (2014-2020)

Dissemination Level
PU  Public  YES
CO  Confidential, only for members of the consortium (including the Commission Services)
CI  Classified, as referred to in Commission Decision 2001/844/EC
Disclaimer

The content of this deliverable does not reflect the official opinion of the European Union. Responsibility for the information and views expressed herein lies entirely with the author(s).

Table of Contents

1 Version Log
2 Contributors
3 Definition and Acronyms
4 Introduction
5 Requirements
6 Platform Description
  6.1 EUDAT Services
  6.2 Locally Deployed Services
7 Deployment
  7.1 Searching for Data
  7.2 Community
  7.3 Data Upload
  7.4 REST API
8 Conclusions
List of Figures

Figure 1. (A) Percentage of people who want to be able to share data outside of the project and (B) percentage of people who require a long term preservation service
Figure 2. Percentage of projects that associate (A) metadata and (B) persistent identifiers with their data
Figure 3. Percentage of projects that (A) have a coherent data architecture and (B) replicate their metadata
Figure 4. Percentage of projects that (A) have sensitive data and (B) can make their data available via open access
Figure 5. Architecture of the EUDAT system. The B2SHARE service is deployed on CompBioMed resources, and is integrated and interacts with other services in the wider EUDAT infrastructure.
Figure 6. Each data object is assigned its own page, which displays its details and allows the user to download the associated files. The search process allows users to quickly find objects.
Figure 7. The home page of the web interface, showing the available communities
Figure 8. The first stage of the data upload process: entering a title and selecting a community
Figure 9. The CompBioMed community metadata capture form in B2SHARE
1 Version Log

Version  Date        Released by    Nature of Change
V1.1     04/09/2017  Stefan Zasada  First Draft
V1.2     18/09/2017  Stefan Zasada  After comments from reviewers
2 Contributors

Name             Institution  Role
Stefan Zasada    UCL          Author
Narges Zarrabi   SURFsara     Reviewer
David Wright     UCL          Reviewer
Peter V Coveney  UCL          Reviewer
Emily Lumley     UCL          Reviewer
3 Definition and Acronyms

Acronym    Definition
B2ACCESS   The EUDAT user management service
B2DROP     The EUDAT collaborative data workspace
B2FIND     The EUDAT metadata search service
B2HANDLE   The EUDAT PID assignment service
B2SAFE     The EUDAT data replication service
B2SHARE    The EUDAT data archiving and searching interface
B2STAGE    The EUDAT service to move data to HPC resources
CDI        Collaborative Data Infrastructure
EUDAT      An EU funded collaborative data infrastructure
PID        Persistent Identifier
4 Introduction

The work described in this deliverable is a result of CompBioMed task 6.4, Develop and Deploy an Informatics Platform, which will store all the data collected and processed within CompBioMed. Originally, the task was planned to lead to the deployment of an instance of the p-medicine informatics platform to allow biomedical researchers to analyse their data, perform queries and extract content to initiate their modelling and simulation activities. All data within the data warehouse that resides at the core of this informatics platform are fully anonymised. However, this activity was subject to change based on the user needs analysis conducted by work packages 2, 5 and 6 to assess the data requirements of the project researchers and associate partners. The results of this analysis, presented in the next section, have greatly influenced the decisions regarding the deployment of this platform, described in the final sections of this document. The system described in this deliverable is also designed to meet the detailed requirements outlined in deliverable D1.3 (Data Management Plan), to which the reader is referred.
5 Requirements

CompBioMed has four separate tasks/deliverables that depend on understanding the project's data requirements. To understand these requirements, a data working group was formed within the project to survey the consortium. This survey was delivered via SurveyMonkey and comprised fifty questions split into four categories:

- Background project questions
- Non-simulation data (e.g. used to build models)
- Simulation data
- General data questions

The survey received 27 responses from core project partner workflows and also associate partners, and this information has been used to inform the deployment of the data management platform. Twelve responses were received from core workflow partners, and 15 from associate partners who have joined the consortium. Most respondents were working with file based data rather than structured databases, and identified an opportunity for CompBioMed to provide storage and collaboration services for data sharing. Typical datasets ranged from 100 MB to 10 GB, and the total volume of data project partners want to store is around 20-25 TB, with significant growth expected. Many respondents already had their own data management arrangements in place, but 25% of respondents did not want to continue using their existing data storage systems and were looking for CompBioMed to provide a service. Additionally, by adopting a robust best practice system, CompBioMed is likely to be able to assume leadership within the community in terms of data storage and archiving. The basic data requirements in terms of preservation and sharing are summarized in Figure 1.
Figure 1. (A) Percentage of people who want to be able to share data outside of the project and (B) percentage of people who require a long term preservation service

Our survey also asked users about the current state of the art regarding their data management processes and procedures. We found that most respondents did not currently adopt tools such as persistent identifiers (PIDs) in order to reference data objects. The details can be found in Figures 2 to 4.
Figure 2. Percentage of projects that associate (A) metadata and (B) persistent identifiers with their data

Figure 3. Percentage of projects that (A) have a coherent data architecture and (B) replicate their metadata
Figure 4. Percentage of projects that (A) have sensitive data and (B) can make their data available via open access

Because very few of the research communities represented by CompBioMed are making use of sensitive data with strong privacy requirements, we took a design decision to leave the management of consent, anonymisation and privacy issues to the data owners, and have not attempted to implement a data management platform that can enforce such policies as mandatory.
6 Platform Description

The requirements outlined in the previous section point to a system capable of storing arbitrary data in many different file formats, which can be shared with other users and preserved for the long term. It is beyond the scope of the CompBioMed project to develop such a system from scratch, but fortunately it is also unnecessary.

The EUDAT project [1], started in 2011, aims to provide Europe's scientific and research communities with a sustainable pan-European infrastructure for improved access to scientific data. Burgeoning volumes of valuable and complex data create new challenges related to data management, access and preservation. EUDAT aims to address these challenges and exploit the opportunities using its vision of a Collaborative Data Infrastructure. EUDAT, therefore, exists to work with projects such as CompBioMed, and provides an attractive means for such projects to meet their data management and preservation obligations.

EUDAT comprises a set of services that can be composed in different ways to meet the objectives of the project. Some of these services are run centrally by EUDAT, whereas others exist as software that can be downloaded and deployed by projects that use EUDAT. This means that a project can adopt a multi-service ecosystem where some services are deployed within the project and linked to other services in the wider EUDAT infrastructure. That is the model that we have adopted for the CompBioMed data management platform. We describe the main EUDAT services we use in this section, and the local deployment of the B2SHARE service in the next. Figure 5 shows how these various services interact.

[1] http://www.eudat.eu
Figure 5. Architecture of the EUDAT system. The B2SHARE service is deployed on CompBioMed resources, and is integrated and interacts with other services in the wider EUDAT infrastructure.
6.1 EUDAT Services

Assigning a unique dataset reference makes it possible to refer to data in citations and enhances discoverability. It also provides the user with clear versioning capabilities for datasets. We use the centralised B2HANDLE service provided by EUDAT to assign persistent identifiers to all of the datasets that are archived in our project.

CompBioMed also makes use of the B2DROP service provided by EUDAT for sharing live data internally in the project, which will ease the transition to making data openly available in future. B2DROP is a tool to store and exchange data with collaborators and to keep data synchronized and up-to-date. CompBioMed takes advantage of the free storage space provided for research data within the B2DROP framework.

All data hosted within the EUDAT CDI is advertised through the central B2FIND catalogue and assigned a persistent identifier. The B2FIND service is a web portal allowing researchers to easily find and access collections of scientific data using a web browser. CompBioMed is in the process of developing a community metadata schema to describe the datasets generated by the project.
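As an illustration of how a persistent identifier can be used in practice, the sketch below turns a PID into a resolvable web address. It assumes that PIDs issued through B2HANDLE are Handle System identifiers of the form prefix/suffix and that they are resolved through the public proxy at hdl.handle.net. The example handle in the usage note is hypothetical and does not refer to a real dataset.

```python
from urllib.parse import quote

# Public Handle System proxy (assumed resolver for B2HANDLE PIDs).
HANDLE_PROXY = "https://hdl.handle.net"


def split_handle(pid):
    """Split 'prefix/suffix' into its two parts, tolerating an 'hdl:' scheme."""
    pid = pid.strip()
    if pid.lower().startswith("hdl:"):
        pid = pid[4:]
    prefix, sep, suffix = pid.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a handle of the form prefix/suffix: {pid!r}")
    return prefix, suffix


def handle_url(pid):
    """Return the proxy URL at which the handle can be resolved."""
    prefix, suffix = split_handle(pid)
    return f"{HANDLE_PROXY}/{quote(prefix)}/{quote(suffix, safe='/')}"
```

For example, handle_url("hdl:12345/example-dataset") gives a URL that could be placed in a citation, where 12345/example-dataset stands in for a PID shown on a record page.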
6.2 Locally Deployed Services

The EUDAT B2SHARE service allows data to be shared openly or kept private. Regardless of whether deposited data are made open or kept private, metadata records submitted as part of a data deposit are made freely available for harvest via the OAI-PMH protocol. Accessible data are made available directly to users of EUDAT CDI services through graphical user interfaces and application programming interfaces. We have deployed our own instance of the B2SHARE software, linked to the wider EUDAT infrastructure through the B2HANDLE and B2FIND services, to publish data for third-party use.
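To make the metadata harvesting route more concrete, the following is a minimal sketch of an OAI-PMH client. The verb and metadataPrefix parameters and the XML namespaces come from the OAI-PMH 2.0 and Dublin Core specifications. The endpoint path used in the usage note is an assumption about the deployment, and the sample response is a hand-written, hypothetical record used only to show the parsing.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Dublin Core element namespace used inside oai_dc records.
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(endpoint, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL for the given endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{endpoint}?{query}"


def titles_from_response(xml_text):
    """Extract every dc:title value from an OAI-PMH ListRecords response."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter(DC_NS + "title")]


# Hypothetical response body, trimmed to a single record.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Hypothetical simulation dataset</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""
```

A harvester would fetch list_records_url("http://b2share.compbiomed.eu/api/oai2d") (the /api/oai2d path is assumed, not confirmed by this document) and pass the response body to titles_from_response.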
7 Deployment

An instance of the B2SHARE service has been deployed on resources at CompBioMed partner UCL, with 100 TB of storage allocated to the project from UCL's resources. B2SHARE is a graphical, web-based tool designed to be easy to use; it also exposes an HTTP REST API. The basic operations of the interface are described below. The service is available via http://b2share.compbiomed.eu. The service is secured via EUDAT's OpenID provider (via B2ACCESS), as CompBioMed at present does not run its own identity provider service. This means, at present, that CompBioMed users must register for an account with EUDAT.
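The 100 TB allocation can be set against the survey figures reported in the Requirements section (20-25 TB requested, typical datasets of 100 MB to 10 GB). The short calculation below is a sketch of that comparison. It assumes decimal units (1 TB = 10^12 bytes), which the document does not specify.

```python
# Decimal storage units (an assumption; the document does not say which it uses).
MB = 10**6
GB = 10**9
TB = 10**12

ALLOCATED_BYTES = 100 * TB  # storage allocated at UCL, per this section


def headroom_factor(requested_bytes):
    """How many times the requested volume fits into the allocation."""
    return ALLOCATED_BYTES / requested_bytes


def max_datasets(dataset_bytes):
    """How many datasets of a given size fit into the allocation."""
    return ALLOCATED_BYTES // dataset_bytes


if __name__ == "__main__":
    print(headroom_factor(25 * TB), headroom_factor(20 * TB))  # 4.0 5.0
    print(max_datasets(10 * GB), max_datasets(100 * MB))       # 10000 1000000
```

Under these assumptions the allocation is four to five times the volume partners currently want to store, which leaves room for the growth the survey anticipates.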
7.1 Searching for Data

Both registered and unregistered users can search for data. The text entered can be part of a title, keyword, abstract or any other metadata. Unregistered users can only search for datasets that are publicly accessible. Advanced searches can be performed by clicking the Search button, then entering the additional search criteria on the page that is shown. Once a record has been found using the search facility, the user can click on its name to display the data's page. This page shows the metadata and files associated with the data object. For each file, the file size, checksum and PID are shown (see Figure 6). The owner of a record is able to edit the metadata by clicking on the Edit record button.
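The checksum shown for each file can be used to confirm that a download arrived intact. The sketch below assumes the checksum is displayed either as a bare hexadecimal digest or in the form algorithm:digest (for example md5:...). That display format is an assumption, not something this document states.

```python
import hashlib


def file_checksum(path, algorithm="md5", chunk_size=1 << 20):
    """Compute the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a local file with a checksum copied from the record page.

    `listed` may be 'algorithm:hexdigest' or a bare hex digest, in which
    case MD5 is assumed (hypothetical default).
    """
    algorithm, sep, expected = listed.partition(":")
    if not sep:
        algorithm, expected = "md5", listed
    return file_checksum(path, algorithm) == expected.strip().lower()
```

A workflow could call matches_listed_checksum on each downloaded file and refuse to proceed when it returns False.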
Figure 6. Each data object is assigned its own page, which displays its details and allows the user to download the associated files. The search process allows users to quickly find objects.

A user can also click on their username or email address on the front page and select Profile to go to the profile view of their account. From here they can find links to an overview of all their published or draft data records. At the end of the page, current API tokens are listed and new tokens can be generated on request.
7.2 Community

Data uploaded to B2SHARE is organised by community, where a community represents a specific metadata schema used to annotate the data objects stored within it. A button is generated on the landing page of the interface for each community (see Figure 7), which allows logged-in users to browse all data objects that have been uploaded to that community.
Figure 7. The home page of the web interface, showing the available communities
7.3 Data Upload

Only registered users are permitted to upload new records to B2SHARE. Clicking the Upload link in the main interface opens the first of a three stage process required to upload data. The steps are as follows:

1. Enter the title for the data object that is being uploaded, then select the Community that the data belongs to. In the case of CompBioMed, users should select the CompBioMed option (see Figure 8). Click on Create Draft Record.
2. Next the user can either drag and drop files from their local machine to the web interface, or import files stored in their B2DROP account.
3. Finally the user has to fill in the basic metadata fields. These fields depend on the community chosen. In the case of the CompBioMed community, basic information such as title, description (and type) and whether the data is open access are mandatory fields, while other information such as the creator, licence, URL, embargo date, keywords and study ID are optional. Hovering the mouse pointer over the text field will show a description of the purpose of the field. The licence can also be selected through a built-in wizard. See Figure 9 for details.
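The split between mandatory and optional fields in step 3 can be checked before a record is submitted, for instance by a workflow that prepares metadata automatically. The sketch below is one way to do so. The field keys (title, description, open_access and so on) are hypothetical names chosen for illustration and are not the actual keys of the CompBioMed community schema. The description type is left out for brevity.

```python
# Hypothetical keys mirroring the mandatory and optional fields named in step 3.
MANDATORY_FIELDS = ("title", "description", "open_access")
OPTIONAL_FIELDS = ("creator", "licence", "url", "embargo_date", "keywords", "study_id")


def check_metadata(record):
    """Return a list of problems found in a metadata dict (empty if none)."""
    problems = []
    for field in MANDATORY_FIELDS:
        value = record.get(field)
        # False is a valid value for open_access, so test only for None and "".
        if value is None or value == "":
            problems.append(f"missing mandatory field: {field}")
    if "open_access" in record and not isinstance(record["open_access"], bool):
        problems.append("open_access must be True or False")
    known = set(MANDATORY_FIELDS) | set(OPTIONAL_FIELDS)
    for field in sorted(set(record) - known):
        problems.append(f"unknown field: {field}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a workflow report everything that needs fixing in a single pass.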
Figure 8. The first stage of the data upload process: entering a title and selecting a community

Figure 9. The CompBioMed community metadata capture form in B2SHARE

Ticking Submit draft for publication ensures that the uploaded data is assigned a persistent identifier (PID). This can then be used to refer to the data in future.
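The same create, upload and submit sequence could in principle be driven from a script rather than the web interface. The sketch below only builds the HTTP requests and does not send them. The endpoint paths, the JSON body layout, the bucket URL and the JSON Patch used for submission are assumptions modelled on the B2SHARE version 2 API and should be checked against the deployed service. The function names are hypothetical.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "http://b2share.compbiomed.eu"  # service address given in this document


def _with_token(url, access_token):
    """Append the access_token query parameter to a URL."""
    return f"{url}?{urlencode({'access_token': access_token})}"


def create_draft_request(access_token, title, community_id):
    """Stage 1: create a draft record with a title and a community (assumed body)."""
    body = {"titles": [{"title": title}], "community": community_id, "open_access": True}
    return Request(
        _with_token(f"{BASE_URL}/api/records/", access_token),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def upload_file_request(access_token, bucket_url, filename, payload):
    """Stage 2: put one file into the draft's file bucket (bucket_url is assumed
    to be returned by the server when the draft is created)."""
    return Request(
        _with_token(f"{bucket_url}/{quote(filename)}", access_token),
        data=payload,
        headers={"Content-Type": "application/octet-stream"},
        method="PUT",
    )


def submit_request(access_token, record_id):
    """Stage 3: submit the draft for publication via a JSON Patch (assumed)."""
    patch = [{"op": "add", "path": "/publication_state", "value": "submitted"}]
    return Request(
        _with_token(f"{BASE_URL}/api/records/{record_id}/draft", access_token),
        data=json.dumps(patch).encode("utf-8"),
        headers={"Content-Type": "application/json-patch+json"},
        method="PATCH",
    )
```

Each request object could be passed to urllib.request.urlopen once the paths have been confirmed. Keeping construction separate from sending makes the sequence easy to inspect and test without touching the live service.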
7.4 REST API

The B2SHARE HTTP REST API can be used to interact with B2SHARE via external services or applications, for example for integrating with other websites (research community portals) or for uploading or downloading large datasets that are not easily handled via a web browser. This API can also be used for metadata harvesting.

This is particularly useful in the context of CompBioMed workflows, since it means that workflows can automatically ingest data into the CompBioMed B2SHARE service as the data are created, thus alleviating the burden on the user.

Only authenticated users can use the API. Each HTTP request to the server must pass an access_token parameter that identifies the user. The access_token is an opaque string which can be created in the user profile section of the B2SHARE web user interface. B2SHARE's access tokens follow the OAuth 2.0 standard, and allow API calls to be made on a user's behalf.

To get an access token, a user must log in to the B2SHARE web interface and click on their username. On the User Profile page, go to the API Tokens section, enter a token identification name (e.g. my-workflow) and click New Token. This will create an access_token, which can then be used when making API calls.
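A small helper can make sure that every API call carries the access_token parameter described above. The sketch below reads the token from an environment variable and builds a search URL. The variable name B2SHARE_TOKEN, the /api/records/ path and the q, page and size parameters are assumptions for illustration and are not confirmed by this document.

```python
import os
from urllib.parse import urlencode

BASE_URL = "http://b2share.compbiomed.eu"  # service address given in this document


def token_from_env(name="B2SHARE_TOKEN"):
    """Read the access token from the environment (hypothetical variable name)."""
    token = os.environ.get(name, "")
    if not token:
        raise RuntimeError(f"set {name} to a token created on the User Profile page")
    return token


def api_url(path, access_token, **params):
    """Build an API URL that always carries the access_token parameter."""
    if not access_token:
        raise ValueError("an access token is required for API calls")
    query = dict(params, access_token=access_token)
    return f"{BASE_URL}{path}?{urlencode(query)}"


def search_url(access_token, text, page=1, size=10):
    """URL for a free-text record search (path and parameter names assumed)."""
    return api_url("/api/records/", access_token, q=text, page=page, size=size)
```

Keeping the token in the environment rather than in workflow scripts avoids committing it to version control, and routing every call through api_url means the parameter cannot be forgotten.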
8 Conclusions

The deployment of B2SHARE meets a specific requirement of the CompBioMed project to be able to archive and share research data. The benefits are twofold. First, the service is effectively run by CompBioMed, on CompBioMed managed resources. Second, it integrates with the wider EUDAT infrastructure and can leverage EUDAT services such as B2HANDLE and B2FIND, greatly improving the sustainability of the project's efforts to publish and preserve its data outputs. We are in the process of developing a community metadata standard to describe the data produced by the diverse research strands of CompBioMed. We also plan to engage in further work to automate the ingestion of data into B2SHARE directly from project workflows, via the B2SHARE API.