December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Part 1: Large Data Sets
Post on 11-Feb-2017
Data Publishing Workflows: Models
RDA-WDS Publishing Data Workflows Working Group
NISO: Big Data, December 2015
International, open group dealing with the challenges posed by all sizes & types of research data
https://rd-alliance.org/
http://www.icsu-wds.org/
International group promoting long-term stewardship of, and universal and equitable access to, quality-assured scientific data and data services, products, and information
Publishing data
• What is data publishing?
• Models in data publishing
• Recommendations
• Challenges
http://bit.ly/1TvGe9v
Data Publishing
“Research data publishing is the release of research data, associated metadata, accompanying documentation, and software code (in cases where the raw data have been processed or manipulated) for re-use and analysis in such a manner that they can be discovered on the Web and referred to in a unique and persistent way. Data publishing occurs via dedicated data repositories and/or (data) journals which ensure that the published research objects are well documented, curated, archived for the long term, interoperable, citable, quality assured and discoverable – all aspects of data publishing that are important for future reuse of data by third party end-users.”
Austin, Claire C. et al. (2015). Key components of data publishing: Using current best practices to develop a reference model for data publishing. Zenodo. DOI: 10.5281/zenodo.34542
Data Publishing Workflows
“…are activities and processes that lead to the publication of research data, associated metadata and accompanying documentation and software code on the Web. In contrast to interim or final published products, workflows are the means to curate, document, and review, and thus ensure and enhance the value of the published product…”
Austin, Claire C. et al. (2015). Key components of data publishing: Using current best practices to develop a reference model for data publishing. Zenodo. DOI: 10.5281/zenodo.34542
Subjects of Review
Guidelines for data publication, e.g.,
• ENVRI reference model
• PREPARDE
Data journals, e.g.,
• Scientific Data
• F1000
Repositories, e.g.,
• Domain
– National Snow & Ice Data Center (NSIDC)
– ICPSR (Social Sciences)
• General
– Dryad
– Arkivum + Figshare
• Institutional
– Stanford Digital Repository
– Data Repository for the University of Minnesota (DRUM)
Elements of Analysis
• Discipline
• Function of workflow
• The assignment of persistent identifiers (PIDs) to datasets
• The PID type used (e.g., DOI, ARK, etc.)
• Peer review of data (e.g., by researcher and by editorial review)
• Curatorial review of metadata (e.g., by institutional or subject repository)
• Technical review and checks (e.g., for data integrity at repository/data centre on ingest)
• Discoverability: Was there indexing of the data, and if so, where?
• Formats covered
• Persons/Roles involved, e.g., editor, publisher, data repository manager, etc.
• Links to additional data products (data paper; review; other journal articles) or “stand-alone” product
• Links to grants, usage of author PIDs
• Whether data citation was facilitated
• Whether the data life cycle was referred to
• Standards compliance
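One of the elements above, the technical check for data integrity on ingest, can be illustrated with a short sketch. It assumes the depositor supplies a SHA-256 checksum alongside each file; the function names `sha256_of` and `passes_fixity_check` are hypothetical and do not come from any repository's actual ingest software.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def passes_fixity_check(path: Path, expected_sha256: str) -> bool:
    """Compare the file's digest with the checksum supplied by the depositor (assumed input)."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

A repository would typically run a check of this kind once at ingest and again at intervals, recording the outcome with the dataset's metadata.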
Publication workflows
Traditional article publication
Reproducible research publication
https://zenodo.org/record/34542#.VmVJqMrWlqc
Recommendations
• Start small and build open source/shareable components one by one in a modular way with a good understanding of how each building block fits into the overall workflow and what the final objective is.
• Follow standards whenever available to facilitate interoperability and to permit extensions based on the work of others using the same standards.
• Implement and adhere to standards for data citation, including the use of persistent identifiers (PIDs). Linkages between data and publications can be automatically harvested if DOIs for data are used routinely in papers. The use of researcher PIDs such as ORCID can also establish connections between data and papers or other research entities such as software. The use of PIDs can also enable linked open data functionality.
• Document roles, workflows and services.
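The point about automatically harvesting data-to-publication links can be sketched as follows. This is a minimal illustration, not a harvesting service: the regular expression is a commonly used approximation of DOI syntax rather than a full specification, and the helper names `extract_dois` and `is_valid_orcid` are hypothetical. The ORCID check uses the ISO 7064 MOD 11-2 check digit that ORCID iDs carry.

```python
import re

# Approximate DOI pattern; real-world DOIs can contain characters this misses.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def extract_dois(text: str) -> list[str]:
    """Return the distinct DOIs found in text, lowercased, in order of first appearance."""
    found: list[str] = []
    for match in DOI_PATTERN.finditer(text):
        # Trim sentence punctuation that the pattern may have swallowed (a simplification).
        doi = match.group(0).rstrip(".,;:)").lower()
        if doi not in found:
            found.append(doi)
    return found


def is_valid_orcid(orcid: str) -> bool:
    """Check the ISO 7064 MOD 11-2 check digit of an ORCID iD such as 0000-0002-1825-0097."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    total = 0
    for char in compact[:15]:
        total = (total + int(char)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return compact[15] == expected
```

For example, running `extract_dois` over a sentence that cites `https://doi.org/10.5281/zenodo.34542` yields that DOI as a string, which could then be matched against repository records.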
Challenges
● Bi-directional linking.
● Software management.
● Version control / dynamic data.
● Sharing restricted-use data.
● Role clarity.
● Business models.
● Data citation support.
● Metrics.
● Incentives.
BIG Data Challenges
• Dynamic data citation: https://rd-alliance.org/group/data-citation-wg.html
• Who does what?
– Researchers
– Managers
– Curators
• How is this fundable/sustainable?
• What about many contributors to massive datasets?
Version control & Dynamic data
Role clarity & Business models
Data citation support
Where we’re going
• How does the intent to make research data public inform the research workflow?
• Can we extend data publication to cover the research workflow better/at all?
– Who does that? Where? How?
– What are the challenges?
Intent to publish research data informing the research workflow
Traditional research workflows
• Searching literature (know any good references?)
• Looking for diverse domain examples
Research workflows integrating data publication
• Canvassing community: diverse domains desired
• http://bit.ly/1N48NHf
http://projects.iq.harvard.edu/seamlessastronomy/home
Massimiliano Assante, Leonardo Candela, Donatella Castelli, Paolo Manghi and Pasquale Pagano, Science 2.0 Repositories: Time for a Change in Scholarly Communication, DOI: 10.1045/january2015-assante http://nemis.isti.cnr.it/groups/infrascience
What we’re asking: How does the intent to publish research data inform the research workflow?
Describe the research workflow & how it integrates practices that enable data publication:
1) Roles - who is involved in the stage
2) Inputs - outputs from previous stages
3) Actions - steps/activities, both optional and required
4) Outputs - products that become inputs to next stages
5) Tools - both current and desired, as relevant
Describe the results of the workflow:
1) Achieved
2) Yet to be achieved & what is needed
http://bit.ly/1N48NHf
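The five-part stage description requested above (roles, inputs, actions, outputs, tools) maps onto a simple record structure. The sketch below is one possible way to capture responses, not a format the working group prescribes; the class names and the `unmet_inputs` helper are hypothetical. The helper flags inputs that no earlier stage produces, which is one way to spot gaps in a described workflow.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    """A step or activity within a stage; may be required or optional."""
    name: str
    required: bool = True


@dataclass
class WorkflowStage:
    """One stage of a research workflow, described with the five requested elements."""
    name: str
    roles: list[str]
    inputs: list[str] = field(default_factory=list)
    actions: list[Action] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)
    tools: list[str] = field(default_factory=list)


def unmet_inputs(stages: list[WorkflowStage]) -> dict[str, list[str]]:
    """Map each stage name to the inputs that no earlier stage lists as an output."""
    available: set[str] = set()
    missing: dict[str, list[str]] = {}
    for stage in stages:
        gaps = [item for item in stage.inputs if item not in available]
        if gaps:
            missing[stage.name] = gaps
        available.update(stage.outputs)
    return missing
```

Given a hypothetical "collect" stage that outputs "raw data" and a "curate" stage that needs "raw data" and "metadata schema", `unmet_inputs` reports the schema as missing for "curate", since no earlier stage produces it.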
Extending data publication to cover the research workflow
• Current practices
• Current tools
• Nascent opportunities
– Who does that? Where? How?
– What are the challenges?