data management and open access creating data files for...
TRANSCRIPT
![Page 1: Data Management and Open Access Creating Data Files for ...library.psfc.mit.edu/publishing/dmp/slides/josh.pdf · Josh Stillerman, Martin Greenwald, Mark London, Jason Thomas February,](https://reader034.vdocuments.site/reader034/viewer/2022052100/603a820aa44f070bd91ba8c9/html5/thumbnails/1.jpg)
Data Management and Open Access
Creating Data Files for Published Figures
Josh Stillerman, Martin Greenwald, Mark London, Jason Thomas
February, 2016
Publishing Data for Figures
● The DOE requirement is not specific about exactly which data and metadata must be included with published figures.
  – We are interpreting the requirement to be:
    o The actual values plotted in the figure
    o Metadata about those values
      § Name, Description, Units
    o Metadata about how the data are displayed in the figure
      § Labels, Display Parameters
  – They are also not dictating how the data should be stored.
    o File Format / Data Organization…
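
The interpretation above splits what gets published into plotted values, metadata about those values, and metadata about how they are displayed. As a rough illustration only, the sketch below models that split with Python dataclasses. The class and field names (`AxisData`, `Trace`, `labels`, `display`) are hypothetical and are not part of any PSFC tool or of the DOE requirement.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AxisData:
    """Values plotted along one axis, plus metadata about those values."""
    values: List[float]
    name: str          # what the quantity is
    description: str   # how it was obtained
    units: str


@dataclass
class Trace:
    """One curve in the figure and how it is displayed."""
    x: AxisData
    y: AxisData
    labels: Dict[str, str] = field(default_factory=dict)   # axis labels, legend
    display: Dict[str, str] = field(default_factory=dict)  # line colour, style

    def is_complete(self) -> bool:
        """True when x and y have matching lengths and non-empty units."""
        return (len(self.x.values) == len(self.y.values)
                and bool(self.x.units) and bool(self.y.units))


# Illustrative values only; they are not taken from a real figure.
trace = Trace(
    x=AxisData([0.0, 0.2, 0.4], "time", "measured with a stopwatch", "s"),
    y=AxisData([1.0, 0.99, 0.96], "height", "measured with a ruler", "m"),
    labels={"x": "time (s)", "y": "height (m)", "legend": "J0"},
    display={"line": "black line"},
)
print(trace.is_complete())
```

Keeping the display metadata separate from the value metadata mirrors the two metadata bullets on the slide; how either is stored is left open there.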
PSFC Standardized Format
● Choosing a standard file format has several advantages:
  – Easier access for readers of the publication
  – Easier verification for librarians, curators, and sponsors
  – Slower obsolescence, and easier conversion as standards evolve
  – Standard general purpose tools for browsing and viewing contents.
● We have chosen HDF5
  – https://www.hdfgroup.org/HDF5/
PSFC Standardized Schema
● Using a standard file format is good, but not good enough.
  – If all of the data files for figures in PSFC publications were, for example, MS Excel
    o This would not dictate the organization of labels, rows and columns in those spreadsheets.
    o In order to interpret one of them a user would have to open the file interactively and attempt to understand the organization.
  – The same is true for HDF5, so
    o We have defined a standard HDF5 file organization to represent the data in published figures.
    o Easy access for all consumers (since they are all the same in structure)
    o Easy to create from the programming languages in use at the PSFC.
      § IDL
      § PYTHON
      § MATLAB
      § This list can be expanded as needed.
PSFC Standardized Schema (2)
● One file per figure - the library system will name the file based on the publication's ID
  – Root level attributes: author, username, date, description, caption…
  – One Group per 'trace' displayed.
    o Group level attributes for this trace:
● One Group per set of data displayed
  – Group level attributes: name, legend string, plot-information
  – x_data – values for the X axis
    o Units, label
  – Y_data – values for the Y axis
    o Units, label
  – Z_data – values for the Z axis
    o Units, label
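
The layout above (root attributes, one group per trace, one dataset per axis with units and label) can be sketched with the general-purpose `h5py` library. This is only an illustration under stated assumptions: the PSFC helper routines are not reproduced here, the function `write_figure_file` and the dictionary keys are hypothetical, the attribute spellings (`plot_information`, `label`) are guesses based on the bullet text, and `np.cos` is just placeholder data.

```python
import os
import tempfile

import h5py
import numpy as np


def write_figure_file(path, root_attrs, traces):
    """Write one HDF5 file for one figure: root attributes plus one group per trace."""
    with h5py.File(path, "w") as f:
        # Root level attributes (author, description, ...)
        for key, value in root_attrs.items():
            f.attrs[key] = value
        for trace in traces:
            # One group per trace, carrying the group level attributes
            group = f.create_group(trace["name"])
            group.attrs["name"] = trace["name"]
            group.attrs["legend"] = trace["legend"]
            group.attrs["plot_information"] = trace["plot_information"]
            # One dataset per axis, each with units and a label
            for axis in ("x", "y"):
                dset = group.create_dataset(axis + "_data", data=trace[axis])
                dset.attrs["units"] = trace[axis + "_units"]
                dset.attrs["label"] = trace[axis + "_label"]


x = np.linspace(0.0, 20.0, 100)
traces = [{
    "name": "J0", "legend": "J0", "plot_information": "black line",
    "x": x, "x_units": "s", "x_label": "time (s)",
    "y": np.cos(x), "y_units": "m", "y_label": "height (m)",  # placeholder curve
}]
root_attrs = {"author": "John Doe", "description": "example figure"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "fig_example.hdf5")
    write_figure_file(path, root_attrs, traces)
    # Read two items back to confirm the layout round-trips
    with h5py.File(path, "r") as f:
        saved_units = f["J0/x_data"].attrs["units"]
        saved_shape = f["J0/y_data"].shape
print(saved_units, saved_shape)
```

A third axis would follow the same pattern by adding `"z"` to the axis loop.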
Creating data files
● The time to create (or update) the data files is when the figures are being created
  – At that time, all of the data is available in some programming language.
  – It is much more likely the file will match the figure, if it is created at that time.
● APIs are set up to mimic the plotting APIs.
● Files can be created and consumed in any programming language interchangeably
● Example in IDL
● Example in Python
● Other languages to follow
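
One way to read "APIs are set up to mimic the plotting APIs" is that the call that records a trace takes the same arguments as the call that draws it, so the two can sit side by side. The sketch below shows that pattern using only the standard library. `FigureRecorder`, `draw_and_record` and `fake_plot` are hypothetical names made up for this illustration; they are not the PSFC routines, and nothing is written to HDF5 here.

```python
class FigureRecorder:
    """Collects what is plotted so it can be written out later (hypothetical helper)."""

    def __init__(self, description):
        self.description = description
        self.traces = []

    def plot(self, x, y, fmt="-", label=None):
        """Takes the same arguments as a typical plot(x, y, fmt, label=...) call."""
        if len(x) != len(y):
            raise ValueError("x and y must have the same length")
        self.traces.append({"x": list(x), "y": list(y), "fmt": fmt, "label": label})


def draw_and_record(recorder, plot_func, x, y, fmt="-", label=None):
    """Call the plotting function, then record exactly the same arguments."""
    plot_func(x, y, fmt, label=label)
    recorder.plot(x, y, fmt, label=label)


drawn = []


def fake_plot(x, y, fmt, label=None):
    """Stand-in for a real plotting call, so the sketch runs without a display."""
    drawn.append(label)


recorder = FigureRecorder("Example figure")
x = [0.0, 0.5, 1.0]
draw_and_record(recorder, fake_plot, x, [1.0, 0.9, 0.7], "-b", label="J0")
draw_and_record(recorder, fake_plot, x, [0.0, 0.2, 0.4], "-g", label="J1")
print([t["label"] for t in recorder.traces])
```

Because the record step reuses the plot arguments, the stored data cannot drift from what was drawn, which is the motivation given on the slide for creating the file at plotting time.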
IDL - The figure
IDL

    file = 'Fig_1'
    fig_description = 'Bessel Functions J0, J1 and J2'
    fig_source = 'Phys. Plasmas 17, 1234 2010'
    comment = 'This is the way the ball bounces'
    user_fullname = 'John Doe'
    date = systime(0)

    ; set up a simple color table (just for plotting)
    r = [000, 255, 255, 000, 000]
    g = [000, 255, 000, 000, 255]
    b = [000, 255, 000, 255, 000]
    tvlct, r, g, b

    ; start a new hdf5 file
    hdf5_new, file=file, fig_description=fig_description, fig_source=fig_source, $
        comment=comment, user_fullname=user_fullname, date=date
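IDL (2)

    x_units = 's'
    x_axis = 'time (s)'
    x_name = 'measured with a stopwatch'
    x_type = 'float'

    y_units = 'm'
    y_axis = 'height (m)'
    y_name = 'measured with a ruler'
    y_type = 'float'

    legend = 'J0'

    ; compute and plot the first curve (you'll do this to create the plot file)
    x = indgen(100)/5.
    y0 = beselj(x, 0)
    plot, x, y0, charsize=1.8, title=fig_description, xtitle=x_axis, ytitle=y_axis, color=1
    xyouts, /norm, .9, .85, legend, size=1.8

    ; group name and plot description for this trace, as on the following slides;
    ; 'black line' is the value shown for J0 in the result listing
    group_name = legend
    plot_graphics = 'black line'

    ; add data group for this trace to file
    hdf5_add, x, y0, file=file, group_name=group_name, $
        x_units=x_units, x_axis=x_axis, x_name=x_name, x_type=x_type, $
        y_units=y_units, y_axis=y_axis, y_name=y_name, y_type=y_type, $
        legend=legend, plot_graphics=plot_graphics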
IDL (3)

    legend = 'J1'

    ; compute and plot the second curve
    y1 = beselj(x, 1)
    oplot, x, y1, color=2
    xyouts, /norm, .9, .8, legend, size=1.8, color=2

    group_name = legend
    plot_graphics = 'red line'

    ; add data group for this trace to file
    hdf5_add, x, y1, file=file, group_name=group_name, $
        x_units=x_units, x_axis=x_axis, x_name=x_name, x_type=x_type, $
        y_units=y_units, y_axis=y_axis, y_name=y_name, y_type=y_type, $
        legend=legend, plot_graphics=plot_graphics
IDL (4)

    legend = 'J2'

    ; compute and plot the third curve
    y2 = beselj(x, 2)
    oplot, x, y2, color=4
    xyouts, /norm, .9, .75, legend, size=1.8, color=4

    group_name = legend
    plot_graphics = 'green line'

    ; add data group for this trace to file
    hdf5_add, x, y2, file=file, group_name=group_name, $
        x_units=x_units, x_axis=x_axis, x_name=x_name, x_type=x_type, $
        y_units=y_units, y_axis=y_axis, y_name=y_name, y_type=y_type, $
        legend=legend, plot_graphics=plot_graphics
The Result

    <HDF5 file "Fig_1.hdf5" (mode r, 12.4k)> (File) /
    root (Group) /root
        ('user_fullname', 'John Doe')
        ('user_id', 'g')
        ('date', 'Thu Feb 4 13:52:10 2016')
        ('fig_description', 'Bessel Functions J0, J1 and J2')
        ('fig_source', 'Phys. Plasmas 17, 1234 2010')
        ('n_groups', 3)
        J0 (Group) /root/J0
            ('group1plotting information', 'black line')
            ('legend', 'J0')
            x_values (Dataset) /root/J0/x_values len = (100,)
                ('units', 's')
                ('axislabel', 'time (s)')
                ('datatype', 'float')
                ('nx', 100)
            y_values (Dataset) /root/J0/y_values len = (100,)
                ('units', 'm')
                ('axislabel', 'height (m)')
                ('datatype', 'float')
                ('ny', 100)
        J1 (Group) /root/J1
            ('group1plotting information', 'red line')
            ('legend', 'J1')
            x_values (Dataset) /root/J1/x_values len = (100,)
Python - The figure
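Python

    from scipy.special import jv
    from h5_data import h5_data
    # plotting names used on the following slides (the slides assume a pylab-style session)
    from numpy import linspace
    from matplotlib.pyplot import plot, title, xlabel, ylabel, legend

    file_name = 'Fig_4'
    fig_description = 'Bessel Functions J0, J1 and J2'
    fig_source = 'Phys. Plasmas 17, 1234 2010'
    comment = 'This is the way the ball bounces'
    user_fullname = 'John Doe'

    # Create the data file, with file level metadata
    hdf_file = h5_data("%s.hdf5" % (file_name,),
                       fig_description=fig_description,
                       fig_source=fig_source,
                       comment=comment,
                       user_fullname=user_fullname)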
Python (2)

    # Draw the first curve
    x = linspace(0, 20)
    y0 = jv(0, x)
    plot(x, y0, '-b', label='J0')
    x_units = 's'
    x_label = 'time (s)'
    y0_units = 'm'
    y0_label = 'height (m)'

    # Add the first curve to the file
    hdf_file.add_dataset('J0', x, y0,
                         legend=None,
                         plot_info='Blue Line',
                         x_units=x_units, x_label=x_label, x_datatype='float',
                         y_units=y0_units, y_label=y0_label, y_datatype='float')
Python (3)

    # Draw the second curve
    y1 = jv(1, x)
    plot(x, y1, '-g', label='J1')
    y1_units = 'm'
    y1_label = 'height (m)'

    # Add the second curve to the file
    hdf_file.add_dataset('J1', x, y1,
                         legend=None,
                         plot_info='Green Line',
                         x_units=x_units, x_label=x_label, x_datatype='float',
                         y_units=y1_units, y_label=y1_label, y_datatype='float')
Python (4)

    # Draw the third curve
    y2 = jv(2, x)
    plot(x, y2, '-r', label='J2')
    y2_units = 'm'
    y2_label = 'height (m)'
    title(fig_description)
    xlabel(x_label)
    ylabel(y0_label)

    # Add a legend
    legend(loc='upper right')

    # Add the third curve to the file
    hdf_file.add_dataset('J2', x, y2,
                         legend=None,
                         plot_info='Red Line',
                         x_units=x_units, x_label=x_label, x_datatype='float',
                         y_units=y2_units, y_label=y2_label, y_datatype='float')
The Result

    <HDF5 file "Fig_1.hdf5" (mode r, 12.4k)> (File) /
    root (Group) /root
        ('user_fullname', 'John Doe')
        ('user_id', 'g')
        ('date', 'Thu Feb 4 13:52:10 2016')
        ('fig_description', 'Bessel Functions J0, J1 and J2')
        ('fig_source', 'Phys. Plasmas 17, 1234 2010')
        ('n_groups', 3)
        J0 (Group) /root/J0
            ('group1plotting information', 'black line')
            ('legend', 'J0')
            x_values (Dataset) /root/J0/x_values len = (100,)
                ('units', 's')
                ('axislabel', 'time (s)')
                ('datatype', 'float')
                ('nx', 100)
            y_values (Dataset) /root/J0/y_values len = (100,)
                ('units', 'm')
                ('axislabel', 'height (m)')
                ('datatype', 'float')
                ('ny', 100)
        J1 (Group) /root/J1
            ('group1plotting information', 'red line')
            ('legend', 'J1')
            x_values (Dataset) /root/J1/x_values len = (100,)
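
A listing in the style of the one above can be produced by walking a file with `h5py`. The sketch below is an approximation, not the tool that generated the slide: `describe` is a hypothetical helper, the exact line format is a guess, and the tiny in-memory file only stands in for a real figure file (it has no `/root` group).

```python
import h5py
import numpy as np


def describe(h5file):
    """Return one text line per group or dataset, followed by its attributes."""
    lines = []

    def visit(name, obj):
        kind = "Dataset" if isinstance(obj, h5py.Dataset) else "Group"
        depth = name.count("/")  # nesting level below the file root
        lines.append("    " * depth + f"{name.split('/')[-1]} ({kind}) /{name}")
        for key, value in obj.attrs.items():
            lines.append("    " * (depth + 1) + f"({key!r}, {value!r})")

    # visititems calls visit(name, obj) for every group and dataset in the file
    h5file.visititems(visit)
    return lines


# Build a tiny in-memory file to walk; a real figure file would be opened
# with h5py.File(path, "r") instead.
with h5py.File("demo.hdf5", "w", driver="core", backing_store=False) as f:
    group = f.create_group("J0")
    group.attrs["legend"] = "J0"
    dset = group.create_dataset("x_values", data=np.arange(100) / 5.0)
    dset.attrs["units"] = "s"
    listing = describe(f)

print("\n".join(listing))
```

Because every figure file shares the same organization, a generic walker like this needs no per-file knowledge, which is the consumer-side benefit the schema slides describe.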
END