An Open Access publisher’s perspective on data publishing
Matthew Cockerill
Managing Director, BioMed Central
Dryad-UK meeting
HEFCE, London, 28 April 2010
About BioMed Central
Largest publisher of peer-reviewed open access research journals
Launched first open access journals in 2000
Part of Springer since October 2008
Now publishes 207 OA titles
~70,000 peer-reviewed OA articles published
All research articles Creative Commons licensed
Costs covered by 'article processing charge' (APC)
Data is a first class citizen in BioMed Central publications
Electronic version of the article is authoritative
“Additional files”, not “Supplementary material”
Additional files can be central to the reported findings of the paper
Where possible, the file is presented in a convenient embedded form (movies, chemical structures, KML etc.) while also remaining downloadable
“Mini-websites” provide a generic (too generic?) approach for the presentation of complex data
Efficient online publication processes can facilitate dataset publication
Only a fraction of experimental data sets make it into the literature
Many more datasets have the potential to be useful, but do not warrant a traditional publication
For certain standard types of data, appropriate databases exist (e.g. nucleotide sequences)
But what if such databases do not exist, or further description of the experimental context is required?
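To make the "appropriate databases" point concrete: where a standard repository such as GenBank exists, data deposited there can be retrieved programmatically from an accession number cited in the article. A minimal sketch using Biopython's Entrez interface follows; the accession number is purely illustrative and not tied to any BioMed Central article.

```python
# Minimal sketch: retrieving a deposited nucleotide sequence from GenBank
# via NCBI Entrez, using Biopython. The accession number is illustrative.
from Bio import Entrez, SeqIO

Entrez.email = "you@example.org"  # NCBI asks for a contact address

# Fetch the record cited in a paper's data-availability statement.
handle = Entrez.efetch(db="nucleotide", id="NM_000518",
                       rettype="gb", retmode="text")
record = SeqIO.read(handle, "genbank")
handle.close()

print(record.id, record.description)
print(len(record.seq), "bp")
```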
Plans to extend reusability of data
BioMed Central aims to provide more explicit guidelines to facilitate data reuse, both generic and specific to particular disciplines and formats
Making authors' original vector-based figure files available expands their potential for reuse.
A similar possibility exists for data: make any table of data from within an article conveniently downloadable in spreadsheet form
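A minimal sketch of how such a download could be generated, assuming the article is available as plain HTML and using pandas; the URL is a placeholder, not a real BioMed Central endpoint.

```python
# Minimal sketch: extract the tables from an article's HTML and save each
# one as a spreadsheet-friendly CSV file. The URL is a placeholder.
import pandas as pd

article_url = "https://example.org/article/12345"  # placeholder
tables = pd.read_html(article_url)  # parses every <table> on the page

for i, table in enumerate(tables, start=1):
    table.to_csv(f"table_{i}.csv", index=False)
    print(f"Saved table {i}: {table.shape[0]} rows x {table.shape[1]} columns")
```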
Scientific cloud computing
Bioinformaticists have been rapid adopters of cloud computing (as they were of the web)
Cloud computing can reduce the barriers to reproducibility
Publications can include or refer to necessary datasets and the computational tools that can be fired up to carry out/reproduce the analysis
Large datasets can live in the cloud – take the analysis to the data, rather than vice versa
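A minimal sketch of taking the analysis to the data, assuming the dataset sits in a publicly readable S3 bucket and using boto3; the bucket and object names are invented.

```python
# Minimal sketch: run analysis next to data held in cloud object storage
# rather than downloading it to a local workstation.
# Bucket and key names are invented; assumes a publicly readable object.
from io import BytesIO

import boto3
import pandas as pd
from botocore import UNSIGNED
from botocore.config import Config

# Anonymous access is enough for a public bucket.
s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
obj = s3.get_object(Bucket="example-trial-data", Key="expression/counts.csv")

# Load the object directly into memory on a compute node in the same
# region as the data, avoiding a large transfer out of the cloud.
counts = pd.read_csv(BytesIO(obj["Body"].read()))
print(counts.describe())
```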
Preservation
Publishers not best placed to run repositories for long term preservation of large datasets
Mirrors of publisher content not able to accept arbitrary amounts of additional data
Long term preservation presents a challenge with respect to continuity
Redundant international mirrors with independent governance and funding could help to reduce risk
Huge culture variation between disciplines
Value is maximized if everyone shares data
But cultural norms vary heavily by discipline
Prisoner's dilemma – if no one else is sharing their data, you have little to gain, and much to lose, by sharing your own
Funders are theoretically well placed to enforce norms for sharing data
But effectiveness of funder data sharing policies is questionable
Data sharing in medicine
Clinical trial data is one example of data which presents challenges re: privacy and consent
Perfect anonymization often impossible - certainly not without losing key aspects of the data (the sketch at the end of this section illustrates the trade-off)
Increasing collection of genomic data in trials accentuates this issue
Trial consent should include info re: limits of anonymizability
Full access to underlying data set could be made available for approved research purposes
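A minimal sketch of the anonymization trade-off mentioned above; the column names and values are invented. Dropping identifiers and generalizing quasi-identifiers reduces re-identification risk, but the coarsened data can no longer support analyses that need exact ages or locations.

```python
# Minimal sketch (invented data): anonymizing trial records by dropping
# identifiers and generalizing quasi-identifiers, at the cost of detail.
import pandas as pd

records = pd.DataFrame({
    "participant_id": [101, 102, 103, 104],
    "date_of_birth": ["1961-03-14", "1958-07-02", "1990-11-30", "1987-05-21"],
    "postcode": ["SW1A 1AA", "SW1A 2BB", "M1 3CC", "M1 4DD"],
    "systolic_bp": [148, 139, 121, 117],
})

anonymized = records.drop(columns=["participant_id"])  # direct identifier

# Generalize quasi-identifiers: exact birth date -> birth decade,
# full postcode -> outward code only. Re-identification risk falls,
# but so does the ability to study fine-grained age or local effects.
anonymized["birth_decade"] = (
    pd.to_datetime(anonymized["date_of_birth"]).dt.year // 10 * 10
)
anonymized["postcode_district"] = anonymized["postcode"].str.split().str[0]
anonymized = anonymized.drop(columns=["date_of_birth", "postcode"])

print(anonymized)
```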