SciForge Workshop @ Potsdam Institute for Climate Impact Research; Nov 2014

Is your code worth a full citation? Dominik Reusser, RD2; Martin Hammitsch, GFZ-CeGIT. 27 November 2014, PIK Modelling Strategy Seminar


Page 1

Is your code worth a full citation?

Dominik Reusser, RD2

Martin Hammitsch, GFZ-CeGIT

27 November 2014, PIK Modelling Strategy Seminar

Page 2

Reproducibility as a scientific principle

Publication

Data, Software

Page 3

DFG: “Primary data underlying publications shall be kept on durable and secure media, at the institution where they originated, for ten years.”

Page 4

For example, in December 2006, Geoffrey Chang from the Department of Molecular Biology at the Scripps Research Institute, California, US, was horrified when a hand-written program flipped two columns of data, inverting an electron-density map. As a result, a number of papers had to be retracted from the journal Science.

Page 5

The missing link

Findings, papers, data … and software?

− Data are already published professionally, either with papers or on their own
− This is not standard practice for the related software
− Findings are based not only on raw data but also on data obtained in analyses, most likely supported by software

Software is the link between the findings presented in papers and the data the findings are based on.

− Software used to obtain findings plays a crucial role in scientific work
− However, software is rarely seen as publishable in the sense of a scientific publication
− Without the software, researchers may be unable to reproduce the findings, which conflicts with the principle of reproducibility in the natural sciences

The provision of software lacks solutions that serve researchers’ needs.

− Software publications would supply the missing link between data and the papers reporting findings
− Software publications would foster their interplay

“Establish the missing link between papers and data publications.”

Page 6

• The code was not written for others to use.
• Scientists and reviewers wanted a longer conversation with a chance to go back and forth

Nature 2011

Page 7

Best practices

The treatment of source code is associated with additional work that is not covered in the primary research task.

− This includes code design, version control, documentation, and testing …
− To safeguard traceability and reusability, this scientific work has to be planned and supported
− This includes the adoption of processes following the software development life cycle
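As a rough illustration of what "documentation and testing" can mean for even a few lines of research code, the sketch below pairs a documented helper with a hand-checked test that would expose swapped data columns. The function names, the column layout and the sample values are hypothetical and chosen only for this example.

```python
def select_columns(rows, x_index, y_index):
    """Return (x, y) tuples taken from the given column positions.

    rows    -- list of equal-length lists of numbers
    x_index -- position of the x column (hypothetical layout)
    y_index -- position of the y column (hypothetical layout)
    """
    return [(row[x_index], row[y_index]) for row in rows]


def check_select_columns():
    """Compare the helper against a small data set verified by hand."""
    rows = [[1.0, 2.0], [3.0, 4.0]]
    # Correct column order reproduces the hand-checked reference.
    assert select_columns(rows, 0, 1) == [(1.0, 2.0), (3.0, 4.0)]
    # Swapped indices must give a visibly different result.
    assert select_columns(rows, 1, 0) != select_columns(rows, 0, 1)
    return True


print(check_select_columns())
```

A check like this costs a few minutes to write and can be rerun under version control after every change to the code.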

The adoption of software engineering rules and best practices has to be recognized and accepted as part of scientific performance.

− Most scientists have little incentive to improve code
− They do not publish code, either with their papers or on its own
− Software engineering habits are rarely practised by faculty and research facility staff, postdocs, doctoral and graduate students, and thus undergraduate students
− Software engineering skills are not passed on to successors the way paper-writing skills are

It is often felt that the software or code produced is not publishable.

− The quality of software and its source code has a decisive influence on the quality of research results

Establishing best practices from software engineering, not only adopted but also adapted to serve scientific needs, is crucial for the success of software publications.

“Establish standard software engineering rules, best practices and processes in science.”

Page 8

Scientific achievement

Disciplinary journals require that articles discuss scientific problems.

− Software is often seen only as a contribution to the solution of a question or problem
− Software is not perceived as an independent contribution to science
− Authors of software must first find a question to motivate the publication in a desired journal

Releasing software directly as a scientific publication is not possible.

− The scientific achievements of software and its contributions to science are poorly perceived and hardly measurable

The resulting gap in interdisciplinary communication regarding scientific software might be closed by software publications.

− It requires a common understanding of how to handle scientific software with defined processes
− It requires commonly accepted and adopted metrics
− Software could then be valued and assessed as a contribution to science

“Make software recognized as scientific achievement.”

Page 9

The paper must be accompanied by the code, or means of accessing the code, for the purpose of peer-review. If the code is normally distributed in a way which could compromise the anonymity of the referees, then the code must be made available to the editor. The referee/editor is not required to review the code in any way, but they may do so if they so wish.

All papers must include a section at the end of the paper entitled “Code availability”. In this section, instructions for obtaining the code (e.g. from a supplement, or from a website) should be included; alternatively, contact information should be given where the code can be obtained on request, or the reasons why the code is not available should be clearly stated.

We strongly encourage authors to upload any user manuals associated with the code.

For models where this is practicable, we strongly encourage referees to compile the code, and run test cases supplied by the authors where appropriate.

Unlike internal reports, GMD papers undergo a peer-review process, which establishes the criteria for full publication of model developments. Submitted code has not itself been peer-reviewed by GMD, but, as one of our new initiatives, we will require that referees and editors obtain the model code, and encourage them to execute test cases where practical.
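To make the idea of author-supplied test cases more tangible, here is a minimal sketch of a script a referee could run after obtaining the code. The toy model, the case definitions and the tolerance are assumptions made for illustration; they are not part of the GMD policy or of any real model.

```python
import math


def toy_model(initial, rate, steps):
    """Hypothetical stand-in for a model: compound growth over a number of steps."""
    value = initial
    for _ in range(steps):
        value *= 1.0 + rate
    return value


def run_test_case(case, rel_tol=1e-9):
    """Run the model with the inputs of one case and compare to its reference output."""
    result = toy_model(case["initial"], case["rate"], case["steps"])
    return math.isclose(result, case["expected"], rel_tol=rel_tol)


# Reference outputs supplied by the authors (values computed by hand here).
TEST_CASES = [
    {"initial": 100.0, "rate": 0.5, "steps": 2, "expected": 225.0},
    {"initial": 1.0, "rate": 0.0, "steps": 10, "expected": 1.0},
]

for number, case in enumerate(TEST_CASES, start=1):
    status = "PASS" if run_test_case(case) else "FAIL"
    print(f"test case {number}: {status}")
```

Comparing with a relative tolerance instead of exact equality keeps such a check usable across compilers and platforms, where floating-point results can differ in the last digits.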

example #1

Page 10

example #2

Both Figshare and Zenodo integrate with GitHub

Neither repository offers long-term storage of executable code (e.g. storing all software dependencies or virtual machines)

Page 11

example #3

Persistent identifiers for software are not (yet) common practice.
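A minimal sketch of how a persistent identifier could turn a software release into a citable record. The record fields, the citation layout and all sample values (including the DOI, which is a placeholder and not registered) are assumptions for illustration, not a prescribed citation standard.

```python
from dataclasses import dataclass


@dataclass
class SoftwareRecord:
    """Hypothetical minimal metadata for one citable software release."""
    authors: list
    title: str
    version: str
    year: int
    publisher: str
    doi: str

    def citation(self) -> str:
        """Format the record as a one-line citation ending in a resolvable DOI link."""
        names = "; ".join(self.authors)
        return (f"{names} ({self.year}): {self.title}, version {self.version}. "
                f"{self.publisher}. https://doi.org/{self.doi}")


record = SoftwareRecord(
    authors=["Doe, J.", "Roe, R."],
    title="example-model",
    version="1.0.0",
    year=2014,
    publisher="Example Repository",
    doi="10.0000/example.12345",  # placeholder, not a registered DOI
)
print(record.citation())
```

Tying the identifier to a specific version is the important design choice here: a reader following the citation should reach exactly the code that produced the published findings.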

Page 12

example #4

Page 13

example #5

Page 14

example #6

Page 15

Open science

“Leverage open access and open science.”

Scientific software development often implies that the software and code are not written for others to use.

− Code is kept and maintained on researchers’ own computers and servers
− If the code grows or groups work together, code repositories and version control systems are set up
− In many cases these systems are available for internal use only and are usually not reachable from the outside

Reuse mainly happens informally or anonymously, even in science.

− Scientists use existing software and code from open source software repositories
− Only a few contribute their code back to the repositories

A number of software platforms already exist for cooperation and reuse of software.

− SourceForge and GitHub are already used by scientists
− These platforms partly fulfill the scientific need to provide software and code as part of the scientific tradition
− It is unclear whether these platforms can be augmented for scientific purposes or whether special repositories must be created

Subsequent users have to be able to run the code.

− This requires sufficient documentation, sample data sets, tests and comments, whose adequacy can in turn be verified by qualified reviews
− This assumes that scientists learn to write and release code and software as they learn to write and publish papers

Page 16

Publication and Citation of Scientific Software with Persistent Identifiers

Software development in general is not perceived as a scientific achievement, but software occupies an increasingly prominent place in research and has become an indispensable commodity.

Software has become an integral part of science, yet software is not properly integrated into the scientific discourse.

Why?

Find and implement solutions serving researchers’ needs so that software development can be part of the academic tradition and is regarded as scientific achievement of its authors.

Recognize, create, and act upon opportunities for the development of concepts establishing defined processes.

Where is it going?

Establish the missing link between papers and data publications.

Make software recognized as scientific achievement.

Leverage open access and open science.

Establish standard software engineering rules, best practices and processes in science.