Software Sustainability Institute
www.software.ac.uk
Doing Science in the Digital Age
Software, Skills and Sociology
http://dx.doi.org/10.6084/m9.figshare.957527
TGAC Science Symposia series, 11 March 2014
Neil Chue Hong (@npch), Software Sustainability Institute
ORCID: 0000-0002-8876-7606
Unless otherwise indicated, slides licensed under
Supported by Project funding from
Four Paradigms of Research
Empirical
Theoretical
Computational
Data Exploration
Water Swap Reaction Coordinate
A water-swap reaction coordinate for the calculation of absolute protein-ligand binding free energies
Woods CJ, Malaisree M, Hannongbua S, Mulholland AJ
J. Chem. Phys. (2011) vol. 134, pp. 054114
http://dx.doi.org/10.1063/1.3519057
Pleiotropic loci
Selection at pleiotropic loci underlies disease co-occurrence in human populations. Navarro, Haley, Karosas et al. Submitted to Nature Genetics
Behind every great piece of science…
#go through each SNP of interest
for(my $x = 0; $x < scalar @pos; $x++)
{
  #and then each downstream SNP of interest
  for(my $y = $x+1; $y < scalar @pos; $y++)
  {
    #if SNPs within our chosen distance (500kb) and both present in the haplotypes file
    if((!($trait[$x] eq $trait[$y])) && (abs($pos[$x] - $pos[$y]) <= 500000) && (exists($legArrayPos{$pos[$x]})) && (exists($legArrayPos{$pos[$y]})))
    {
      my $snp1ArrayPos = "";
      my $snp2ArrayPos = "";
      my $snp1All = "";
      my $snp2All = "";
      #create output file for this SNP pair
      my $filename = "ConditionedResults2/$chr[$x].$pos[$x]-$pos[$y].EHH.GBR.2.txt";
      print "$filename\n";
      unless (-e $filename)
      {
        open(OUT, ">$filename");
        #####################CHANGE THESE IF NOT FOCUSING ON SECOND SNP#########################
        my $start = $pos[$y]-500000;
        if ($start < 1)
        {
          $start = 1;
        }
        my $end = $pos[$y]+500000;
        if ($end > $chrLengths{$chr[$x]})
        {
          $end = $chrLengths{$chr[$x]};
        }
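The pair-selection logic in the Perl fragment above can be restated compactly. What follows is a minimal Python sketch, not the authors' code: the names Snp, candidate_pairs and clamped_window, the record fields and the example data are all hypothetical, and it assumes, as the fragment appears to, that every SNP in the list lies on the same chromosome.

```python
from itertools import combinations
from typing import Iterator, List, NamedTuple, Set, Tuple

WINDOW = 500_000  # 500 kb, the distance hard-coded in the Perl fragment


class Snp(NamedTuple):
    """Hypothetical record; the Perl fragment uses parallel arrays instead."""
    chrom: str
    pos: int
    trait: str


def candidate_pairs(snps: List[Snp], legend_positions: Set[int],
                    window: int = WINDOW) -> Iterator[Tuple[Snp, Snp]]:
    """Yield each SNP paired with every later SNP that passes the filter."""
    for first, second in combinations(snps, 2):
        different_traits = first.trait != second.trait
        close_enough = abs(first.pos - second.pos) <= window
        both_in_legend = (first.pos in legend_positions
                          and second.pos in legend_positions)
        if different_traits and close_enough and both_in_legend:
            yield first, second


def clamped_window(pos: int, chrom_length: int,
                   window: int = WINDOW) -> Tuple[int, int]:
    """Return the region pos +/- window, clipped to 1..chrom_length."""
    return max(1, pos - window), min(chrom_length, pos + window)


if __name__ == "__main__":
    # Made-up positions and traits, for illustration only.
    snps = [Snp("1", 100, "A"), Snp("1", 200_100, "B"), Snp("1", 900_000, "A")]
    legend = {100, 200_100, 900_000}
    for first, second in candidate_pairs(snps, legend):
        # prints: 100 200100 (1, 700100)
        print(first.pos, second.pos, clamped_window(second.pos, 1_000_000))
```

Using itertools.combinations reproduces the "each SNP, then each downstream SNP" double loop without index arithmetic, and max/min replace the two boundary checks on the window.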
The modern researcher…
• … worries about:
  - Data management and analysis
  - Reproducible research
  - Scalable simulations
  - Integration of models and workflows
  - Collaboration
Where do they learn how to do this?
Picture of Otto Stern courtesy of Emilio Segre Visual Archives
Observation 1: Software is pervasive across research
Corollary: software is bleeding edge and long-tail
Demanding users are coming from arts + humanities, economics, and social science as well as sciences
Observation 2: A culture of re-use rather than re-invention is not widespread
Corollary: we have wasted effort and increased siloing
Observation 3: Many people are “embarrassed” about software
Corollary: something is broken in the way we regard, recognise and reward software
The Research Cycle
Create
Test
Interpret
Publish
Revise
Research Outputs: Paper, Data, Software
Research is a continuous cycle.
When we publish we are contributing to the body of knowledge.
Research/Reuse/Reward Cycle
Research: Create, Test, Interpret, Publish, Revise
Reuse: Index, Identify, Cite, Reward
Reuse is also a cycle. We build our research on the work of others.
Reward mechanisms should encourage reuse.
The current process
Start research
Write software
Use software
Produce results
Publish research paper (which mentions software and data)
Release data
Release software
This process is simple but does not reward production or reuse of good software and data.
It also has a long contribution cycle.
A better process?
Start research
Identify existing software
Adapt/extend software
Write software
Use software
Produce results
Publish research paper (which references software and data papers)
Release data
Release software
Publish software paper
Publish data paper
Software and data papers are needed as proxies for rewarding reuse.
But it enables a shorter contribution cycle for data and software.
Boundary
What do we choose to identify:
- Workflow?
- Software that runs workflow?
- Software referenced by workflow?
- Software dependencies?
What’s the minimum citable part?
Granularity
Algorithm
Function
Program
Library / Suite / Package
…
Versioning
Personal v1
Personal v2
Personal v3
Personal v2a
Public v1
Personal v3a
Personal v2a
Public v2
Public v3
Why do we version?
- To indicate a change
- To allow sharing
- To confer special status
Authorship
• Which authors have had what impact on each version of the software?
• Who had the largest contribution to the scientific results in a paper?
http://beyond-impact.org/?p=175
OGSA-DAI projects statistics from Ohloh
Observation 4: This is all getting just a little confusing
Corollary: maybe we need to get on to firmer conceptual ground
The Foundations of Digital Research
Software
Software
Software
Re-usable Re-producible
www.software.ac.uk/software-evaluation-guide
www.software.ac.uk/resources/guides
www.software.ac.uk/software-carpentry
www.software.ac.uk/training
www.rse.ac.uk
www.software.ac.uk/blog/2012-11-09-craftsperson-and-scholar
software.ac.uk/blog/2012-08-16-what-research-software-community-and-why-should-you-care
www.software.ac.uk/blog/2011-05-02-publish-or-be-damned-alternative-impact-manifesto-research-software
Prlić A, Procter JB (2012) Ten Simple Rules for the Open Development of Scientific Software. PLoS Comput Biol 8(12): e1002802. doi:10.1371/journal.pcbi.1002802
Wilson G, et al. (2014) Best Practices for Scientific Computing. PLoS Biol 12(1): e1001745. doi:10.1371/journal.pbio.1001745
Gap 1: Software Skills Training
Basic to Advanced
Programming Focussed (Tools) to Research Focussed (methods)
Software Carpentry
Programming 101
Summer Schools
Advanced HPC Training
HPC Short Courses
Doctoral Training
MSc in HPC / scientific computing
Programming 201
Who fills this gap?
Gap 2: Lack of recognition and reward
• There is an anachronism in the way we conduct and recognise research
  - REF references software as an output but it is still not easy to get recognition – peer review fails
• Software careers
  - Researchers who use software
  - Researcher-Developers
  - Research Software Engineers
  - Research Software Support
  - Research Systems Providers
Gap 3: Software Maturity and Management
Software proliferation
Time
Customisation
Innovation
Consolidation
Not all software should make it to the next stage
Management changes through time, requiring planning
Standing on the shoulders of giants
• “If I have seen further it is by standing on the shoulders of giants” Isaac Newton
• As researchers we are honour-bound to share our knowledge so that all may benefit
Observation 5: Most of the issues are not technical, they’re social
Corollary: we can do something to change them
Career Paths in UK
UK STEM graduate career paths
PhD students
Early Career Research
Permanent Research Staff
Professor
Non-university Research (industry, government etc.)
Careers outside academic sector
Source: The Scientific Century, Royal Society, 2010 (revised to reflect first stage clarification from “What Do PhD’s Do?” study)
We are science
Hear us roar!
Picture by Tamako the Jaguar
Shake up the system
• “Swim or drown” is not an efficient learning method
• “Publish or perish” is not an effective reward mechanism
• “Becoming a Professor” is not a scalable career path
• “I’ll just have to do it myself” is not a modern way of doing science
The Software Sustainability Institute
A national facility for cultivating world-class research through software
• Better software enables better research
• Software reaches boundaries in its development cycle that prevent improvement, growth and adoption
• Providing the expertise and services needed to negotiate to the next stage
• Developing the policy and tools to support the community developing and using research software
Better software
Better research
Supported by EPSRC Grant EP/H043160/1
Campaigning for careers
www.rse.ac.uk
Nurturing a training community
• Bringing together 39+ organisations with interest in e-Infrastructure training
• Raising issues and enablers with RCUK, BIS
software.ac.uk/policy
SSI Fellows 2014
• 2014: 16 fellows
• 2013: 15 fellows
• 2012: 10 fellows
• Range of subjects, career stages
software.ac.uk/fellows
Welcome to the CW14
The Role of Software in Reproducible Research
6th Collaborations Workshop, Oxford
26-28th March 2014
Organised by the Software Sustainability Institute
Sponsored by Microsoft Research and GitHub
#CollabW14
software.ac.uk/cw14
Publicise your software
http://openresearchsoftware.metajnl.com
http://dx.doi.org/10.6084/m9.figshare.942289
What you can do now
• Read the Best Practices for Scientific Computing http://dx.doi.org/10.1371/journal.pbio.1001745
• Release your code and publish it in a journal http://bit.ly/softwarejournals
• Learn new software skills and pass them on to others http://www.software-carpentry.org/
• Ask for software and data if you’re reviewing a paper
• Forge a career in research, and change it for those coming behind you
• The DOI for this presentation: 10.6084/m9.figshare.957257
• The Software Sustainability Institute is a collaboration between the universities of Edinburgh, Manchester, Oxford and Southampton. Supported by EPSRC Grant EP/H043160/1.