
Nick Barnes at UKMO, 2012-02-28 climatecode.org 1

Better Science Through Software

Copyright Climate Code Foundation, license CC-BY


What is the CCF?
• A non-profit founded in 2010;
• Continuing projects started in 2008;
• A few software consultants, currently unpaid part-time;
• Advisory committee of a dozen experts;
• A growing network of climate scientists and others;
• Several projects and publications;
• and big plans.


What is the problem?

Scientists have to write code, but:
• They aren’t well-trained;
• They aren’t properly rewarded;
• There is no incentive to publish it.

The public need to know about climate science, but:
• The science isn’t accessible;
• The practices aren’t always transparent;
• They are lied to about ‘tricks’ and secrecy.


Foundation goals

“to promote public understanding of climate science,
• by increasing the visibility and clarity of the software used in climate science, and by encouraging climate scientists to do the same;
• by encouraging good software development and management practices among climate scientists;
• by encouraging the publication of climate science software as open source.”

http://climatecode.org/goals/


Advisory Committee

Climate Scientists
• Kate Willett
• James Annan
• V. Balaji
• Stefan Brönnimann
• John Christy
• Reto Ruedy
• Peter Thorne

Other Scientists
• Steve Easterbrook
• Peter Murray-Rust
• Cameron Neylon
• Andrew Woolf

Non-scientists
• Paul Edwards
• Glyn Moody


Clear Climate Code
• Project started in 2008.
• Over-riding goal is clarity: code which interested members of the public can download, run, read and understand.
• Open-source, of course.
• First target NASA GISTEMP:
    • 12 KLOC of Fortran (etc).
    • became 3678 lines of Python
    • (including 1500 of docstrings)
    • fixed minor bugs.
    • fosters new science:
    • one paper out now, more draft
• ccc-gistemp.googlecode.com


Why clarity?

• Original motivation was to answer critics:
    • Not the real code;
    • Can’t be run;
    • Contains “obvious bugs”;
    • “divinci code written by the shortbus crew.”

• But also a key message of software engineering:

Your target audience is people, not compilers

• Those people are often yourselves.


What is clarity?

def step1(record_source):
    """An iterator for step 1.  Produces a stream of
    `giss_data.Series` instances.

    :Param record_source:
        An iterable source of `giss_data.Series` instances (which it
        will assume are station records).

    """
    records = comb_records(record_source)
    helena_adjusted = adjust_helena(records)
    combined_pieces = comb_pieces(helena_adjusted)
    without_strange = drop_strange(combined_pieces)
    for record in alter_discont(without_strange):
        yield record


Clear how?


Clear to whom?


Unclear how?


Unclear how?

for m in range(12):
    sum_new = 0.0    # Sum of data in new
    sum = 0.0        # Sum of data in average
    count = 0        # Number of years where both new and average are valid
    for a,n in itertools.izip(average[first_year*12+m: last_year*12: 12],
                              new[first_year*12+m: last_year*12: 12]):
        if invalid(a) or invalid(n):
            continue
        count += 1
        sum += a
        sum_new += n
    if count < min_overlap:
        continue
    bias = (sum-sum_new)/count
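The loop above is offered as an example of code that is hard to follow. One possible clearer arrangement is sketched below; it is not ccc-gistemp code. It assumes Python 3 (so `zip` stands in for `itertools.izip`), and the function name `monthly_bias`, the `MISSING` sentinel, the body of `invalid`, and the choice to record `None` for months with too little overlap are all hypothetical.

```python
MISSING = 9999.0  # hypothetical sentinel for a missing monthly value


def invalid(value):
    """Assumed validity test: a value is invalid if it equals MISSING."""
    return value == MISSING


def monthly_bias(average, new, first_year, last_year, min_overlap):
    """Return 12 per-month biases between two monthly series.

    For each calendar month, the bias is the mean of (average - new)
    over the years in [first_year, last_year) where both series are
    valid.  Months with fewer than min_overlap such years get None.
    """
    biases = []
    for month in range(12):
        start = first_year * 12 + month
        stop = last_year * 12
        # Pair up the same calendar month from each year, keeping only
        # years in which both series have a valid value.
        overlap = [
            (a, n)
            for a, n in zip(average[start:stop:12], new[start:stop:12])
            if not (invalid(a) or invalid(n))
        ]
        if len(overlap) < min_overlap:
            biases.append(None)
            continue
        sum_average = sum(a for a, _ in overlap)
        sum_new = sum(n for _, n in overlap)
        biases.append((sum_average - sum_new) / len(overlap))
    return biases
```

Naming the slice bounds and collecting the valid pairs before summing separates "which years overlap" from "what is the bias", which the original interleaves in one loop body.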


Clarity enables new science
• By promoting “computational thinking” (Wing, NSF),
• Clear code raises new questions…
    • Airport-only trends?
    • Effect of US data?
    • Effect of restricting to long-record stations?
    • Use of land data for ocean cells?
    • Adding more data scraped from met sites?

• …and helps answer them…

• …for both original authors and others.


Homogenization project
• GHCN 3.0 dataset (Menne & Williams 2009);
• Re-implemented by Dan Rothenberg (Cornell, now MIT);
• Working with Menne and Williams at NCDC;
• Algorithm improved, bugs fixed;
• Revised dataset – GHCN-M 3.1.0 – see M&W tech note;
• Funded by Google (Summer of Code 2011).
• Presented at AMS New Orleans, 2012-01-23.

• Many extensions possible: Peter Thorne has a dream….


Common Climate Project
• Web framework for visualizing climate datasets;
• Late Holocene paleoclimatology: Emile-Geay (USC), Smerdon & Anchukaitis (LDEO);
• Open-source, open datasets;
• Prototype online at commonclimate.net;
• Implemented by Hannah Aizenman (grad student at CUNY);
• Funded by Google (Summer of Code 2011).
• Presented at AMS New Orleans.

• … development continues.


Google Summer of Code
• Google pays students to write code ($5000 for 3 months);
• Any open-source project;
• CCF acts as an “umbrella organization”.

• Our 2011 projects:
    • Hannah Aizenman: Common Climate Project;
    • Filipe Fernandes: Extensions to ccc-gistemp;
    • Daniel Rothenberg: Homogenization.
  (all presented at AMS New Orleans).


Google Summer of Code
• Timetable 2012:
    • Feb 27–Mar 9: brief window for orgs to apply;
    • Mar 16: orgs announced;
    • Mar 26–Apr 6: brief window for students to apply;
    • Apr 23: projects announced;
    • May 21–Aug 20: Coding!
    • Aug 27: final results;
    • Oct 20/21: mentor summit.


Can you reproduce Fig 7a?
• “Why?”
    • Reproducibility;
    • New data;
    • Bug fixes;
    • Revised model;
    • Transparency.
• Why not?
    • Versioned code;
    • Versioned data;
    • Configuration Management.
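As an illustration of what versioned code, versioned data and configuration management could mean for a single figure, here is a minimal sketch that records all three in one manifest. Everything in it is hypothetical: the function names, the manifest layout, the revision string `r1234`, the file name `stations.dat` and the `min_overlap` setting are invented for the example and do not describe any CCF tool.

```python
import hashlib
import json


def sha256_of_bytes(content):
    """Return the hex SHA-256 digest identifying one exact version of the data."""
    return hashlib.sha256(content).hexdigest()


def make_manifest(figure, code_version, datasets, config):
    """Record what is needed to regenerate a figure.

    figure       -- label of the figure, e.g. "Fig 7a"
    code_version -- revision identifier from version control (assumed given)
    datasets     -- mapping of dataset name to its raw bytes
    config       -- mapping of the parameter settings used for the run
    """
    return {
        "figure": figure,
        "code_version": code_version,
        "data_sha256": {
            name: sha256_of_bytes(content)
            for name, content in sorted(datasets.items())
        },
        "config": config,
    }


# Hypothetical inputs, for illustration only.
manifest = make_manifest(
    figure="Fig 7a",
    code_version="r1234",
    datasets={"stations.dat": b"example station data"},
    config={"min_overlap": 20},
)
manifest_json = json.dumps(manifest, sort_keys=True, indent=2)
```

Stored alongside the figure, a record like this lets anyone check that they are running the same code on the same data with the same settings before asking why their plot differs.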


Open Science
• Accelerating trend towards more openness in science.
• Redefining publication:
    • Open Access;
    • Open Data;
    • Open Knowledge;
    • Open Notebooks;
    • Data-driven intelligence;
• Workshops, conferences, summits;
• There’s a war on: PRISM, RWA;
• Royal Society policy study: Science as a Public Enterprise;
• But no coherent message about open software in science.

• Michael Nielsen: Reinventing Discovery


Science Code Manifesto

Code: All source code written specifically to process data for a published paper must be available to the reviewers and readers of the paper.

Copyright: The copyright ownership and license of any released source code must be clearly stated.

Citation: Researchers who use or adapt science source code in their research must credit the code's creators in resulting publications.

Credit: Software contributions must be included in systems of scientific assessment, credit, and recognition.

Curation: Source code must remain available, linked to related materials, for the useful lifetime of the publication.


Future Plans

• Changing policies:
    • Transparency;
    • Rewards for all research products.
• Training scientists:
    • Basic techniques (testing, version control, agile, etc);
    • Code publication and reuse.
• Providing resources:
    • White papers, blog posts;
    • Directories.
• Building networks, partnering with institutions;
• Leading by example:
    • ccc-gistemp;
    • ccf-homogenization;
    • etc….


Questions?


Funding

• I say “non-profit”. Approximately “non-revenue”.
• All accounts open.
• Total revenue to date £7037.94 (+ GSoC students).
• Total costs to date £5357.71 (as of 2012-01-31).
• All work unpaid (not counting GSoC students).
• Personal lost income to date probably £40K.
• Funding model seeks £150K-£500K annually from corporate or NGO sponsorship (plus some project money from academic collaborations).
• Too much? Not enough? Depends who you ask.
• Open to suggestions!