Reproducible Research and the Cloud

Reproducible Research and the Cloud. Dr Kenji Takeda ([email protected]), Microsoft Research, @azure4research

Upload: microsoft-azure-for-research

Post on 26-Jan-2015


Category: Science



DESCRIPTION

Research results in peer-reviewed publications are reproducible, right? If only it were so clear-cut. With high-profile paper retractions and pushes for better data sharing by funders, publishers and the community, the spotlight is now on the whole way research is conducted around the world. This talk from the Software Sustainability Institute's Collaborations Workshop 2014 describes how cloud computing, with Microsoft Azure, is helping researchers realize the goals of scientific reproducibility. Find out more at www.azure4research.com

TRANSCRIPT

Page 1: Reproducible Research and the Cloud

Reproducible Research and the Cloud

Dr Kenji Takeda ([email protected])

Microsoft Research

@azure4research

Page 2: Reproducible Research and the Cloud

Microsoft Research

Page 3: Reproducible Research and the Cloud

Scientific Discovery

Credit: ROYAL INSTITUTION OF GREAT BRITAIN / SCIENCE PHOTO LIBRARY

$$\rho \frac{D\mathbf{v}}{Dt} = -\nabla p + \nabla \cdot \mathbf{T} + \mathbf{f}$$

Page 4: Reproducible Research and the Cloud

The Research Lifecycle

Data acquisition & modelling

Collaboration and visualisation

Analysis & data mining

Dissemination & sharing

Archiving and preserving

fourthparadigm.org

Page 5: Reproducible Research and the Cloud

Believe it or not: how much can we rely on published data on potential drug targets?

“at least 50% of published studies, even those in top-tier academic journals, can’t be repeated with the same conclusions by an industrial lab”

Osherovich, L. Hedging against academic risk. SciBX 14 Apr 2011 (doi:10.1038/scibx.2011.416).

Page 6: Reproducible Research and the Cloud

CLOUD COMPUTING

Page 7: Reproducible Research and the Cloud

Global presence

Datacenter

Edge point

The Microsoft Cloud

Page 8: Reproducible Research and the Cloud

Cloud Computing

Page 9: Reproducible Research and the Cloud
Page 10: Reproducible Research and the Cloud

Choose from multiple runtimes and languages for your applications: Python, Java, PHP, .NET, Node.js

Run Linux on Windows Azure Virtual Machines (VHD)

Support multiple frameworks and popular open source applications with Windows Azure Web Sites

HDInsight Hadoop for Big Data analysis

Windows Azure

http://github.com/windowsazure

Page 11: Reproducible Research and the Cloud

REPRODUCIBLE RESEARCH

Page 12: Reproducible Research and the Cloud

http://www.phdcomics.com/comics.php?f=1689

Page 13: Reproducible Research and the Cloud

• Computational experiments should be recomputable for all time

• Recomputation of recomputable experiments should be very easy

• It should be easier to make experiments recomputable than not to

• Tools and repositories can help recomputation become standard

• The only way to ensure recomputability is to provide virtual machines

• Runtime performance is a secondary issue

Ian Gent, Alexander Konovalov and Lars Kotthoff

Steven Crouch, Devasena Inupakutika

Page 14: Reproducible Research and the Cloud

Recomputation.org

Page 15: Reproducible Research and the Cloud

Zanadu.IO

Patrick Henaff and Claude Martini

Page 16: Reproducible Research and the Cloud

Zanadu.IO

Page 17: Reproducible Research and the Cloud

khmer-protocols:

• Effort to provide standard “cheap” assembly protocols for cloud machines.

• Entirely copy/paste; ~2-6 days from raw reads to assembly, annotations, and differential expression analysis. Estimated cost: ~$150 per data set.

• Open, versioned, forkable, citable.

Open Science

C. Titus Brown, @ctitusbrown

http://ged.cse.msu.edu/

http://ivory.idyll.org/

Page 18: Reproducible Research and the Cloud

Explicitly a “protocol” – explicit steps, copy-paste, customizable, versioned; not black box.

No requirement for computational expertise or significant computational hardware.

~1-5 days to teach a bench biologist to use.

$100-150 of rental compute (“cloud computing”)…

…for $1000 data set.

Now adding in quality control and internal validation steps.

Some thoughts…

Reproducible computing environment (Azure)

Publicly available data (MMETSP)

Open and versioned protocol

Provenance tracking and registration (Synapse?)
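
The slide names provenance tracking only as a goal, with Synapse as a tentative option. As a minimal sketch of what a provenance record might contain, assuming a plain JSON document is acceptable, the Python below ties a result to a data file's SHA-256 checksum, a protocol version label and the computing environment. The function names, field names and the version string are hypothetical and do not come from the talk or from any particular tool.

```python
import hashlib
import json
import platform
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(data_path: Path, protocol_version: str) -> dict:
    """Describe which data, protocol version and environment were used."""
    return {
        "data_file": data_path.name,
        "data_sha256": sha256_of_file(data_path),
        # Hypothetical label; in practice this might be a git tag or commit.
        "protocol_version": protocol_version,
        "python_version": platform.python_version(),
        "platform": platform.platform(),
    }


if __name__ == "__main__":
    # Demonstrate on a throwaway file rather than real sequencing data.
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "reads.fastq"
        sample.write_bytes(b"@read1\nACGT\n+\nIIII\n")
        record = provenance_record(sample, "v0.1-example")
        print(json.dumps(record, indent=2, sort_keys=True))
```

Storing such a record next to each result would let a later reader check that they are rerunning the same protocol version on byte-identical data. It does not replace a preserved virtual machine image.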

Page 19: Reproducible Research and the Cloud

Distribution Modeller

<compute + data>

Middle ground between:

Exploratory science

Procedural science

Black box that can be cracked open and modified

Page 21: Reproducible Research and the Cloud

• Reproducing my own results

• Replicating other people’s results

• Reproducing other people’s results

Repeatability, Replicability, Reproducibility, Reuse

“reviewers have no time and no resources to reproduce data and to dig deeply into the presented work.”

Life Sci VC: Academic bias & biotech failures: http://lifescivc.com/2011/03/academic-bias-biotech-failures/

Photo: leechantmcarthur, CC-BY

Page 22: Reproducible Research and the Cloud

Windows Azure for Research

• Azure Research Awards

• Windows Azure for Research Training Courses

– Manchester, 3-4 April’14

• Webinars

• Technical resources & curriculum

• Research community engagements

www.azure4research.com

Page 24: Reproducible Research and the Cloud