introduction to reproducible research in r and r … › 2016 › 04 › ...2015/04/22  ·...

55
Introduction to Reproducible Research in R and R Studio. Susan Johnston April 1, 2016

Upload: others

Post on 08-Jun-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Introduction to Reproducible Researchin R and R Studio.

Susan Johnston

April 1, 2016

Page 2: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

What is Reproducible Research?

Reproducibility is the ability of an entire experiment orstudy to be reproduced, either by the researcher or bysomeone else working independently, [and] is one of themain principles of the scientific method.

Wikipedia

Page 3: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

In the lab:

Page 4: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Many of us are clicking, copying and pasting...

I Can you repeat all of this again. . .

I . . . and would you get the same results every time?

Page 5: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Worst Case Scenario

Page 6: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Scenarios that benefit from reproducibility

I New raw data becomes available.

I You return to the project after a period of time.

I Project gets handed to new PhD student/postdoc.

I Working collaboratively.

I A reviewer wants you to change a model parameter.

I When you find an error, but not sure where you went wrong.

Page 7: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Four rules for reproducibility.

1. Create a portable project.

2. Avoid manual data manipulation steps - use code!

3. Connect results to text.

4. Version control all custom scripts and documents.

Page 8: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Disclaimer

Many solutions to the same problem!

Page 9: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

The Environment: http://www.rstudio.com

Page 10: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Reproducible Research in

1. Creating a Portable Project (.Rproj)

2. Automate analyses - stop clicking and start typing.

3. Dynamic report writing with R Markdown and knitr

4. Version control using git

Page 11: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Reproducible Research in .

1. Creating a Portable Project (.Rproj)

2. Automate analyses - stop clicking and start typing.

3. Dynamic report writing with R Markdown and knitr

4. Version control using git

Page 12: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Structuring an R Project.http://nicercode.github.io/blog/2013-05-17-organising-my-project/

All data, scripts and output should be kept within the same projectdirectory.

Page 13: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Structuring an R Project.http://nicercode.github.io/blog/2013-05-17-organising-my-project/

All data, scripts and output should be kept within the same projectdirectory.

Page 14: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Structuring an R Project.http://nicercode.github.io/blog/2013-04-05-projects/

R/ Contains functions relevant to analysis.

data/ Contains raw data as read only.

doc/ Contains the paper.

figs/ Contains the figures.

output/ Contains analysis output(processed data, logs, etc. Treat as disposable).

.R Code for the analysis.

Page 15: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Structuring an R Project.http://robjhyndman.com/hyndsight/workflow-in-r/

I load.R - read in data from files

I clean.R - pre-processing and cleaning

I functions.R - define what you need for anlaysis

I do.R - do the analysis!

Page 16: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Bad habits can hinder portability.https://support.rstudio.com/hc/en-us/articles/200526207-Using-Projects

setwd("C:/Users/susjoh/Desktop/SalmoAnalysis")

setwd("C:/Users/Susan Johnston/Desktop/SalmoAnalysis")

setwd("C:/Users/sjohns10/Drive/SalmoAnalysis")

source("../../OvisAnalysis/GWASplotfunctions.R")

An analysis should be contained within a directory, and it shouldbe easy to move it or pass on to someone new.

Page 17: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Solution: using Projects.https://support.rstudio.com/hc/en-us/articles/200526207-Using-Projects

I Establishes a directory with associated .Rproj file.

I Automatically sets the working directory.

I Can save and source .Rprofile, .Rhistory, .Rdata files.

I Allows version control within R Studio.

Page 18: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Creating a Portable Project (.Rproj)

Page 19: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Creating a Portable Project (.Rproj)

Page 20: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Creating a Portable Project (.Rproj)

Page 21: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Creating a Portable Project (.Rproj)

Page 22: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Reproducible Research in .

1. Creating a Portable Project (.Rproj)

2. Automate analyses - stop clicking and start typing.

3. Dynamic report writing with R Markdown and knitr

4. Version control using git

Page 23: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

This is R. There is no “if”. Only “how”.

I CRAN, Bioconductor, github

I Reading in data and functions

read.table(), read.csv(), read.xlsx(), source()

I Reorganising data

reshape, plyr, dplyr

I Generate figures

plot(), library(ggplot2)

I Running external programmes with system()

Unix/Mac: system("plink -file OvGen --freq")

Windows: system("cmd", input = "plink -file OvGen --freq")

Page 24: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Reproducible Research in .

1. Creating a Portable Project (.Rproj)

2. Automate analyses - stop clicking and start typing.

3. Dynamic report writing with R Markdown and knitr

4. Version control using git

Page 25: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

The knitr package allows R code and document templates to becompiled into a single report containing text, results and figures.

Page 26: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Output script as Notebook

Page 27: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -
Page 28: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Write reports directly in R

Page 29: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Write reports directly in R

Page 30: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Creating an R Markdown Script (.Rmd).

Page 31: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Creating an R Markdown Script (.Rmd).

Page 32: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

A Quick Start Guidehttp://nicercode.github.io/guides/reports/

1. Type report text into .Rmd file

Lorem ipsum dolor sit amet, consectetuer adipiscing elit.

2. Enclose code to be evaluated in chunks

```{r}model1 <- lm(speed ~ dist, data = cars)

```

3. Evaluate code inline

The slope of the model is `r coefficients(model1)[2]`

The slope of the model is 0.16557

4. Compile report as .html, .pdf or .doc

Page 33: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

A Quick Start Guidehttp://nicercode.github.io/guides/reports/

NB. PDF and Word docs require additional software.http://rmarkdown.rstudio.com/?version=0.98.1103&mode=desktop

Page 34: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

A Quick Start Guidehttp://nicercode.github.io/guides/reports/

http://rmarkdown.rstudio.com/?version=0.98.1103&mode=desktop

Page 35: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Advanced Tips

I Control how chunks are reported and evaluated

```{r echo = F, warning = F, fig.width = 3}model1 <- lm(speed ~ dist, data = cars)

plot(model1)

```

I spin(): compile .R files using #’, #+ and #-

http://deanattali.com/2015/03/24/knitrs-best-hidden-gem-spin/

I LATEXdocuments, Presentations, Shiny, etc.

Page 36: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Reproducible Research in .

1. Creating a Portable Project (.Rproj)

2. Automate analyses - stop clicking and start typing.

3. Dynamic report writing with R Markdown and knitr

4. Version control using git

Page 37: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Version Control can revert a document to a previousversion.

Page 38: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Version Control can revert a document to a previousversion.

Page 39: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Version Control can revert a document to a previousversion.

Page 40: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Version Control can revert a document to a previousversion.

Page 41: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Version Control Using git.https://support.rstudio.com/hc/en-us/articles/200532077-Version-Control-with-Git-and-SVN

Git can be installed on all platforms, and can be used to implementversion control within an R Studio Project.http://git-scm.com/downloads

Page 42: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Version Control in R Studio

Tools >Project Options allows setup of git version control.

Page 43: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Version Control in R Studio

Select git as a version control system

Page 44: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Version Control in R Studio

Select git as a version control system

Page 45: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Version Control in R Studio

git information will appear in the top-right frame.

Page 46: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Version Control in R Studio

git information will appear in the top-right frame.

Page 47: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Version Control in R Studio

Select files to version control, write a meaningful commit message>Commit

Page 48: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Version Control in R Studio

Select files to version control, write a meaningful commit message>Commit

Page 49: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Version Control in R Studio

After modifying the file, repeat the process.

Page 50: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Version Control in R Studio

After modifying the file, repeat the process.

Page 51: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Version Control in R Studio

Previous versions can be viewed and restored from the History tab.

Page 52: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Version Control in R Studio

Previous versions can be viewed and restored from the History tab.

Page 53: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Advanced Steps: Github

I Forking projects

I All scripts are backed up online

I Facilitates collaboration and working on different computers

Page 54: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Take home messages

I Manage projects reproducibly: The first researcher who willneed to reproduce the results is likely to be YOU.

I Time invested in learning to code pays off - do it.

I Supervisors should be patient and encourage students to code.

Page 55: Introduction to Reproducible Research in R and R … › 2016 › 04 › ...2015/04/22  · Reproducible Research in 1. Creating a Portable Project (.Rproj) 2. Automate analyses -

Online Resources

I RStudio: Idiot-proof guides and cheat sheetshttp://www.rstudio.com/

I Nice R Code: How-tos and advice on good coding practicehttp://nicercode.github.io/guide.html

I Ten Simple Rules for Reproducible Computational Researchhttp://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003285

I Yihui Xie’s blog (knitr) http://yihui.name/en/categories/

I R Bloggers: http://www.r-bloggers.com/

I StackOverflow questions on R and knitrhttp://stackoverflow.com/questions/tagged/r+knitr