Introduction to R short course, Fall 2016

Upload: spencer-fox

Post on 10-Apr-2017

Welcome to an R intro!
1. Log in
2. Go to github.com/sjfox/2016_fall_intro_r and download the materials
3. Open up the 2016_fall_intro_r.Rproj in RStudio

Introduction to R
Spencer Fox

20 October 2016

@foxandtheflu

Why program?
• Simulation
• Automation
• Reproducibility

Why use R?
• Free
• Powerful Statistics
• Packages!
• Increasingly popular
• Visualization

Always start with your end goal in mind

fivethirtyeight

Example R “Pipeline”
1. Generate data
2. Analyze data
3. Show analysis
All in R
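A minimal sketch of the three pipeline steps, using only base R. The simulated data, the variable names (`sim`, `fit`, `slope`), and the choice of a linear regression are assumptions made for illustration; they are not taken from the course materials.

```r
# 1. Generate data: 100 simulated (x, y) pairs with a known linear trend.
set.seed(42)                           # makes the simulation reproducible
x <- runif(100, min = 0, max = 10)
y <- 2 * x + rnorm(100, sd = 1)        # true slope is 2, plus random noise
sim <- data.frame(x = x, y = y)

# 2. Analyze data: fit a linear regression of y on x.
fit <- lm(y ~ x, data = sim)
slope <- unname(coef(fit)["x"])        # estimated slope, should be near 2

# 3. Show analysis: scatter plot with the fitted line drawn on top.
plot(sim$x, sim$y, xlab = "x", ylab = "y")
abline(fit)
```

Because the seed is fixed, rerunning the script regenerates the same data, fit, and figure, which is the reproducibility argument from the "Why program?" slide.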

Data analysis in the “tidyverse”
Data → Get data into R → Analyze/calculate data → Generate beautiful figures → Share your results
Slide created by Sean Leonard

Using R (RStudio)
• Editor
• Environment
• Console
• Misc.

1st Programming Exercise
1. Open up the 2016_fall_intro_r.Rproj
2. Navigate to the code folder and open up r_intro.Rmd
3. Start playing with code
Do: ask questions, run code, change things and see what happens

How to run code:
1. Move cursor to a line or code block
2. Highlight line(s) of code
3. Type ctrl+enter (Windows) or cmd+enter (Mac)
4. See code running in console
5. View output/figures

Now you can run code in R, so you just need ingredients for your recipe
• Vectors
• Data frames
• Functions
Stop me if you see anything on this screen that doesn’t make sense!

R data structure flowchart
numeric (e.g. 5), character (e.g. “tupac”), logical (e.g. TRUE), factor (e.g. control (1), treatment (2))
→ vector
→ data frame (or tibble)
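The flowchart can be sketched in base R as follows. Only `5`, `"tupac"`, `TRUE`, and the control/treatment factor come from the slide; the other values and all variable names are placeholders made up for illustration.

```r
# Each basic type lives in a vector (even a single value is a length-1 vector).
value <- c(5, 12, 30)                        # numeric
label <- c("tupac", "label_2", "label_3")    # character
flag  <- c(TRUE, FALSE, TRUE)                # logical

# A factor stores categories as integer codes with text labels.
group <- factor(c("control", "treatment", "control"),
                levels = c("control", "treatment"))
codes <- as.integer(group)                   # 1 2 1

# A data frame binds equal-length vectors together as columns.
df <- data.frame(value, label, flag, group, stringsAsFactors = FALSE)
str(df)                                      # prints each column with its type
```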

Everything in R is a function
Function form: fxn(arg1, arg2, …)
> sum(5, 10, 15)
[1] 30
5 + 10 is equivalent to `+`(5, 10)
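A short sketch of the function form in practice. `sum()` and the backtick call to `+` come from the slide; `round()` with a named argument and the user-defined `add_then_double()` are extra illustrations, and that function name is hypothetical.

```r
total <- sum(5, 10, 15)              # 30
plus  <- `+`(5, 10)                  # 15, the same as 5 + 10

# Arguments can be matched by name as well as by position.
rounded <- round(3.14159, digits = 2)   # 3.14

# Writing your own function uses the same fxn(arg1, arg2) form.
add_then_double <- function(a, b) {
  (a + b) * 2
}
doubled <- add_then_double(5, 10)    # 30
```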

R data structures

Data → Get data into R → Analyze/calculate data → Generate beautiful figures → Share your results

dplyr provides functions for manipulating and analyzing data frames
pipes (magrittr): %>%
x %>% f(y) is equivalent to f(x, y)
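To make the pipe concrete, here is a small sketch comparing a nested call with the piped version. The vector `x` and the particular chain of functions are made up for illustration.

```r
library(magrittr)   # provides %>% (dplyr also makes it available)

x <- c(1, 4, 9, 16)

# Nested form: read from the inside out.
nested <- round(mean(sqrt(x)), 1)

# Piped form: read left to right; each result becomes the first argument
# of the next function.
piped <- x %>% sqrt() %>% mean() %>% round(1)

identical(nested, piped)   # TRUE, both are 2.5
```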

filter(): Subset the rows in the df
df %>% filter(expression)

Expression   Comparison between left and right side
==           Equality
!=           Inequality
<            Less than
>            Greater than
<=           Less than or equal to
>=           Greater than or equal to
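A minimal sketch of `filter()` with two of the comparison operators. The data frame and its columns (`country`, `year`, `gdp`) are hypothetical stand-ins, not the course dataset.

```r
library(dplyr)

# Hypothetical data: two countries observed in two years.
df <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(2000, 2010, 2000, 2010),
  gdp     = c(1.5, 2.0, 3.0, 3.5)
)

# Keep only the rows where year equals 2010.
recent <- df %>% filter(year == 2010)

# Several expressions separated by commas must all be TRUE for a row to stay.
rich_recent <- df %>% filter(year == 2010, gdp > 2.5)
```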

select(): Select columns in df
df %>% select(columns)

select syntax               Description
select(col1:colx)           All columns between col1 and colx
select(1:x)                 Columns 1 through x
select(col1, col2)          All columns listed
select(-col1)               All columns except col1
select(col1:col10, -col3)   All columns between col1 and col10 except for col3
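Each row of the table can be tried on a toy data frame. The data frame and its column names (`id`, `col1`, `col2`, `col3`) are invented for this sketch.

```r
library(dplyr)

df <- data.frame(
  id   = 1:3,
  col1 = c(10, 20, 30),
  col2 = c("a", "b", "c"),
  col3 = c(TRUE, FALSE, TRUE)
)

range_cols  <- df %>% select(col1:col3)        # col1, col2, col3
first_two   <- df %>% select(1:2)              # id, col1
listed      <- df %>% select(id, col3)         # id, col3
all_but_one <- df %>% select(-col1)            # id, col2, col3
range_minus <- df %>% select(id:col3, -col2)   # id, col1, col3
```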

%>% allows stringing functions together
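As a sketch of stringing dplyr verbs together, the chain below filters rows and then selects columns in one statement. The data frame is a hypothetical example, not the course data.

```r
library(dplyr)

df <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(2000, 2010, 2000, 2010),
  gdp     = c(1.5, 2.0, 3.0, 3.5)
)

# Each step receives the data frame produced by the step before it.
result <- df %>%
  filter(year == 2010) %>%   # keep the 2010 rows
  select(country, gdp)       # then keep only these two columns
```

Putting each verb on its own line, with %>% at the end of the line, keeps longer chains readable.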

2nd Programming Exercise

mutate(): add a new column to df
df %>% mutate(new_col_name = expression)

Operation   Description
+           Addition
-           Subtraction
*           Multiplication
/           Division
^           Exponentiate
sqrt()      Take the square root
log()       Take the logarithm (defaults to natural log, ln)
exp()       Exponential function (e^x)
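A small sketch of `mutate()` using several operations from the table. The data frame and the new column names are made up for illustration.

```r
library(dplyr)

df <- data.frame(x = c(1, 4, 9))

df2 <- df %>%
  mutate(
    root    = sqrt(x),      # 1, 2, 3
    squared = x^2,          # 1, 16, 81
    log_x   = log(x),       # natural log by default
    back    = exp(log_x)    # exp() undoes log(), recovering x
  )
```

Columns created earlier in the same `mutate()` call, such as `log_x`, can be used by later expressions in that call.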

group_by(): Make implicit groupings
summarise(): compute summary of groups

How would the code change if you wanted to find the average gdp for each country instead?

Summary Fxn   Description
mean()        Mean of values
sum()         Sum values
median()      Median
sd()          Standard deviation
var()         Variance
cor()         Correlation
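A minimal sketch of `group_by()` followed by `summarise()`, using a few of the summary functions above. The data frame is hypothetical; its columns (`continent`, `country`, `gdp`) are chosen only to echo the gdp question on the earlier slide and are not the course dataset.

```r
library(dplyr)

df <- data.frame(
  continent = c("X", "X", "X", "X", "Y", "Y"),
  country   = c("A", "A", "B", "B", "C", "C"),
  gdp       = c(1, 3, 5, 7, 10, 20)
)

# One summary row per continent.
by_continent <- df %>%
  group_by(continent) %>%
  summarise(mean_gdp = mean(gdp), sd_gdp = sd(gdp), n = n())

# Changing the grouping column gives one summary row per country instead.
by_country <- df %>%
  group_by(country) %>%
  summarise(mean_gdp = mean(gdp))
```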

3rd Programming Exercise

Visualizing data

www.reddit.com/r/dataisbeautiful

ggplot2

ggplot2 visualizations

The grammar of graphics (ggplot)
1. Data
• Raw data for plotting
2. Geometries
• The shape that will represent the data
• point, line, bar, etc.
3. Aesthetics
• axis, color, size, shape, etc.
4. Scales
• Mapping data to aesthetic (how to color geoms, data range to plot, etc.)

A simple example
Note that this uses “cowplot,” because I can’t stand ggplot2 default themes
ggplot2 default vs. cowplot default
Parts of the plotting call:
• Data frame
• Aesthetics (data column names)
• Geometry
• Link with +
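The labelled parts above can be sketched as a single ggplot2 call. The data frame `df` and its columns (`dose`, `response`) are invented for illustration, and this sketch uses the ggplot2 default theme rather than cowplot.

```r
library(ggplot2)

# Data frame: the raw data to plot (hypothetical values).
df <- data.frame(
  dose     = c(1, 2, 3, 4, 5),
  response = c(2.1, 3.9, 6.2, 8.1, 9.8)
)

p <- ggplot(data = df,                                  # data frame
            mapping = aes(x = dose, y = response)) +    # aesthetics: column names
  geom_point()                                          # geometry, linked with +

print(p)   # draws the scatter plot
```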

A second example

4th Programming Exercise

Principles of “tidy” data
1. Every variable forms a column
2. Each observation forms a row

Messy / Wide
Patient   Age   Height   Weight
Jack      30    72       180
Jill      28    64       115
Mary      27    62       112

Tidy / Long
Patient   Characteristic   Value
Jack      Age              30
Jack      Height           72
Jack      Weight           180
Jill      Age              28
Jill      Height           64
Jill      Weight           115
Mary      Age              27
Mary      Height           62
Mary      Weight           112

gather(key=income, value=freq, -religion)
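The `gather()` call above refers to a dataset (with `religion`, `income`, and `freq` columns) that is not reproduced in this transcript. As a sketch, the same pattern is applied here to the patient table from the earlier slide; the object names `wide` and `long` are assumptions. Newer versions of tidyr also offer `pivot_longer()` for the same job.

```r
library(dplyr)   # for %>%
library(tidyr)   # for gather()

# The wide patient table from the slide.
wide <- data.frame(
  Patient = c("Jack", "Jill", "Mary"),
  Age     = c(30, 28, 27),
  Height  = c(72, 64, 62),
  Weight  = c(180, 115, 112)
)

# Gather every column except Patient into key/value pairs:
# the old column names go into Characteristic, the cell values into Value.
long <- wide %>%
  gather(key = Characteristic, value = Value, -Patient)
```

The result has one row per patient and characteristic, nine rows in total, matching the long table on the slide up to row order.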

5th Programming Exercise

Adding in more aesthetics

Frequently used geoms + aesthetics

Geoms
• geom_bar()
• geom_line()
• geom_point()
• geom_histogram()
• geom_ribbon()
• geom_text()
• geom_boxplot()

Aesthetics
• color
• size
• fill
• alpha
• shape
• linetype
• group
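A sketch that combines two of the geoms with several of the aesthetics listed above. The data frame and its columns (`day`, `group`, `value`) are hypothetical.

```r
library(ggplot2)

# Hypothetical data: two groups measured on four days.
df <- data.frame(
  day   = rep(1:4, times = 2),
  group = rep(c("control", "treatment"), each = 4),
  value = c(1, 2, 3, 4, 1, 3, 5, 7)
)

# Aesthetics inside aes() are mapped from data columns;
# size and alpha outside aes() are fixed for every point.
p <- ggplot(df, aes(x = day, y = value, color = group)) +
  geom_line() +
  geom_point(aes(shape = group), size = 3, alpha = 0.8)

print(p)
```

Mapping `color = group` in the top-level `aes()` applies to both layers, so each group gets its own line and point color plus a legend.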

http://docs.ggplot2.org/current/

6th Programming Exercise

R resources
• Stack Overflow (Google)
• Hadley Wickham’s website - http://hadley.nz/
• http://www.r-bloggers.com/how-to-learn-r-2/
• A Beginner's Guide to R (Use R!) by Alain Zuur, Elena N. Ieno, and Erik Meesters
• The Art of R Programming: A Tour of Statistical Software Design by Norman Matloff
• ggplot2: Elegant Graphics for Data Analysis (Use R!) by Hadley Wickham. Maybe wait for the second edition (it’s slightly outdated)