Introduction to R short course, fall 2016
TRANSCRIPT
Welcome to an R intro!
1. Log in
2. Go to github.com/sjfox/2016_fall_intro_r, and download the materials
3. Open up the 2016_fall_intro_r.Rproj in RStudio
Data analysis in the “tidyverse”
Data → Get data into R → Analyze/calculate data → Generate beautiful figures → Share your results
Slide created by Sean Leonard
1st Programming Exercise
1. Open up the 2016_fall_intro_r.Rproj
2. Navigate to the code folder and open up r_intro.Rmd
3. Start playing with code
   • Do: ask questions, run code, change things, and see what happens
How to run code:
1. Move cursor to a line or coding block
2. Highlight line(s) of code
3. Type ctrl+enter (Windows) or cmd+enter (Mac)
4. See code running in console
5. View output/figures
Now you can run code in R, so you just need ingredients for your recipe
• Vectors
• Data frames
• Functions
Stop me if you see anything on this screen that doesn’t make sense!
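R data structure flowchart
Basic types, with examples:
• numeric, e.g. 5
• character, e.g. “tupac”
• logical, e.g. TRUE
• factor, e.g. control (1), treatment (2)
Values of these types are stored in a vector
Vectors are combined into a data frame or tibble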
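The flowchart can be tried out directly in the console. This is a minimal sketch in base R; the variable names (n, s, b, group, heights, df) are made up for illustration.

```r
# Single values of each basic type (a single value is a length-one vector)
n <- 5          # numeric
s <- "tupac"    # character
b <- TRUE       # logical

# A factor stores categories as integer codes with text labels
group <- factor(c("control", "treatment", "control"),
                levels = c("control", "treatment"))

# A vector holds several values of one type
heights <- c(72, 64, 62)

# A data frame combines equal-length vectors as columns
df <- data.frame(group = group, height = heights)

class(n)            # "numeric"
class(s)            # "character"
class(b)            # "logical"
as.integer(group)   # 1 2 1
str(df)             # shows the type of each column
```

str() is a quick way to check what type each column of a data frame has.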
Everything that happens in R is a function call
Function form: fxn(arg1, arg2, …)
5 + 10 is equivalent to `+`(5, 10)
> sum(5, 10, 15)
[1] 30
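A short sketch of the same idea, using only base R. The helper add_all is hypothetical and exists only to show the fxn(arg1, arg2, …) form.

```r
# Infix operators are ordinary functions and can be called in prefix form
a <- 5 + 10
b <- `+`(5, 10)
identical(a, b)    # TRUE

# Subsetting is a function call too
v <- c(5, 10, 15)
second <- `[`(v, 2)   # same as v[2], which is 10

# Your own functions follow the same fxn(arg1, arg2, ...) form
add_all <- function(...) sum(...)   # hypothetical helper wrapping sum()
total <- add_all(5, 10, 15)         # 30
```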
dplyr provides functions for manipulating and analyzing data frames
pipes (magrittr): %>%
equivalent to:
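A minimal sketch of what the pipe does, not the slide's original code. It assumes the magrittr package is installed; the vector values is made up for illustration.

```r
library(magrittr)   # provides the pipe, %>%

values <- c(1, 4, 9, 16)

# Nested form: read from the inside out
nested <- round(mean(sqrt(values)), 1)

# Piped form: read left to right; x %>% f(y) is evaluated as f(x, y)
piped <- values %>%
  sqrt() %>%
  mean() %>%
  round(1)

identical(nested, piped)   # TRUE
```

Loading dplyr also makes %>% available, so library(dplyr) alone is enough in practice.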
Expression   Comparison between left and right side
==           Equality
!=           Inequality
<            Less than
>            Greater than
<=           Less than or equal to
>=           Greater than or equal to
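These comparisons are what you would typically hand to dplyr's filter() to pick rows. The sketch below assumes that use and that dplyr is installed; the patients data frame is illustrative, with lower-case column names chosen here.

```r
library(dplyr)

# Illustrative data frame
patients <- data.frame(
  patient = c("Jack", "Jill", "Mary"),
  age     = c(30, 28, 27),
  height  = c(72, 64, 62),
  weight  = c(180, 115, 112)
)

# Comparisons are vectorised: one TRUE/FALSE per element
patients$age >= 28   # TRUE TRUE FALSE

# filter() keeps the rows where the comparison is TRUE
older    <- filter(patients, age >= 28)
not_jack <- filter(patients, patient != "Jack")
light    <- filter(patients, weight < 115)
```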
select syntax                Description
select(col1:colx)            All columns between col1 and colx
select(1:x)                  Columns 1 through x
select(col1, col2)           All columns listed
select(-col1)                All columns except col1
select(col1:col10, -col3)    All columns between col1 and col10 except for col3
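Each row of the table can be tried on a small data frame. A sketch assuming dplyr is installed; patients is an illustrative data frame with lower-case column names chosen here.

```r
library(dplyr)

patients <- data.frame(
  patient = c("Jack", "Jill", "Mary"),
  age     = c(30, 28, 27),
  height  = c(72, 64, 62),
  weight  = c(180, 115, 112)
)

by_range    <- select(patients, patient:height)            # patient, age, height
by_position <- select(patients, 1:2)                       # patient, age
by_name     <- select(patients, patient, weight)           # patient, weight
all_but_one <- select(patients, -age)                      # patient, height, weight
range_minus <- select(patients, patient:weight, -height)   # patient, age, weight
```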
Operation   Description
+           Addition
-           Subtraction
*           Multiplication
/           Division
^           Exponentiation
sqrt()      Take the square root
log()       Take the logarithm (defaults to natural log, ln)
exp()       Exponential function (computes e^x)
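A quick base R sketch of these operations. All of them work element by element on vectors; the variable names are made up for illustration.

```r
x <- c(1, 10, 100)

doubled   <- x * 2               # 2 20 200
squared   <- x^2                 # 1 100 10000
roots     <- sqrt(c(4, 9, 16))   # 2 3 4

ln_x      <- log(x)              # natural log by default
log10_x   <- log(x, base = 10)   # 0 1 2; log10(x) gives the same result
e         <- exp(1)              # Euler's number, about 2.718
roundtrip <- exp(log(5))         # exp() undoes log(), giving 5 (up to rounding)
```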
group_by(): make implicit groupings
summarise(): compute summary of groups
How would the code change if you wanted to find the average GDP for each country instead?
Summary function   Description
mean()             Mean of values
sum()              Sum values
median()           Median
sd()               Standard deviation
var()              Variance
cor()              Correlation
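A minimal sketch of group_by() and summarise() working together with a few of the summary functions. It assumes dplyr is installed; the trial data frame and its values are hypothetical toy numbers, not the course dataset.

```r
library(dplyr)

# Hypothetical toy values, not real measurements
trial <- data.frame(
  group    = c("control", "control", "treatment", "treatment"),
  response = c(10, 14, 20, 26)
)

summary_tbl <- trial %>%
  group_by(group) %>%                  # one implicit group per level of `group`
  summarise(
    mean_response = mean(response),    # computed separately within each group
    sd_response   = sd(response),
    n             = n()                # number of rows in each group
  )

summary_tbl   # one row per group
```

Changing the column named in group_by() changes the unit that each summary row describes.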
The grammar of graphics (ggplot)
1. Data
   • Raw data for plotting
2. Geometries
   • The shape that will represent the data
   • point, line, bar, etc.
3. Aesthetics
   • axis, color, size, shape, etc.
4. Scales
   • Mapping data to aesthetic (how to color geoms, data range to plot, etc.)
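The four pieces map onto ggplot2 code roughly as follows. A sketch assuming ggplot2 is installed; patients is an illustrative data frame, and the axis limits and palette are arbitrary choices.

```r
library(ggplot2)

patients <- data.frame(
  patient = c("Jack", "Jill", "Mary"),
  height  = c(72, 64, 62),
  weight  = c(180, 115, 112)
)

p <- ggplot(data = patients,                       # 1. data
            aes(x = height, y = weight,            # 3. aesthetics
                color = patient)) +
  geom_point(size = 3) +                           # 2. geometry
  scale_x_continuous(limits = c(60, 75)) +         # 4. scales: data range to plot
  scale_color_brewer(palette = "Dark2")            # 4. scales: how to color geoms
```

Typing p or print(p) draws the figure.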
A simple example
Note that this uses “cowplot,” because I can’t stand ggplot2 default themes
[Figure panels: ggplot2 default, cowplot default]
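To compare the two looks yourself, the theme can be added explicitly. This sketch assumes ggplot2 and cowplot are installed. To my knowledge, recent cowplot versions (1.0 and later) no longer switch the default theme just by being loaded, which is why theme_cowplot() is added by hand here. The data frame is illustrative.

```r
library(ggplot2)
library(cowplot)   # provides theme_cowplot()

patients <- data.frame(
  height = c(72, 64, 62),
  weight = c(180, 115, 112)
)

# Same plot, two themes
p_default <- ggplot(patients, aes(x = height, y = weight)) +
  geom_point()
p_cowplot <- p_default + theme_cowplot()
```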
Principles of “tidy” data
1. Every variable forms a column
2. Each observation forms a row

Messy / Wide
Patient   Age   Height   Weight
Jack      30    72       180
Jill      28    64       115
Mary      27    62       112

Tidy / Long
Patient   Characteristic   Value
Jack      Age              30
Jack      Height           72
Jack      Weight           180
Jill      Age              28
Jill      Height           64
Jill      Weight           115
Mary      Age              27
Mary      Height           62
Mary      Weight           112
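The wide-to-long conversion in the tables can be sketched with tidyr. This assumes a tidyr version that has pivot_longer() (1.0.0 or later); older versions, including those current in 2016, used gather() for the same job.

```r
library(tidyr)

wide <- data.frame(
  Patient = c("Jack", "Jill", "Mary"),
  Age     = c(30, 28, 27),
  Height  = c(72, 64, 62),
  Weight  = c(180, 115, 112)
)

# Stack every column except Patient into name/value pairs
long <- pivot_longer(wide,
                     cols      = -Patient,
                     names_to  = "Characteristic",
                     values_to = "Value")

long   # 9 rows: one per patient and characteristic
```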
Frequently used geoms + aesthetics
• Geoms: geom_bar(), geom_line(), geom_point(), geom_histogram(), geom_ribbon(), geom_text(), geom_boxplot()
• Aesthetics: color, size, fill, alpha, shape, linetype, group
http://docs.ggplot2.org/current/
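A small sketch combining two of these geoms with several of the aesthetics, assuming ggplot2 is installed. The growth data frame holds hypothetical toy values. Aesthetics inside aes() vary with the data; those outside it are fixed for the whole layer.

```r
library(ggplot2)

# Hypothetical toy values: two groups measured at three time points
growth <- data.frame(
  time  = rep(1:3, times = 2),
  group = rep(c("control", "treatment"), each = 3),
  size  = c(1, 2, 3, 1, 3, 6)
)

p <- ggplot(growth, aes(x = time, y = size, group = group)) +
  geom_line(aes(color = group, linetype = group)) +       # mapped to the data
  geom_point(aes(shape = group), size = 3, alpha = 0.6)   # size and alpha fixed
```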
R resources
• Stack Overflow (Google)
• Hadley Wickham’s website - http://hadley.nz/
• http://www.r-bloggers.com/how-to-learn-r-2/
• A Beginner's Guide to R (Use R!) by Alain Zuur, Elena N. Ieno, and Erik Meesters
• The Art of R Programming: A Tour of Statistical Software Design by Norman Matloff
• ggplot2: Elegant Graphics for Data Analysis (Use R!) by Hadley Wickham. Maybe wait for the second edition (it’s slightly outdated)