Computing for Research I, Spring 2012
Primary Instructor: Elizabeth Garrett-Mayer
Introduction to R, January 10
Check out online resources
http://people.musc.edu/~elg26/teaching/methods2.2010/R-intro.pdf
http://www.ats.ucla.edu/stat/r/
http://www.statmethods.net/about/learningcurve.html
http://www.mayin.org/ajayshah/KB/R/index.html
http://processtrends.com/Learn_R_Toolkit.htm
R. Kabacoff on learning R after SPSS and SAS (http://www.statmethods.net/about/learningcurve.html)
• Why R Has a Steep Learning Curve
• A long answer to a simple question...
• I have been a hardcore SAS and SPSS programmer for more than 25 years, a Systat programmer for 15 years and a Stata programmer for 2 years. But when I started learning R recently, I found it frustratingly difficult. Why?
• I think that there are two reasons why R can be challenging to learn quickly.
• First, while there are many introductory tutorials (covering data types, basic commands, the interface), none alone are comprehensive. In part, this is because much of the advanced functionality of R comes from hundreds of user-contributed packages. Hunting for what you want can be time consuming, and it can be hard to get a clear overview of what procedures are available.
• The second reason is more ephemeral. As users of statistical packages, we tend to run one prescribed procedure for each type of analysis. Think of PROC GLM in SAS. We can carefully set up the run with all the parameters and options that we need. When we run the procedure, the resulting output may be a hundred pages long. We then sift through this output, pulling out what we need and discarding the rest.
• The paradigm in R is different. Rather than setting up a complete analysis at once, the process is highly interactive. You run a command (say, fit a model), take the results and process them through another command (say, a set of diagnostic plots), take those results and process them through another command (say, cross-validation), and so on. The cycle may include transforming the data and looping back through the whole process again. You stop when you feel that you have fully analyzed the data. It may sound trite, but this reminds me of the paradigm shift from top-down procedural programming to object-oriented programming we saw a few years ago. It is not an easy mental shift for many of us to make.
• In the end, however, I believe that you will feel much more intimately in touch with your data and in control of your work. And it's fun!
Installing R
• http://cran.r-project.org/
• Choose the download for your operating system
– Windows
– Mac
– Linux
• Follow the install instructions
R interface
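• Script files: File -> Open script opens a script in the editor; source("[filename]") runs the whole file
• Run commands: Ctrl-R runs the current line or selection from the script editor (Windows R GUI)
• Save output: sink([filename]) … sink() diverts console output to a file until sink() is called again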
• Quit session: q()
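The following is a minimal sketch of source() and sink() working together. It writes its own tiny script and log to temporary files so it runs anywhere. The paths from tempfile() are placeholders for your own .R file and log file.

```r
# Write a tiny script to a temporary file (a stand-in for your own .R file)
script_file <- tempfile(fileext = ".R")
writeLines(c("x <- c(3, 1, 2)", "print(sort(x))"), script_file)

# Divert console output to a log file while the script runs
log_file <- tempfile(fileext = ".txt")
sink(log_file)
source(script_file)   # runs every line of the script; print() output goes to log_file
sink()                # stop diverting; output returns to the console

log_lines <- readLines(log_file)   # the single line "[1] 1 2 3"
log_lines
```

Always pair each sink([filename]) with a closing sink(). Otherwise, later output keeps disappearing into the file.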
General Syntax
• result <- function(object(s), options…): the output is stored in the object result
• function(object(s), options…): the output is printed but not stored
• Object-oriented programming
• Note that ‘result’ is an object
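Here is a short illustration of the two call forms, using a made-up vector of ages. The same function, mean(), is called once with its output stored and once with it only printed. An option is passed by name.

```r
ages <- c(34, 51, 47, 62)             # hypothetical example data

avg <- mean(ages)                     # result <- function(object): stored, nothing printed
mean(ages)                            # function(object): printed at the console, not stored
avg_trim <- mean(ages, trim = 0.25)   # an option passed by name: drop 25% from each end

avg                                   # 48.5
class(avg)                            # "numeric": the stored result is itself an object
```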
First things first:
• help([function]) or ?function
• help.search("linear model") or ??"linear model"
• help.start()
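Choosing your default directory
• setwd("[pathname for directory]"): set the working directory
• getwd(): show the current working directory
• On Windows, write "\\" (or "/") instead of "\" in path names
• .RData: saved workspace (objects) in the working directory
• .Rhistory: saved command history in the working directory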
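The sketch below works through the working-directory commands. It assumes a writable temporary directory, obtained from tempdir(), as a stand-in for your own project folder. The Windows path is a hypothetical example; it is only manipulated as a string and never opened.

```r
old_dir <- getwd()                 # remember where we started
setwd(tempdir())                   # tempdir() stands in for your own project folder
getwd()                            # now reports the temporary directory

# Two spellings of the same (hypothetical) Windows path; a single "\" would start an escape
win_path_1 <- "C:\\Users\\me\\project"
win_path_2 <- "C:/Users/me/project"
nchar("\\")                        # 1: "\\" is one backslash character
identical(gsub("\\", "/", win_path_1, fixed = TRUE), win_path_2)   # TRUE

save.image(file = ".RData")        # save the workspace into the working directory
file.exists(".RData")              # TRUE
setwd(old_dir)                     # go back to the original directory
```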
Start with data
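• read.table: read a delimited text file into a data frame
• read.csv: read.table with defaults for comma-separated files (header=TRUE, sep=",")
• scan: read raw values into a vector or list
• dget: recreate an R object from a file written by dput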
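This is a self-contained sketch of the four readers. It first writes small files to temporary paths, so the file contents and the variable names (ID, AGE, REGION) are hypothetical. In practice you would point these functions at your own data files.

```r
# A small CSV file to read back (a stand-in for your own data file)
csv_file <- tempfile(fileext = ".csv")
writeLines(c("ID,AGE,REGION", "1,34,1", "2,51,2", "3,47,1"), csv_file)

dat1 <- read.csv(csv_file)                              # header and comma separator by default
dat2 <- read.table(csv_file, header = TRUE, sep = ",")  # same data, arguments spelled out

# scan() reads bare values into a vector
num_file <- tempfile(fileext = ".txt")
writeLines("10 20 30", num_file)
vals <- scan(num_file, quiet = TRUE)                    # numeric vector 10 20 30

# dput() writes an object out as R code; dget() rebuilds it
obj_file <- tempfile(fileext = ".R")
dput(dat1, file = obj_file)
dat3 <- dget(obj_file)
```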
Extracting variables from data
• Use $: data$AGE
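• Note that names are case-sensitive: data$age is not the same as data$AGE
• attach([data]) makes the variables available by name alone (AGE instead of data$AGE); detach([data]) undoes it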
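Here is a brief example of $ extraction and attach()/detach() on a hypothetical data frame, named data to match the slide.

```r
# Hypothetical data frame with upper-case variable names
data <- data.frame(AGE = c(34, 51, 47), LOS = c(3, 12, 7))

data$AGE          # the AGE column as a vector
data$age          # NULL: names are case-sensitive, so there is no column "age"

attach(data)      # columns now visible by name alone
mean_age <- mean(AGE)
detach(data)      # remove the data frame from the search path again
```

Detaching promptly avoids confusing a stale attached copy with later changes to data.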
Descriptive statistics
• summary
• mean, median
• var
• quantile
• range, max, min
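A quick tour of the summary functions, applied to a made-up vector of lengths of stay:

```r
los <- c(3, 12, 7, 5, 9)              # hypothetical lengths of stay

summary(los)                          # min, quartiles, median, mean, max
mean(los); median(los)                # 7.2 and 7
var(los)                              # 12.2: sample variance (denominator n - 1)
quantile(los, probs = c(0.25, 0.75))  # 5 and 9
range(los); max(los); min(los)        # 3 12, then 12, then 3
```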
Missing values
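• NA values make many functions return NA (and some give an error message)
• na.rm=TRUE: drop missing values in mean, median, var, quantile, range, etc.
• na.action=na.omit: drop incomplete rows in model-fitting functions such as lm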
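The sketch below shows both ways of handling missing values. The vector and data frame are hypothetical. na.omit is also R's default na.action setting, so the explicit argument here mainly documents the intent.

```r
x <- c(4, NA, 10)                  # hypothetical vector with one missing value

mean(x)                            # NA: the missing value propagates
mean(x, na.rm = TRUE)              # 7: mean of the non-missing values

# In model fitting, na.action = na.omit drops incomplete rows
dat <- data.frame(y = c(1, 2, 3, 5), x = c(1, NA, 3, 4))
fit <- lm(y ~ x, data = dat, na.action = na.omit)
nobs(fit)                          # 3: the row with x = NA was dropped
```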
Data Objects
• data.frame, as.data.frame, is.data.frame
– names([data])
– row.names([data])
• matrix, as.matrix, is.matrix
– dimnames([data])
• factor, as.factor, is.factor
– levels([factor])
• arrays
• lists
• functions
• vectors
• scalars
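A compact sketch that creates each kind of object listed above and queries it. All data and names are hypothetical.

```r
dat <- data.frame(AGE = c(34, 51), SEX = c("F", "M"))   # hypothetical data
names(dat)                      # "AGE" "SEX"
row.names(dat)                  # "1" "2"
is.data.frame(dat)              # TRUE

m <- matrix(1:6, nrow = 2, dimnames = list(c("r1", "r2"), c("a", "b", "c")))
dimnames(m)                     # row and column names
is.matrix(as.matrix(dat))       # TRUE: a data frame converted to a matrix

sex <- as.factor(dat$SEX)
levels(sex)                     # "F" "M"
is.factor(sex)                  # TRUE

arr <- array(1:24, dim = c(2, 3, 4))          # an array with three dimensions
lst <- list(ages = dat$AGE, note = "pilot")   # a list can hold mixed types
sq  <- function(v) v^2                        # functions are objects too
```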
Creating and manipulating
• combine: c
• cbind: combine as columns
• rbind: combine as rows
• list: make a list
• rep(x,n): repeat x n times
• seq(a,b,i): create a sequence between a and b in increments of i
• seq(a,b, length=k): create a sequence between a and b with length k with equally spaced increments
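The constructors above, applied to small made-up vectors. Expected values are noted in the comments.

```r
a <- c(1, 2, 3)
b <- c(4, 5, 6)
cbind(a, b)              # 3 x 2 matrix: a and b as columns
rbind(a, b)              # 2 x 3 matrix: a and b as rows
list(a, "text")          # a list of two elements of different types

rep(7, 3)                # 7 7 7
seq(0, 1, 0.25)          # 0.00 0.25 0.50 0.75 1.00
seq(0, 10, length = 5)   # 0.0 2.5 5.0 7.5 10.0
```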
ifelse
• ifelse(condition, true, false)
– agelt50 <- ifelse(data$AGE<50,1,0)
– for equality, you must use "==" (not "=")
– "or" is indicated by "|"
e.g., young.or.old <- ifelse(data$AGE<30 | data$AGE>65,1,0)
• cut(x, breaks)
– agegrp <- cut(data$AGE, breaks=c(0,50,60,130)) gives a factor with intervals (0,50], (50,60], (60,130]
– agegrp <- cut(data$AGE, breaks=c(0,50,60,130), labels=c(0,1,2))
– agegrp <- cut(data$AGE, breaks=c(0,50,60,130), labels=FALSE) gives integer codes 1, 2, 3
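The slide's recodes, run on a hypothetical four-row data frame. The age of exactly 50 shows that cut() closes intervals on the right by default.

```r
data <- data.frame(AGE = c(25, 50, 58, 70))                     # hypothetical ages

agelt50 <- ifelse(data$AGE < 50, 1, 0)                          # 1 0 0 0
young.or.old <- ifelse(data$AGE < 30 | data$AGE > 65, 1, 0)     # 1 0 0 1
is50 <- ifelse(data$AGE == 50, 1, 0)                            # 0 1 0 0: equality uses ==

# Intervals are closed on the right: (0,50], (50,60], (60,130]
agegrp <- cut(data$AGE, breaks = c(0, 50, 60, 130))
levels(agegrp)                                                  # "(0,50]" "(50,60]" "(60,130]"
agegrp_codes <- cut(data$AGE, breaks = c(0, 50, 60, 130), labels = FALSE)  # 1 1 2 3
```

Because the lowest interval is open on the left, an age of exactly 0 would become NA unless you also pass include.lowest=TRUE.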
Looking at objects
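• dim: dimensions (rows, columns) of a matrix or data frame
• length: number of elements in a vector or list
• sort: sorted copy of a vector
• attributes: the attributes attached to an object (names, dim, class, etc.)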
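The four inspection functions, applied to a small matrix and a vector:

```r
m <- matrix(1:6, nrow = 2)
dim(m)                                # 2 3
length(m)                             # 6 elements in total
sort(c(5, 2, 9))                      # 2 5 9
sort(c(5, 2, 9), decreasing = TRUE)   # 9 5 2
attributes(m)                         # $dim: 2 3
```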
Subsetting
• Use [ ]
• Vectors
– data$AGE[data$REGION==1]
– data$AGE[data$LOS<10]
• Matrices & Dataframes
– data[data$AGE<50, ]
– data[ , 2:5]
– data[data$AGE<50, 2:5]
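The slide's subsetting expressions, run against a hypothetical data frame that has the variables they refer to. Inside [ , ], the part before the comma selects rows and the part after it selects columns.

```r
data <- data.frame(ID = 1:4, AGE = c(34, 51, 47, 62), REGION = c(1, 1, 2, 2),
                   LOS = c(3, 12, 7, 15), COST = c(900, 4000, 2100, 5200))  # hypothetical

data$AGE[data$REGION == 1]     # 34 51: ages in region 1
data$AGE[data$LOS < 10]        # 34 47: ages with a stay under 10 days
data[data$AGE < 50, ]          # rows with AGE < 50, all columns
data[ , 2:5]                   # all rows, columns 2 to 5
data[data$AGE < 50, 2:5]       # row condition and column range together
```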
Some math
• abs(x)
• sqrt(x)
• x^k
• log(x) (natural log, by default)
• choose(n,k)
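Each math function from the list, with its expected value:

```r
abs(-3)               # 3
sqrt(16)              # 4
2^10                  # 1024
log(exp(1))           # 1: natural log by default
log(100, base = 10)   # 2; log10(100) is equivalent
choose(5, 2)          # 10 ways to pick 2 items from 5
```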
Matrix Manipulation
• Matrix multiplication: A%*%B
• transpose: t(X)
• diag(X): extract the diagonal of a matrix (or build a diagonal matrix from a vector or size)
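A small example of the matrix operators. It also contrasts %*% with the element-wise * operator, which is a common source of mistakes.

```r
A <- matrix(c(1, 2, 3, 4), nrow = 2)   # filled by column: rows (1, 3) and (2, 4)
B <- diag(2)                           # 2 x 2 identity matrix

A %*% B                                # matrix product; equals A here
A * A                                  # element-wise product, not matrix multiplication
t(A)                                   # transpose: rows become columns
diag(A)                                # diagonal elements: 1 4
```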
Table
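• table(x,y): count each combination of x and y (cross-tabulation)
• tabulate(x): count how often each positive integer 1, 2, … appears in x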
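A sketch of both counting functions on hypothetical categorical data:

```r
sex    <- c("F", "M", "F", "F", "M")      # hypothetical
smoker <- c("yes", "no", "no", "yes", "no")

table(sex)                    # counts per level: F 3, M 2
tab <- table(sex, smoker)     # 2 x 2 cross-tabulation
tab
tabulate(c(1, 2, 2, 5))       # 1 2 0 0 1: counts of the values 1, 2, ..., 5
```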
Statistical Tests and CIs
• t.test
• fisher.test and binom.exact (binom.exact comes from the epitools package; base R offers binom.test)
• wilcox.test
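The sketch below runs each test on simulated data; the seed, group means and 2 x 2 counts are all hypothetical. Each test returns an object, so p-values and confidence intervals can be pulled out with $. binom.test is base R's exact binomial test. The epitools call is commented out because that package may not be installed.

```r
set.seed(42)                                  # reproducible simulated data
grp_a <- rnorm(20, mean = 10, sd = 2)
grp_b <- rnorm(20, mean = 12, sd = 2)

tt <- t.test(grp_a, grp_b)                    # Welch two-sample t-test by default
tt$p.value
tt$conf.int                                   # 95% CI for the difference in means

two_by_two <- matrix(c(8, 2, 1, 5), nrow = 2) # hypothetical 2 x 2 counts
ft <- fisher.test(two_by_two)
ft$p.value

wt <- wilcox.test(grp_a, grp_b)               # rank-based alternative to t.test
wt$p.value

bt <- binom.test(7, 20)                       # 7 successes out of 20 trials
bt$conf.int                                   # exact 95% CI for the proportion
# epitools::binom.exact(7, 20)                # epitools version, if the package is installed
```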
Plots
• hist
• boxplot
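• plot
– pch (plotting symbol), type ("p" points, "l" lines, "b" both), lwd (line width)
– xlab, ylab: axis labels
– xlim, ylim: axis ranges
– xaxt, yaxt: "n" suppresses the axis
• axis: add a custom axis to an existing plot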
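This example uses each plotting option from the list. The output goes to a temporary PDF file, so it runs without a screen; the data are simulated. In an interactive session, you would drop the pdf() and dev.off() lines to see the plots in a window.

```r
set.seed(1)
x <- 1:10
y <- x + rnorm(10)                            # hypothetical noisy data

pdf_file <- tempfile(fileext = ".pdf")
pdf(file = pdf_file)                          # draw into a PDF file instead of a window
hist(y)
boxplot(y)
plot(x, y, pch = 19, type = "b", lwd = 2,     # filled points joined by thick lines
     xlab = "Day", ylab = "Value",
     xlim = c(0, 11), ylim = c(-2, 14),
     xaxt = "n")                              # suppress the default x axis...
axis(1, at = c(1, 5, 10))                     # ...and draw a custom one at chosen ticks
dev.off()
```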
Plot Layout
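• par(mfrow=c(2,1)): 2 rows by 1 column of plots, filled row by row
• par(mfrow=c(1,1)): back to one plot per page
• par(mfcol=c(2,2)): 2 by 2 grid of plots, filled column by column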
• help(par)
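Here is a sketch of multi-panel layouts, again written to a temporary PDF with simulated data. Saving the value returned by par() and passing it back later is a common way to restore the previous layout.

```r
pdf_file2 <- tempfile(fileext = ".pdf")
pdf(file = pdf_file2)
old_par <- par(mfrow = c(2, 1))      # 2 rows, 1 column; par() returns the old setting
hist(rnorm(100))                     # top panel
boxplot(rnorm(100))                  # bottom panel

par(mfcol = c(2, 2))                 # 2 x 2 grid, filled column by column
grid_dims <- par("mfcol")            # 2 2
for (i in 1:4) plot(rnorm(10), main = paste("Panel", i))

par(old_par)                         # restore the saved layout (one plot per page here)
dev.off()
```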
Probability Distributions
• Normal:
– rnorm(N,m,s): generate random normal data
– dnorm(x,m,s): density at x for normal with mean m, std dev s
– qnorm(p,m,s): quantile associated with cumulative probability of p for normal with mean m, std dev s
– pnorm(q,m,s): cumulative probability at quantile q for normal with mean m, std dev s
• Binomial
– rbinom
– etc.
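The four normal-distribution functions and their binomial counterparts, with values that can be checked by hand. The seed and parameters are arbitrary choices.

```r
set.seed(123)
draws <- rnorm(5, 0, 1)            # 5 random values from N(0, 1)

dnorm(0, 0, 1)                     # 0.3989423: density at the mean
pnorm(1.96, 0, 1)                  # about 0.975: P(Z <= 1.96)
qnorm(0.975, 0, 1)                 # about 1.96: the inverse of pnorm
pnorm(qnorm(0.3, 5, 2), 5, 2)      # 0.3: q and p functions undo each other

dbinom(2, size = 5, prob = 0.5)    # 0.3125 = choose(5, 2) / 2^5
pbinom(2, size = 5, prob = 0.5)    # 0.5
rbinom(3, size = 5, prob = 0.5)    # 3 random binomial counts
```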
Libraries
• Additional packages that can be loaded (next lecture)
• Example: epitools
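• library([libname]): load an installed package for the current session
• library(help=[libname]): describe a package and list its functions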
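A minimal sketch of loading packages. It assumes epitools may or may not be installed, so that package is attached only if it is available. The install.packages() line is left commented out because it needs internet access.

```r
# install.packages("epitools")        # one-time download from CRAN (needs internet access)
if (requireNamespace("epitools", quietly = TRUE)) {
  library(epitools)                   # attach the package for this session
}

library(stats)                        # packages that ship with R load the same way
pkg_info <- library(help = "stats")   # package description and index of its functions
search()                              # attached packages appear as "package:..."
```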
Keeping things tidy
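• ls() and objects(): list the objects in the workspace
• rm(): remove named objects, e.g., rm(x)
• rm(list=ls()): remove all objects in the workspace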
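A short sketch of workspace cleanup. Note that the last line deletes everything in the current workspace, so use it with care in a session that holds work you want to keep.

```r
x <- 1:3
y <- "temp"
ls()              # "x" "y" (plus anything else already in the workspace)
objects()         # same result as ls()
rm(x)             # remove a single object
exists("x")       # FALSE
rm(list = ls())   # remove every object in the workspace
```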