r introduction
TRANSCRIPT
![Page 1: R Introduction](https://reader033.vdocuments.site/reader033/viewer/2022042515/5552c05bb4c905920f8b47d9/html5/thumbnails/1.jpg)
R Intro
Week 1
Scott Chamberlain [modified from Haldre Rogers]
September 9, 2011
![Page 2: R Introduction](https://reader033.vdocuments.site/reader033/viewer/2022042515/5552c05bb4c905920f8b47d9/html5/thumbnails/2.jpg)
Don’t just listen to me! Other Intros to R:
• http://www.stat.duke.edu/programs/gcc/ResourcesDocuments/RTutorial.pdf
• http://www.cyclismo.org/tutorial/R/
• http://www.r-tutor.com/r-introduction
• Quick R: http://www.statmethods.net/
• http://www.bioconductor.org/help/course-materials/2011/CSAMA/Monday/Morning%20Talks/R_intro.pdf
![Page 3: R Introduction](https://reader033.vdocuments.site/reader033/viewer/2022042515/5552c05bb4c905920f8b47d9/html5/thumbnails/3.jpg)
R user frameworks
• R from command line: OSX and PC
  – Just type “R” into the command line – and have fun!
• R itself
  – http://www.r-project.org/
• RStudio – good choice
  – http://www.rstudio.org/
• RevolutionR [free academic version] – this is sort of the SAS-ised version of R
  – http://www.revolutionanalytics.com/downloads/free-academic.php
  – Uses proprietary .xdf file format that speeds up computation times
• Many other ways to use R, including GUIs, other IDEs, and a huge variety of text editors
  – https://github.com/RatRiceEEB/RIntroCode/wiki/R-Resources
• If you are afraid of the code interface, use Rattle, R Commander, Deducer, or Red R
  – With these interfaces you press buttons and can then see which code does what
![Page 4: R Introduction](https://reader033.vdocuments.site/reader033/viewer/2022042515/5552c05bb4c905920f8b47d9/html5/thumbnails/4.jpg)
R user frameworks, cont.
• R from Python
  – RPy: http://rpy.sourceforge.net/
• C++ from R:
  – Rcpp package:
    • http://cran.r-project.org/web/packages/Rcpp/index.html
    • http://dirk.eddelbuettel.com/code/rcpp.html
  – Can hugely speed up computation times: write the function in C++, and the R function then calls compiled code instead of running interpreted R.
    • E.g., http://helmingstay.blogspot.com/2011/06/efficient-loops-in-r-complexity-versus.html
    • & http://dirk.eddelbuettel.com/code/rcpp.examples.html
• Excel from R
  – XLConnect package: http://cran.r-project.org/web/packages/XLConnect/index.html
• And more… see for yourself
![Page 5: R Introduction](https://reader033.vdocuments.site/reader033/viewer/2022042515/5552c05bb4c905920f8b47d9/html5/thumbnails/5.jpg)
R Tips
• R can crash. Do not use R’s built-in text editor, and do not write code solely in the R console. Instead use any text editor that integrates with R. See here for links:
  – https://github.com/RatRiceEEB/RIntroCode/wiki/R-Resources
• When asking for help on listservs/help websites, use BRIEF and REPRODUCIBLE examples
  – Not doing this makes people not want to help you!
• R overwrites files with the same file name without warning (e.g., with write.csv() or save())!!!!
  – Make sure you want to overwrite a file before doing so
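One way to guard against accidental overwrites is to check for the file before writing. This is only a sketch: `safe_write_csv` is a hypothetical helper, not a base R function, and the temporary path is made up for illustration.

```r
# Hypothetical helper: write a CSV only if the target file does not exist yet.
safe_write_csv <- function(df, path) {
  if (file.exists(path)) {
    stop("File already exists: ", path)
  }
  write.csv(df, path, row.names = FALSE)
  invisible(path)
}

out_file <- file.path(tempdir(), "iris_copy.csv")  # illustrative path
unlink(out_file)                                   # make sure we start clean

safe_write_csv(iris, out_file)                     # first write succeeds

# A second write to the same path is refused instead of silently overwriting
second_try <- try(safe_write_csv(iris, out_file), silent = TRUE)
inherits(second_try, "try-error")
```

Plain write.csv(iris, out_file) would have replaced the file on the second call without any message.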
![Page 6: R Introduction](https://reader033.vdocuments.site/reader033/viewer/2022042515/5552c05bb4c905920f8b47d9/html5/thumbnails/6.jpg)
Style
![Page 7: R Introduction](https://reader033.vdocuments.site/reader033/viewer/2022042515/5552c05bb4c905920f8b47d9/html5/thumbnails/7.jpg)
Not this kind of style…
![Page 8: R Introduction](https://reader033.vdocuments.site/reader033/viewer/2022042515/5552c05bb4c905920f8b47d9/html5/thumbnails/8.jpg)
This kind of style!!!
![Page 9: R Introduction](https://reader033.vdocuments.site/reader033/viewer/2022042515/5552c05bb4c905920f8b47d9/html5/thumbnails/9.jpg)
Style
Style is important so YOU and OTHERS can read your code and actually use it
• Google style guide:
  – http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html#generallayout
• Henrik Bengtsson style guide:
  – http://www1.maths.lth.se/help/R/RCC/
• Hadley Wickham's style guide:
  – https://github.com/hadley/devtools/wiki/Style
![Page 10: R Introduction](https://reader033.vdocuments.site/reader033/viewer/2022042515/5552c05bb4c905920f8b47d9/html5/thumbnails/10.jpg)
Preparing your data for R
• What makes clean data?
  – Correct spelling
  – Identical capitalization (e.g. Premna vs premna)
    • R is case sensitive: if myvector <- c(3, 4, 5), calling Myvector does not work!
  – No spaces between words in column names (read.csv() turns the spaces into “.”)
    • Generally try to avoid spaces; use underscores instead
  – NA or blank (if using csv) for missing values
• Find and replace to get rid of spaces after words
• I generally keep both an .xls and a .csv file, so you can always recreate work in R with the .csv file and still modify the .xls file
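The points above can be checked directly in R. A minimal sketch with made-up values: the `species` vector is invented for illustration, and trimws() assumes R 3.2.0 or later.

```r
# R names are case sensitive
myvector <- c(3, 4, 5)
exists("myvector")    # the name as typed is found
exists("Myvector")    # a different capitalization is a different name

# Made-up messy entries: inconsistent capitalization and stray spaces
species <- c("Premna", "premna ", " Premna", "Aglaia")
length(unique(species))          # R treats every variant as a separate value

# Strip leading/trailing spaces and standardize capitalization
species_clean <- tolower(trimws(species))
unique(species_clean)

# How R rewrites a column name that contains a space
make.names("leaf length")
```

Cleaning in the spreadsheet (find and replace) or in R both work; the R version has the advantage of being repeatable.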
![Page 11: R Introduction](https://reader033.vdocuments.site/reader033/viewer/2022042515/5552c05bb4c905920f8b47d9/html5/thumbnails/11.jpg)
Bringing data into R
• Create csv file
  – One worksheet only
  – No special formatting, filters, comments etc.
  – Copy only columns and rows with your data to the CSV, as R will sometimes read in columns without data
• Name your variables well – self-explanatory, unique, lowercase, short-ish, one-word names
• In R, set the working directory
  – setwd("/Users/ScottMac/Dropbox/R Group/Week1_R-Intro")
  – What is the working directory? getwd()
  – What is in the working directory? dir()
• Read in data
  – CSV files: iris.df <- read.csv("iris_df.csv", header=TRUE)
  – Clipboard: read.csv("clipboard") reads in copied data as if it were a file (on OSX, use read.csv(pipe("pbpaste")) instead)
  – From web: read.csv("http://explore.data.gov/download/pwaj-zn2n/CSV")
  – From Excel files (using the XLConnect package): iris.df <- readWorksheetFromFile("/Users/ScottMac/Dropbox/R Group/Week1_R-Intro/iris_df.xlsx", sheet="Sheet1")
• Write data
  – write.csv(dataframe, "dataframename.csv"), OR
  – save(iris, file="iris.RData") [and load("iris.RData") to open in R]
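A round trip through both file types, sketched with R's built-in iris data. The temporary directory stands in for a real working directory and the file names are only illustrative.

```r
# Write the built-in iris data to a CSV, then read it back in
csv_path <- file.path(tempdir(), "iris_df.csv")
write.csv(iris, csv_path, row.names = FALSE)   # row.names = FALSE avoids an extra index column
iris.df <- read.csv(csv_path, header = TRUE)
dim(iris.df)                                   # rows and columns read in

# Save the object in R's own format, remove it, and load it again
rdata_path <- file.path(tempdir(), "iris.RData")
save(iris.df, file = rdata_path)               # note the file= argument
rm(iris.df)
load(rdata_path)                               # restores the object under its original name
exists("iris.df")
```

Note that load() brings the object back under the name it was saved with, so there is nothing to assign.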
![Page 12: R Introduction](https://reader033.vdocuments.site/reader033/viewer/2022042515/5552c05bb4c905920f8b47d9/html5/thumbnails/12.jpg)
R data structures
• Scalar:
  – Object with a single value, either numeric or character (in R, a vector of length one)
• Vector:
  – Sequence of values of a single type (e.g., numeric or character); can include NA
• List:
  – Arbitrary collections of variables – very useful R object
• Character:
  – Text, e.g., “this is some text”
• Factor:
  – Like character vectors, but only w/ values in predefined “levels”
• Matrix:
  – Two-dimensional; all values must be of the same type (e.g., all numeric or all character)
• Dataframe:
  – Each column can be of a different class
• Immutable dataframe:
  – Special dataframe used in the plyr package for faster dataframe manipulation; it references the original dataframe for faster calculations
• Function
• Environment
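A small sketch of the main structures listed above, built with base R only. All object names and values here are made up for illustration.

```r
x  <- 5                                   # "scalar": a numeric vector of length one
v  <- c(1.5, NA, 3)                       # numeric vector with a missing value
ch <- "this is some text"                 # character
f  <- factor(c("low", "high", "low"),
             levels = c("low", "high"))   # factor with predefined levels
m  <- matrix(1:6, nrow = 2)               # matrix: 2 rows, 3 columns, one type
l  <- list(number = x, text = ch, values = v)        # list: mixed contents
df <- data.frame(id = 1:3, size = f, value = v)      # dataframe: one class per column

length(x)          # a scalar is just a vector of length 1
class(c(1, "a"))   # mixing types in a vector coerces everything to one type
dim(m)
sapply(df, class)  # class of each dataframe column
```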
![Page 13: R Introduction](https://reader033.vdocuments.site/reader033/viewer/2022042515/5552c05bb4c905920f8b47d9/html5/thumbnails/13.jpg)
Exploring dataframes
• str(dataframe) gives column formats and dimensions
• head(dataframe) and tail(dataframe) give the first and last 6 rows (by default)
• names(dataframe) gives column names
• row.names(dataframe) gives row names
• attributes(dataframe) gives column and row names and object class
• summary(dataframe) gives a lot of good information
  – Make sure variables are in the appropriate form
    • Character/string, Numeric, Factor, Integer, Logical
  – Make sure mins, maxs, means, etc. seem right
  – Make sure you don’t have typing errors that turn Premna and premna into two separate factor levels
    • Use: unique(iris$species) to see all unique values of a column
    • Or use: levels(spider$species) to see the different levels
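The same checks, sketched on R's built-in iris dataframe. One assumption to note: in the built-in data the column is named `Species` with a capital S, whereas the slide's `iris$species` refers to the course's own CSV file.

```r
str(iris)                  # column types and dimensions
first_rows <- head(iris)   # first 6 rows by default
last_rows  <- tail(iris, 3)  # ask for a different number of rows if needed
names(iris)                # column names
summary(iris)              # per-column summaries: check mins, maxs, means

# Check a grouping column for typos or inconsistent capitalization
unique(iris$Species)       # every distinct value in the column
levels(iris$Species)       # factor levels
nlevels(iris$Species)      # how many levels there are
```

If nlevels() returns more groups than expected, a misspelled or differently capitalized entry is the usual suspect.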
![Page 14: R Introduction](https://reader033.vdocuments.site/reader033/viewer/2022042515/5552c05bb4c905920f8b47d9/html5/thumbnails/14.jpg)
To attach or not to attach…that is the question
• Some like to use ‘attach’ to make dataframe variables accessible by name within the R session
• Generally, ‘attach’ is frowned upon by R junkies.
• Instead, use dataframe$y, or data=dataframe, or dataframe[,"y"], or dataframe[, 2]
• To detach the object, use: detach()
I recommend: do not use attach, but do what you want
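The attach-free alternatives side by side, sketched on the built-in iris data. with() is a base R function that was not on the slide; it is added here as one more option.

```r
# Four equivalent ways to reach a column without attach()
m1 <- mean(iris$Sepal.Length)            # dataframe$y
m2 <- mean(iris[, "Sepal.Length"])       # dataframe[, "y"]
m3 <- mean(iris[, 1])                    # dataframe[, column number]
m4 <- with(iris, mean(Sepal.Length))     # evaluate an expression inside the dataframe

# Modelling functions usually take the dataframe through a data= argument
fit <- lm(Sepal.Length ~ Petal.Length, data = iris)
coef(fit)
```

Selecting by column name is generally safer than by position, since it keeps working if columns are reordered.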
![Page 15: R Introduction](https://reader033.vdocuments.site/reader033/viewer/2022042515/5552c05bb4c905920f8b47d9/html5/thumbnails/15.jpg)
R Packages
• 3,262 packages!!!!
• Packages are extensions written by anyone for any purpose, usually installed and loaded by:
  – install.packages("packagename"), then
  – require(packagename) or library(packagename)
  – Use ?functionname for help on any function in base R or in R packages
  – In RStudio, just press tab when in parentheses after the function name to see function options!!!
• Explore packages at the CRAN site:
  – http://cran.r-project.org/web/packages/
• Inside-R package reference:
  – http://www.inside-r.org/packages
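A cautious way to load a package, sketched under the assumption that you would rather not trigger an install automatically. `plyr` is used only because it comes up later in these slides; any package name works.

```r
pkg <- "plyr"   # example package name

# requireNamespace() reports whether the package is installed, without attaching it
has_pkg <- requireNamespace(pkg, quietly = TRUE)

if (has_pkg) {
  library(pkg, character.only = TRUE)   # character.only lets library() take a string
} else {
  message("Package '", pkg, "' is not installed. ",
          "Run install.packages(\"", pkg, "\") first.")
}

# Help pages work the same way for base R and for loaded packages:
# ?read.csv
# help("read.csv")
```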
![Page 16: R Introduction](https://reader033.vdocuments.site/reader033/viewer/2022042515/5552c05bb4c905920f8b47d9/html5/thumbnails/16.jpg)
Data manipulation
• Packages: plyr, data.table, doBy, sqldf, reshape2, and more
• Comparison of packages
  – Modified from code from Recipes, scripts and Genomics blog: https://gist.github.com/878919
  – In that comparison, data.table is by far the fastest!!!
  – BUT, plyr may win on ease of use and flexibility? See for yourself…
• Also, see examples in the tutorial code for reshape2 package for neat data manipulation tricks
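The kind of task these packages are compared on is a grouped summary: split the data by group, apply a function, combine the results. The sketch below uses base R only, as a stand-in so that it runs without extra packages. It is not the plyr or data.table syntax, and it is not the code from the linked comparison.

```r
# Mean sepal length per species, via the formula interface of aggregate()
species_means <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)
species_means

# The same summary with tapply(), which returns a named vector instead of a dataframe
species_means_vec <- tapply(iris$Sepal.Length, iris$Species, mean)
species_means_vec
```

plyr, data.table, doBy, and sqldf each offer their own syntax for this same operation, which is what the comparison above times.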
![Page 17: R Introduction](https://reader033.vdocuments.site/reader033/viewer/2022042515/5552c05bb4c905920f8b47d9/html5/thumbnails/17.jpg)
Visualizations
• A few different approaches:
  – Base graphics
  – Lattice graphics
  – Grid graphics
  – ggplot2 graphics
  – Further reading: http://www.slideshare.net/dataspora/a-survey-of-r-graphics
• An example:
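A minimal base graphics sketch on the built-in iris data (not necessarily the plot shown on the slide). It writes to a PDF in a temporary directory so it also runs without a screen device; the file name and labels are made up for illustration.

```r
plot_file <- file.path(tempdir(), "iris_scatter.pdf")   # illustrative output path

pdf(plot_file, width = 6, height = 4)        # open a PDF graphics device
plot(iris$Petal.Length, iris$Petal.Width,
     col  = as.integer(iris$Species),        # one colour per species
     xlab = "Petal length (cm)",
     ylab = "Petal width (cm)",
     main = "Iris petals by species")
legend("topleft", legend = levels(iris$Species), col = 1:3, pch = 1)
dev.off()                                    # close the device so the file is written
```

In an interactive session, drop the pdf() and dev.off() lines and the plot appears in the plotting window instead.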
![Page 18: R Introduction](https://reader033.vdocuments.site/reader033/viewer/2022042515/5552c05bb4c905920f8b47d9/html5/thumbnails/18.jpg)
more on ggplot2 graphics
• There are classes taught by Hadley Wickham here at Rice if you want to learn more!
  – Data visualization (Stat645): http://had.co.nz/stat645/
  – Statistical computing (Stat405): http://had.co.nz/stat405/
• Hadley’s website is really helpful: http://had.co.nz/ggplot2/
• The ggplot2 google groups site: https://groups.google.com/forum/#!forum/ggplot2
![Page 19: R Introduction](https://reader033.vdocuments.site/reader033/viewer/2022042515/5552c05bb4c905920f8b47d9/html5/thumbnails/19.jpg)
QUICK RSTUDIO RUN THROUGH
Keyboard shortcuts!! http://www.rstudio.org/docs/using/keyboard_shortcuts
![Page 20: R Introduction](https://reader033.vdocuments.site/reader033/viewer/2022042515/5552c05bb4c905920f8b47d9/html5/thumbnails/20.jpg)
USE CASE HERE [see intro_usecase.R file]