Introduction to R: Joseph Powell
Overall Aims
• Introduce programming concepts relevant to MX
• Demonstrate the strengths (and weaknesses) of R
Books
• The R Book – Crawley (2007)
• Introductions to statistics using R
  – Cohen Y. and Cohen J. Y. (2008). Statistics and Data with R.
  – Crawley M. (2005). Statistics: An Introduction using R.
  – Dalgaard P. (2002). Introductory Statistics with R.
  – Maindonald J. & Braun J. (2003). Data Analysis and Graphics Using R: An Example-based Approach.
• Books on biological topics
  – Paradis E. (2006). Analysis of Phylogenetics and Evolution with R.
  – Broman K. W. & Sen S. (2009). A Guide to QTL Mapping with R/qtl.
  – Bolker B.M. (2008). Ecological Models and Data in R.
• Books on statistical topics
  – Aitkin M. et al. (2009). Statistical Modelling in R.
  – Faraway J. (2009). Linear Models with R.
  – Albert J. (2009). Bayesian Computation with R.
  – Bivand R.S. et al. (2009). Applied Spatial Data Analysis with R.
  – Cowpertwait P.S.P. & Metcalfe A.V. (2009). Introductory Time Series with R.
• Books on R specifics and R programming
  – Spector P. (2008). Data Manipulation with R.
  – Murrell P. (2006). R Graphics.
  – Chambers J. M. (2008). Software for Data Analysis: Programming with R.
Websites
• Websites:
  – CRAN R: http://www.r-project.org/
  – R cookbook: http://www.r-cookbook.com/
  – R graphics: http://addictedtor.free.fr/graphiques/
  – R wiki: http://wiki.r-project.org/
  – Mailing lists: http://www.r-project.org/mail.html
  – R seek: http://www.rseek.org/
• Websites on statistical topics
  – R genetics: http://rgenetics.org/trac/rgalaxy
  – Bioconductor: http://www.bioconductor.org/
The console
• Load up R
• Console window appears, with a command prompt
• Everything in the R console can be partitioned into two fundamental operations:
  – Input variables
> x <- 2
  – Output variables
> x
[1] 2
Objects
• Names
  – Case sensitive, no spaces
  – Must begin with a letter, but can also contain numbers and: . _
  – Try to give your objects meaningful names
> My_f4vourite.langua6e_evR <- "R"
• x, y and My_f4v… are objects that we have created
> ls() # this will bring up a list of all our objects
> rm(y) # this deletes y (forever)
> rm(list=ls()) # this deletes everything (..forever)
Workspace 1
• Everything shown in this list of objects comprises our 'workspace'
> ls()
[1] "My_f4vourite.langua6e_evR" "x" "y"
> save.image(file="myworkspace.RData")
> rm(list=ls())
> ls()
character(0)
> load(file = "myworkspace.RData")
> ls()
[1] "My_f4vourite.langua6e_evR" "x" "y"
• Objects are internal to R
  – Does not behave like a file structure on the computer
  – Can't be read or interpreted outside R (?)
Workspace 2
• You can select which objects to save
> save(y, x, file = "two_objects.RData")
• Different computer folders can be accessed
> getwd() # shows the current working directory
> dir() # lists the files in the current working directory
> setwd("~/work_directory") # sets R's focus to a different computer folder
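A minimal sketch of the save / remove / load cycle for selected objects. The temporary file path and the values of x and y are assumptions made for illustration, not part of the slides.

```r
# Two small example objects (values are illustrative).
x <- 2
y <- c(3, 4, 12)

# Hypothetical file location: a temporary file, so nothing in the
# working directory is touched.
tmp_file <- tempfile(fileext = ".RData")

save(x, y, file = tmp_file)   # save only the chosen objects
rm(x, y)                      # remove them from the workspace

restored <- load(tmp_file)    # load() returns the names of the objects it restored
print(restored)
print(x)
print(y)
```

Capturing the return value of load() is a convenient way to see what a .RData file contained.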
Built-in functions
• Native functions make R succinct
• Diverse range available from graphics to data manipulation to statistical algorithms etc.
• Highly optimised so use them if they are available instead of writing your own
• Function structure:
> function_name(<argument 1>, <argument 2>, …)
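As a rough illustration of why built-in functions keep code succinct, the sketch below computes the same mean twice: once with a hand-written loop and once with mean(). The data values are made up for the example.

```r
values <- c(12.5, 13.8, 15.3, 16.8)  # illustrative data

# Hand-written version: accumulate a total in a loop, then divide.
total <- 0
for (v in values) {
  total <- total + v
}
manual_mean <- total / length(values)

# Built-in version: one call, with the argument named explicitly.
builtin_mean <- mean(x = values)

print(manual_mean)
print(builtin_mean)
print(all.equal(manual_mean, builtin_mean))
```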
Missing values
• NA is a “reserved” word in R
• It is a single element (length 1) that indicates a missing value
• A helpful alternative to coding missing values (e.g. -99)
> my_array <- c(NA,100,120,120,120,130,NA)
> sum(my_array)
[1] NA
> sum(my_array, na.rm=TRUE) # most functions allow you to explicitly state how to handle NA
[1] 590
> table(my_array) # HOWEVER the default action varies from function to function
my_array
100 120 130
  1   3   1
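A short sketch of some common ways to work with NA, reusing my_array from above. The recoded vector and its -99 code are assumptions for illustration.

```r
my_array <- c(NA, 100, 120, 120, 120, 130, NA)

# is.na() flags missing elements; summing the flags counts them.
n_missing <- sum(is.na(my_array))

# Drop NAs explicitly when summarising.
mean_obs <- mean(my_array, na.rm = TRUE)

# table() silently ignores NA by default; useNA asks it to report them.
counts_with_na <- table(my_array, useNA = "ifany")

# Converting a numeric missing-value code (here -99, hypothetical) to NA.
recoded <- c(-99, 100, 120)
recoded[recoded == -99] <- NA

print(n_missing)
print(mean_obs)
print(counts_with_na)
print(recoded)
```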
R help pages
• Each function has its own unique syntax
  – Default arguments
  – Data structure requirements
  – Output options
> ?seq # brings up help page of seq() function
> ??"sequence" # searches for all related functions
• Note
> seq(from = 2, to = 100, by = 2)
is clearer than
> seq(2,100,2)
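A small sketch showing that named and positional arguments give the same result for seq(), and that naming arguments makes less common options such as length.out easy to read. The variable names are illustrative.

```r
# Same call written with named and with positional arguments.
evens_named <- seq(from = 2, to = 100, by = 2)
evens_positional <- seq(2, 100, 2)
print(identical(evens_named, evens_positional))

# Named arguments also make alternative options self-explanatory:
# five evenly spaced points between 0 and 1.
five_points <- seq(from = 0, to = 1, length.out = 5)
print(five_points)
```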
Basic Scripting
• Note pad / text editor
  – Within the R GUI
  – Open with: File > New Script or Ctrl+N
  – Layout as tile is useful: Windows > Tile
Basic Scripting
• Note pad / text editor
  – Useful for keeping all work together
  – Scripts can be saved
  – Can be used to save a “program”
  – Add # comments
  – Check individual bits of code
  – Ctrl+R
    • Whole line
    • Selected code
Basic Scripting
• Brackets
  – ( ) functions
  – [ ] subsets
  – { } processes
• Subsets
  – Take a subset of an object
  – Objects have either 1 x n, or m x n dimensions
> x
[1]  2  5  6  2  6 77 55
> x[5]
[1] 6
> X
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
> X[3,4] # [rows, columns]
[1] 12
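The objects on this slide can be rebuilt to try the subsetting rules directly. This is a sketch: the slide does not show how x and X were created, so the matrix() and c() calls are assumptions.

```r
# A 1 x n object (vector) and an m x n object (matrix).
x <- c(2, 5, 6, 2, 6, 77, 55)
X <- matrix(1:12, nrow = 3, ncol = 4)  # filled column by column

fifth <- x[5]           # single element of a vector
cell <- X[3, 4]         # row 3, column 4
second_row <- X[2, ]    # leaving the column index empty takes the whole row
first_col <- X[, 1]     # leaving the row index empty takes the whole column
picked <- x[c(1, 6)]    # several elements at once

print(fifth)
print(cell)
print(second_row)
print(first_col)
print(picked)
```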
Basic Scripting
• Data input
  – Direct input into the console
    • scan()
  – Reading in data
    • read.table / read.csv
      – "name.txt"
      – "c:\\temp\\name.txt"
      – file.choose()
      – list.files()
      – dir()
> y <- scan()
1: 3
2: 4
3: 12
4: 3
5: 5
6: 2
7: 14
8:
Read 7 items
> dir()
[1] "temp.csv" "temp2.csv" "name.txt"
> y <- read.table("name.txt", header=TRUE, sep="\t")
Basic Scripting
• Data output
  – Diverting console output to a file
    • sink()
  – Writing out data
    • write.table / write.csv
      – "name.txt"
      – "c:\\temp\\name.txt"
> sink("sink_tmp.txt") # console output now goes to this file
> i <- 1:10
> outer(i, i, "*")
> sink() # returns output to the console
> dir()
[1] "temp.csv" "temp2.csv" "name.txt"
> write.table(y, file="name.txt", col.names=TRUE, sep="\t") # the object to write comes first
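A sketch of a write / read round trip with write.table() and read.table(). The data frame and the temporary file are made up for the example; row.names and quote are set explicitly so the file reads back unchanged.

```r
# Illustrative data (not from the slides).
scores <- data.frame(id = 1:3, weight = c(22.9, 27.5, 25.5))

# Hypothetical output location: a temporary tab-delimited file.
out_file <- tempfile(fileext = ".txt")

write.table(scores, file = out_file, sep = "\t",
            col.names = TRUE,    # write a header line
            row.names = FALSE,   # do not write row numbers as a column
            quote = FALSE)       # no quotes around text

# Read it back with matching settings.
scores_back <- read.table(out_file, header = TRUE, sep = "\t")
print(scores_back)
```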
Basic Scripting
• Adding rows and columns
  – Allows objects to be joined, either to an existing object or to make a new object
  – cbind() – adds columns together
  – rbind() – adds rows together
> y1
     [,1] [,2] [,3]
[1,]    1    3 12.5
[2,]    1    2 13.8
[3,]    1    5 15.3
[4,]    1    4 16.8
> y2
      [,1]
[1,] 0.349
[2,] 0.745
[3,] 0.684
[4,] 0.964
> y3 <- cbind(y1, y2)
> y3
     [,1] [,2] [,3]  [,4]
[1,]    1    3 12.5 0.349
[2,]    1    2 13.8 0.745
[3,]    1    5 15.3 0.684
[4,]    1    4 16.8 0.964
> y3 <- rbind(y1, y2[1:3])
> y3
      [,1]  [,2]   [,3]
[1,] 1.000 3.000 12.500
[2,] 1.000 2.000 13.800
[3,] 1.000 5.000 15.300
[4,] 1.000 4.000 16.800
[5,] 0.349 0.745  0.684
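The joins above can be reproduced as follows. How y1 and y2 were originally built is not shown on the slide, so the construction here is an assumption; the names wide and long are illustrative.

```r
# Rebuild the two objects from the slide.
y1 <- cbind(c(1, 1, 1, 1), c(3, 2, 5, 4), c(12.5, 13.8, 15.3, 16.8))
y2 <- matrix(c(0.349, 0.745, 0.684, 0.964), ncol = 1)

# cbind() needs the same number of rows in each object.
wide <- cbind(y1, y2)

# rbind() needs the same number of columns, so only the first
# three values of y2 are added as a new row.
long <- rbind(y1, y2[1:3])

print(dim(wide))
print(dim(long))
```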
Basic Scripting
• for loops
  – loop through a set of commands a given number of times
  – very useful, but are not optimal for memory
> dim(y)
[1] 10 10
> for(i in 1:nrow(y)) { y_mean <- mean(y[i, 1:10]) } # y_mean is overwritten on every pass
> y_mean # so only the mean of the last row is kept
[1] 0.1974492
> out <- array(0, c(nrow(y), 1)) # one slot per row
> for(i in 1:nrow(y)) { out[i] <- mean(y[i, ]) } # store each row mean
> out
            [,1]
 [1,] -0.3110800
 [2,] -0.2000344
 [3,]  0.2019573
 [4,]  0.2859823
 [5,]  0.1932523
 [6,]  0.2759323
 [7,] -0.2571102
 [8,] -0.1037983
 [9,]  0.3522018
[10,]  0.1974492
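A sketch comparing the loop with two built-in alternatives that give the same row means. The matrix here is random data generated for the example, so its values will not match the output on the slide.

```r
set.seed(1)                                  # reproducible illustrative data
y <- matrix(rnorm(100), nrow = 10, ncol = 10)

# Loop version: pre-allocate the result, then fill one element per row.
out <- numeric(nrow(y))
for (i in seq_len(nrow(y))) {
  out[i] <- mean(y[i, ])
}

# Built-in alternatives.
vectorised <- rowMeans(y)        # dedicated function for row means
applied <- apply(y, 1, mean)     # apply mean() over rows (margin 1)

print(all.equal(out, vectorised))
print(all.equal(out, applied))
```

seq_len(nrow(y)) is used instead of 1:nrow(y) because it also behaves sensibly when the object has zero rows.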
Data Manipulation
• Check data
  – dim()
  – mydata[1:10, 1:10]
  – str()
  – summary()
  – head()
  – tail()
  – table()
  – etc…
> mydata <- read.table("mydata.txt", header=T, sep="\t")
> dim(mydata)
[1]  642 1470
> mydata[1:10, 1:10]
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    2    2    1    2    1    2    0    1    0     1
 [2,]    0    0    2    2    0    0    1    2    1     2
 [3,]    0    2    2    2    1    1    0    0    2     1
 [4,]    2    0    2    2    2    0    1    2    0     1
 [5,]    2    0    0    2    0    1    1    0    2     0
 [6,]    2    1    2    1    1    0    2    2    1     1
 [7,]    1    1    2    2    1    2    2    2    0     1
 [8,]    0    1    0    0    0    1    1    1    1     1
 [9,]    0    0    1    2    1    2    2    0    0     1
[10,]    1    0    1    1    2    0    1    0    0     1
Data Manipulation
• Reordering
  – If you have a data.frame or matrix (numbers or letters)
  – Use: order()
  – index <- order(old[,1], decreasing=T)
> dim(lamb)
[1] 1600    5
> head(lamb)
  Field   Weight sire dam sex
1     A 22.92368    1   1   F
2     A 27.52896    1   1   F
3     A 25.52592    1   1   M
4     A 25.56016    1   1   M
5     A 24.53296    1   2   F
6     A 22.03344    1   2   F
> lamb <- lamb[order(lamb$sex, decreasing=F), ]
> head(lamb)
   Field   Weight sire dam sex
1      A 22.92368    1   1   F
2      A 27.52896    1   1   F
5      A 24.53296    1   2   F
6      A 22.03344    1   2   F
9      A 30.37944    2   1   F
10     A 25.93680    2   1   F
Data Manipulation
• Reordering
  – order()
> lamb <- lamb[order(lamb$sex, decreasing=F), ]
Expanded way
> rows <- order(lamb$sex, decreasing=F)
> lamb <- lamb[rows, ]
> index <- order(lamb$sex, decreasing=F)
> head(index)
[1]  1  2  5  6  9 10
> lamb <- lamb[index, ]
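A sketch of the same idea on a tiny, made-up version of the lamb data, extended to sort on two columns at once: by sex, then by decreasing Weight within each sex. Negating a numeric column inside order() is a common way to reverse just that column.

```r
# Small illustrative data frame (values are not from the real lamb data).
lamb <- data.frame(Field  = c("A", "A", "B", "B"),
                   Weight = c(22.9, 27.5, 25.5, 24.5),
                   sex    = c("M", "F", "M", "F"),
                   stringsAsFactors = FALSE)

# order() returns row positions, not sorted values.
index <- order(lamb$sex, -lamb$Weight)
print(index)

# Use the positions to rearrange the rows.
sorted <- lamb[index, ]
print(sorted)
```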
Data Manipulation
• Replacing
  – index
  – which()
> class(lamb)
[1] "matrix"
> head(lamb)
  Field   Weight sire dam sex
1     A 22.92368    1   1   F
2     A 27.52896    1   1   F
3     B 25.52592    1   1   M
> index <- lamb[,1]=="A"
> head(index)
[1] TRUE TRUE FALSE TRUE FALSE
> lamb[index, 1] <- "C"
> head(lamb)
  Field   Weight sire dam sex
1     C 22.92368    1   1   F
2     C 27.52896    1   1   F
3     B 25.52592    1   1   M
> index <- which(lamb[,1]=="A")
> head(index)
[1]  1  2  4  6  7 10
> lamb[index, 1] <- "C"
Put it together
> lamb[which(lamb[,1]=="A"), 1] <- "C"
Data Manipulation
• Replacing
> class(lamb)
[1] "matrix"
> head(lamb)
  Field   Weight sire dam sex
1     A 22.92368    1   1   F
2     A 27.52896    1   1   F
3     B 25.52592    1   1   M
> index <- lamb[,2] <= 22.000
> table(index)
index
FALSE  TRUE
 1553    47
> lamb[index, 2] <- NA # NA without quotes is the missing value; "NA" would be a text string
> which(lamb[,2] >= 20.0 & lamb[,2] <= 21.0)
[1]  214  363  496  842  921  983 1103 1126
> which(lamb[,1]=="A" & lamb[,2] >= 20.0 & lamb[,2] <= 21.0)
[1] 214 363 496
> new_lamb <- lamb[which(lamb[,1]=="A" & lamb[,2] >= 20.0 & lamb[,2] <= 21.0) , ]
> new_lamb
    Field Weight sire dam sex
214     A  20.46   27   2   F
363     A  20.08   46   1   M
496     A  20.41   62   2   M
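A sketch of logical indexing and which() on a tiny, made-up data frame. The column names follow the slides, but the values and the 22 cut-off are chosen only for illustration.

```r
lamb <- data.frame(Field  = c("A", "A", "B", "A"),
                   Weight = c(22.9, 20.4, 25.5, 21.6),
                   stringsAsFactors = FALSE)

# A logical index has one TRUE/FALSE per row ...
is_a <- lamb$Field == "A"
# ... while which() converts it to the matching row numbers.
a_rows <- which(is_a)

# Combine conditions with & to select rows meeting both.
light_a <- lamb[is_a & lamb$Weight <= 22, ]

# Replace values in place: low weights become missing, Field A becomes C.
lamb$Weight[lamb$Weight <= 22] <- NA
lamb$Field[is_a] <- "C"

print(a_rows)
print(light_a)
print(lamb)
```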
Graphics with R: Overview
1. Why graphics?
2. Why graphics in R?
3. The R graphics systems (did you really expect just one?)
4. Graphics basics and examples
5. Customisation of a graphic
6. Overview of different systems and packages
plot(x, y, …)
> ?Formaldehyde
> head(Formaldehyde)
  carb optden
1  0.1  0.086
2  0.3  0.269
3  0.5  0.446
4  0.6  0.538
5  0.7  0.626
6  0.9  0.782
> plot(Formaldehyde)
> ?par
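A sketch of basic customisation of the same plot, using arguments to plot() and settings from par(). The output file, colours, labels and the fitted line are choices made for this example rather than anything shown on the slides; the plot is written to a temporary PDF so that it runs without a screen device.

```r
# Hypothetical output file for the figure.
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)

# par() changes graphical settings and returns the old ones,
# which can be restored afterwards.
old_par <- par(mar = c(4, 4, 2, 1))

plot(Formaldehyde$carb, Formaldehyde$optden,
     xlab = "Carbohydrate (ml)",      # axis labels
     ylab = "Optical density",
     main = "Formaldehyde data",      # title
     pch = 19, col = "blue")          # filled blue points

# Add a least-squares line on top of the points.
fit <- lm(optden ~ carb, data = Formaldehyde)
abline(fit, lty = 2)

par(old_par)
dev.off()   # close the device so the file is written
```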