Data & Graphing
vectors, data frames, importing data, contingency tables, barplots
18 September 2014 Sherubtse Training
Data CLASSES in R
• Vector: a single string of data
• Factor: categorical data, stored as integer codes labeled by category levels (summary() reports the frequency of each level)
• Matrix: 2D table of data
• Array: >2D table of data
• Data Frame: 2D table that can accept different data modes
• List: General structure for organizing all project data
[Figure: memory used (object.size)]
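As a minimal sketch of the classes listed above, the following builds one small object of each class and checks its size with object.size(). All object names and values here are made up for illustration and are not part of the training data.

```r
# One illustrative object per data class (all values are hypothetical)
v   <- c(2.5, 3.9, 0.7)                          # vector
f   <- factor(c("M", "F", "F", "M"))             # factor
m   <- matrix(1:6, nrow = 2)                     # matrix: 2 rows x 3 columns
a   <- array(1:24, dim = c(2, 3, 4))             # array: 3 dimensions
df  <- data.frame(id = 1:3, name = c("a", "b", "c"))  # data frame with mixed modes
lst <- list(v = v, f = f, df = df)               # list holding other objects

levels(f)           # category levels of the factor
table(f)            # frequency of each level
dim(m)              # dimensions of the matrix
dim(a)              # dimensions of the array
sapply(lst, class)  # class of each list element
object.size(v)      # memory used by the vector
```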
Data MODES in R
• Character/String: letters and text in quotation marks
• Numeric/Integer: numbers
• Logical: TRUE, FALSE, T, F (must be capital letters, no quotes; TRUE converts to 1 and FALSE to 0 for arithmetic)
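A short sketch of the three modes, using mode() to check each one. The variable names are hypothetical.

```r
# Hypothetical values, one per data mode
animal <- "dog"   # character: text in quotation marks
count  <- 12      # numeric
flag   <- TRUE    # logical: capital letters, no quotes

mode(animal)      # "character"
mode(count)       # "numeric"
mode(flag)        # "logical"

# Logical values act as 1 (TRUE) and 0 (FALSE) in arithmetic,
# so sum() counts the TRUE values
n.true <- sum(c(TRUE, FALSE, TRUE, TRUE))
n.true            # 3
```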
Data Classes: Vectors
VECTOR: A single string of data of the same “mode”
Examples: Numeric or Integer Mode (spaces are for easy reading)
x <- c(1, 0, -5, 10, 300)
x <- c(2+2, 9-6, 5)
x <- c(2.5, 3.9, 0.7, 4.0)
Examples: Logical Mode
answer <- c(TRUE, FALSE, TRUE, TRUE)
answer <- c(T, F, T, T)
Examples: Character Mode (single quotes also okay)
animals <- c("dog", "cat", "bird")
string <- c("a", "c", "d", "z", "p")
answer <- c("T", "F", "T", "T")
values <- c("-9", "0.2", "1.4")
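Because a vector holds only one mode, R converts mixed input to a common mode. The sketch below shows that conversion and how quoted numbers, like the values vector above, can be turned back into numeric data with as.numeric(). The name mixed is made up for illustration.

```r
# Mixing modes in c() converts everything to character
mixed <- c(1, "a", TRUE)
mixed         # "1" "a" "TRUE"
mode(mixed)   # "character"

# Numbers in quotes are character data until converted
values <- c("-9", "0.2", "1.4")
mode(values)  # "character"
nums <- as.numeric(values)
mode(nums)    # "numeric"
```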
Working with Vectors
Use subscripts to refer to elements of a vector: x[vector_position]
> x <- c(1, 0, -5, 10, 300)
> x[3]
-5
> x[c(1, 4, 5)]
1 10 300
> x[-2]
1 -5 10 300
> x[1:4]
1 0 -5 10
Logical Operators
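As a sketch of how logical operators work with subscripts, the comparisons below each return a TRUE/FALSE vector that can be placed inside the square brackets. The operators shown (>, >=, <, ==, &, |, !) are standard R; the choice of examples is ours.

```r
x <- c(1, 0, -5, 10, 300)

x > 0                  # TRUE FALSE FALSE TRUE TRUE
x[x > 0]               # 1 10 300

# & means AND, | means OR, ! means NOT
x[x >= 0 & x < 100]    # 1 0 10
x[x == 0 | x == 300]   # 0 300
x[!(x > 0)]            # 0 -5
```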
Working with Vectors
Edit the vector:
> x <- c(1, 0, -5, 10, 300)
Append (add) data to the end of the vector:
x <- c(x, 400, 500, 700) # NOTE: Also try append()
1 0 -5 10 300 400 500 700
Change a single value in the vector:
x[6] <- 90
1 0 -5 10 300 90 500 700
Replace values > 100 with NA (either line works):
x[x>100] <- NA
x[which(x>100)] <- NA # Also try replace()
1 0 -5 10 NA 90 NA NA
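The notes above suggest trying append() and replace(). A minimal sketch of the same three edits using those functions:

```r
x <- c(1, 0, -5, 10, 300)

# append() adds values to the end, like c(x, ...)
x <- append(x, c(400, 500, 700))

# Change the sixth value
x[6] <- 90

# replace() returns a copy with the selected values swapped out,
# so the result must be assigned back to x
x <- replace(x, x > 100, NA)
x   # 1 0 -5 10 NA 90 NA NA
```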
Importing Data
OPTION 1: Type data directly into R
OPTION 2: Use job <- scan(what="character") to paste in data copied from an Excel column
Import the ‘job’ column data (exclude column heading) from the ‘Work’ tab in Excel, and assign it the variable name ‘job’
How might we graph these data?
Here's a hint...
table(job)
For example, you can just create a vector with labels, then make a barplot of the vector, or put the vector directly in barplot:
job.count <- c("farmer"=12, "government"=2, "laborer"=4, "teacher"=2)
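A sketch of both routes to the same graph. Since the Excel column is not available here, the job vector is rebuilt from the counts in job.count as a stand-in for the pasted data; the plot titles and axis labels are our own additions.

```r
# Labeled vector of counts, as given above
job.count <- c("farmer" = 12, "government" = 2, "laborer" = 4, "teacher" = 2)

# Stand-in for the pasted Excel column: one entry per person
# (rebuilt from job.count for illustration only)
job <- rep(names(job.count), times = job.count)

# Route 1: count the raw data with table(), then plot the table
job.table <- table(job)
barplot(job.table, main = "Jobs", xlab = "Job", ylab = "Count")

# Route 2: plot the labeled vector of counts directly
barplot(job.count, main = "Jobs", xlab = "Job", ylab = "Count")
```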
Importing Data
OPTION 3: Export the data as a comma-delimited (csv) or tab-delimited text file, then import the text file into R
Import the ‘HtWt’ dataset (notice how the data are arranged in Excel)
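A minimal sketch of Option 3 using read.csv() and read.delim(). The real HtWt file is not included here, so the sketch first writes a few made-up rows to temporary files. The column names institute, sex and cm follow the later slides; the kg column name and every value are assumptions for illustration.

```r
# Hypothetical stand-in rows (NOT the real HtWt data); "kg" is an assumed name
example <- data.frame(
  institute = c("UWICE", "UWICE", "UWICE", "SFS", "SFS", "SFS"),
  sex       = c("F", "M", "M", "F", "F", "M"),
  kg        = c(55, 68, 72, 60, 58, 75),
  cm        = c(160, 172, 175, 165, 158, 180)
)

# Comma-delimited file: write it, then import with read.csv()
csv.path <- tempfile(fileext = ".csv")
write.csv(example, csv.path, row.names = FALSE)
HtWt <- read.csv(csv.path)

# Tab-delimited file: import with read.delim()
txt.path <- tempfile(fileext = ".txt")
write.table(example, txt.path, sep = "\t", row.names = FALSE, quote = FALSE)
HtWt.tab <- read.delim(txt.path)

str(HtWt)   # check column names and modes after importing
```

With a real export, the path to the saved file (or file.choose()) would replace csv.path.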
Data Classes: Data Frames
DATA FRAMES: A data frame is similar to the data format used in SPSS: different columns can have different modes (numeric, character, factor, etc.)
Working with Data Frames
There are many ways to refer to the elements in data frames, but we will focus on just a few.
To access the height column:
HtWt$cm
HtWt["cm"]
HtWt[4]
To access a row:
HtWt[5,]
To access an element:
HtWt[5,4]
HtWt[5,"cm"]
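The access methods above do not all return the same kind of object, which the sketch below checks. The data frame is a made-up stand-in for HtWt: the kg column name and all values are assumptions, arranged so that cm is the fourth column as in the slides.

```r
# Hypothetical stand-in for HtWt (values are made up)
HtWt <- data.frame(
  institute = c("UWICE", "UWICE", "UWICE", "SFS", "SFS", "SFS"),
  sex       = c("F", "M", "M", "F", "F", "M"),
  kg        = c(55, 68, 72, 60, 58, 75),
  cm        = c(160, 172, 175, 165, 158, 180)
)

col.vec <- HtWt$cm      # a plain numeric vector
col.df  <- HtWt["cm"]   # a one-column data frame
col.pos <- HtWt[4]      # same as above, selected by column position

row5 <- HtWt[5, ]       # row 5, all columns (a one-row data frame)
HtWt[5, 4]              # row 5, column 4
HtWt[5, "cm"]           # the same element, by column name

class(col.vec)          # "numeric"
class(col.df)           # "data.frame"
```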
HtWt Data
What kinds of interesting questions can we ask? What graphs would we make to answer them?
• Is there a difference in height between UWICE & SFS personnel? Does it differ for males vs. females?
• Is there a difference in weight between UWICE & SFS personnel? Does it differ for males vs. females?
• Is there a relationship between height and weight for UWICE personnel? How about for SFS personnel?
• Is there a relationship between height and weight for males? How about for females?
Bar Plots
For comparing COUNTS, PROPORTIONS (%) or MEANS of data in different qualitative categories. Often we make bar plots of summary data.
Use the table() function to create a contingency table of sample counts by INSTITUTE and SEX, and save it as tab.HtWt. Try it also using with()
tab.HtWt <- table(HtWt$institute,HtWt$sex)
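A sketch of the contingency table built both ways, with $ and with with(). The HtWt data frame here is a made-up stand-in (values and the kg column name are assumptions), so the counts shown are not the real sample counts.

```r
# Hypothetical stand-in for HtWt (values are made up)
HtWt <- data.frame(
  institute = c("UWICE", "UWICE", "UWICE", "SFS", "SFS", "SFS"),
  sex       = c("F", "M", "M", "F", "F", "M"),
  kg        = c(55, 68, 72, 60, 58, 75),
  cm        = c(160, 172, 175, 165, 158, 180)
)

# Counts by institute (rows) and sex (columns)
tab.HtWt <- table(HtWt$institute, HtWt$sex)
tab.HtWt

# with() lets us name the columns without repeating HtWt$,
# and it labels the table dimensions with the column names
tab.with <- with(HtWt, table(institute, sex))
tab.with
```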
Working with Data Frames
Now make a stacked barplot from the table you just created
Add title, labels, legend and color...
Convert it to a side-by-side barplot
Move the legend to the top center. ADD AS AN ARGUMENT: args.legend=list(horiz=T, x="top")
Transpose the data: t(tab.HtWt)
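One possible way through the barplot steps above, as a sketch. The data are a made-up stand-in for HtWt, and the titles, labels, colors and y-axis limits are our own choices rather than the ones used in the training.

```r
# Hypothetical stand-in for HtWt (values are made up)
HtWt <- data.frame(
  institute = c("UWICE", "UWICE", "UWICE", "SFS", "SFS", "SFS"),
  sex       = c("F", "M", "M", "F", "F", "M")
)
tab.HtWt <- table(HtWt$institute, HtWt$sex)

# Stacked barplot: one bar per column (sex), stacked by row (institute)
barplot(tab.HtWt, main = "Personnel by sex and institute",
        xlab = "Sex", ylab = "Count",
        col = c("gray30", "gray80"), legend.text = TRUE)

# Side-by-side bars, with a horizontal legend at the top center;
# ylim leaves room above the bars for the legend
barplot(tab.HtWt, beside = TRUE, ylim = c(0, 4),
        main = "Personnel by sex and institute",
        xlab = "Sex", ylab = "Count",
        col = c("gray30", "gray80"), legend.text = TRUE,
        args.legend = list(horiz = TRUE, x = "top"))

# Transposing swaps rows and columns: one bar per institute, stacked by sex
t.tab <- t(tab.HtWt)
barplot(t.tab, xlab = "Institute", ylab = "Count", legend.text = TRUE)
```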
Working with Data Frames
Use the function subset() to create a new data frame called ‘UWICE’ that includes only UWICE data
UWICE <- subset(HtWt,institute=="UWICE")
Now subset the HtWt data to get a data frame with only 'SFS' data and only the 'INSTITUTE' and 'SEX' columns. Call this data frame 'SFS.sex'
SFS.sex <- subset(HtWt,institute=="SFS",select=1:2)
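A sketch of both subset() calls run on a made-up stand-in for HtWt (the values and the kg column name are assumptions). It also shows that select accepts column names in place of positions, which keeps working if the column order changes.

```r
# Hypothetical stand-in for HtWt (values are made up)
HtWt <- data.frame(
  institute = c("UWICE", "UWICE", "UWICE", "SFS", "SFS", "SFS"),
  sex       = c("F", "M", "M", "F", "F", "M"),
  kg        = c(55, 68, 72, 60, 58, 75),
  cm        = c(160, 172, 175, 165, 158, 180)
)

# Keep only the rows where institute is "UWICE" (all columns)
UWICE <- subset(HtWt, institute == "UWICE")

# Keep only SFS rows, and only columns 1 and 2
SFS.sex <- subset(HtWt, institute == "SFS", select = 1:2)

# The same selection using column names instead of positions
SFS.sex2 <- subset(HtWt, institute == "SFS", select = c(institute, sex))

UWICE
SFS.sex
```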
Reshaping Data
1) Install & load the package reshape2
2) Import the Livestock data and save it to a variable called farms
3) Use the function acast() to reformat the farms data to a matrix form for stacked barplots:
m.farms <- acast(farms,town~livestock)
4) Make a stacked barplot from m.farms
Make this graph. Note that the y-axis values should be from 0 to 60
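A sketch of the reshaping steps, under several assumptions: the Livestock file is not available, so farms is a made-up data frame with town, livestock and count columns (the count column name, the town names and all values are hypothetical). If reshape2 is not installed, the sketch falls back on the base R function xtabs(), which produces the same town-by-livestock layout.

```r
# Hypothetical stand-in for the Livestock data: one row per town and animal
farms <- data.frame(
  town      = c("A", "A", "B", "B"),
  livestock = c("cattle", "goat", "cattle", "goat"),
  count     = c(10, 5, 20, 15)
)

if (requireNamespace("reshape2", quietly = TRUE)) {
  # Rows = town, columns = livestock, cells = count
  m.farms <- reshape2::acast(farms, town ~ livestock, value.var = "count")
} else {
  # Base R alternative giving the same layout
  m.farms <- xtabs(count ~ town + livestock, data = farms)
}
m.farms

# Stacked barplot: one bar per livestock type, stacked by town,
# with the y-axis fixed from 0 to 60
barplot(m.farms, ylim = c(0, 60), legend.text = TRUE,
        xlab = "Livestock", ylab = "Count")
```

Use install.packages("reshape2") once to install the package, then library(reshape2) in each session to call acast() without the reshape2:: prefix.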