R-Data Structure
The simplest data structure R operates on is the vector.
• Vector
Can contain numerical data or strings. All elements of a vector share one type.
Can increase the size of the vector by concatenating additional elements.
A vector with numerical content stays numeric only while every element is a number. Adding a string to a numerical vector changes it to a character vector: every element is converted to a string. Content of mixed types requires a list.
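A minimal sketch of this behavior, assuming base R only. The names v and w are illustrative. In base R, c() converts all of its arguments to one common type, which class() makes visible.

```r
v <- c(1, 2, 3)       # numeric vector
class(v)              # "numeric"

v <- c(v, 4)          # grow the vector by concatenating one more element
length(v)             # 4

w <- c(v, "five")     # adding a string converts every element to a string
class(w)              # "character"
w                     # "1" "2" "3" "4" "five"
```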
Operation on numerical vector
Normal operations: -, +, *, 1/x (reciprocal), mean, etc.
For example:
• v <- c(1,2,3,4)
• inv <- 1/v  # assigns to inv the reciprocal of each value of v
Example:
y <- c(v, 0, v)
z <- mean(y)
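The operations above can be tried as follows. This is only a sketch; the extra variable doubled is an illustrative name. It shows that arithmetic is applied element by element.

```r
v <- c(1, 2, 3, 4)
inv <- 1 / v          # element-wise reciprocal: 1, 1/2, 1/3, 1/4
doubled <- v * 2      # element-wise product: 2, 4, 6, 8

y <- c(v, 0, v)       # concatenation: 9 elements
z <- mean(y)          # (1+2+3+4 + 0 + 1+2+3+4) / 9 = 20/9
```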
Sequences (special vectors of numeric values)
1:n means 1, 2, ..., n
Example 1: v <- c(1:3) means v <- c(1,2,3)
Example 2: n <- 1:30
Example 3: n <- 2*1:15  (":" has higher priority than "*", so this is 2*(1:15))
Example 4: n <- seq(-5:5)
Exercise
Try seq(-5,5) and compare with seq(-5:5).
Use help(seq) to learn more about the seq function.
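A short sketch of the precedence rule and of the two seq() calls, assuming base R. The names n1, n2, a, b and s are illustrative. With a single vector argument, seq() generates the indices of that vector, which is why the two calls differ.

```r
n1 <- 2 * 1:15        # ":" is evaluated first: 2, 4, ..., 30
n2 <- (2 * 1):15      # parentheses change the meaning: 2, 3, ..., 15

a <- seq(-5, 5)       # two arguments (from, to): -5, -4, ..., 5
b <- seq(-5:5)        # one vector argument of length 11: 1, 2, ..., 11

s <- seq(0, 1, by = 0.25)   # explicit step: 0, 0.25, 0.5, 0.75, 1
```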
• Matrix
matrix(data, nrow, ncol, byrow)
The data argument is a vector of the elements that will fill the matrix.
The nrow and ncol arguments specify the dimensions of the matrix. Often only one dimension argument is needed. For example, if there are 20 elements in data and ncol is specified to be 4, then R will automatically determine that there should be 5 rows and 4 columns, since 4*5=20.
byrow takes a value in {TRUE, FALSE}. It specifies how the matrix is to be filled. The default value is FALSE, which means that by default the matrix is filled column by column.
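As an illustration of the 20-element case, here is a small sketch. The names m1 and m2 are hypothetical. It builds the same data with both settings of byrow.

```r
m1 <- matrix(1:20, ncol = 4)                 # nrow is inferred: 5 rows
m2 <- matrix(1:20, ncol = 4, byrow = TRUE)   # same shape, filled row by row

dim(m1)     # 5 4
m1[1, ]     # first row when filled by column: 1 6 11 16
m2[1, ]     # first row when filled by row:    1 2 3 4
```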
[,1] means "all the rows of column 1"
[1,] means "all the columns of row 1"
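A minimal sketch of this indexing notation on a small matrix. The matrix m is an illustrative example, not taken from the slides.

```r
m <- matrix(1:6, nrow = 2)   # 2 x 3 matrix, filled column by column

m[, 1]      # all the rows of column 1:    1 2
m[1, ]      # all the columns of row 1:    1 3 5
m[2, 3]     # a single element (row 2, column 3): 6
```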
• Data Frame
A data frame is used for storing data tables. It is a list of vectors of equal length. For example, the following variable df is a data frame containing three vectors v1, v2, v3.
v1 = c(2, 3, 5)
v2 = c("aa", "bb", "cc")
v3 = c(TRUE, FALSE, TRUE)
df = data.frame(v1, v2, v3)  # df is a data frame
df
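Once the data frame exists, a few base R calls help to inspect it. The following is a sketch only. Note that whether v2 is stored as character or as a factor depends on the R version's default for stringsAsFactors.

```r
v1 <- c(2, 3, 5)
v2 <- c("aa", "bb", "cc")
v3 <- c(TRUE, FALSE, TRUE)
df <- data.frame(v1, v2, v3)

dim(df)        # 3 rows, 3 columns
names(df)      # "v1" "v2" "v3"
df$v1          # one column as a vector: 2 3 5
df[2, ]        # second row, all columns
str(df)        # compact summary of the column types
```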
List: A list is a vector in which the various elements need not be of the same type.
Example
V <- list(1, "2", "hello", TRUE)
(With c() instead of list(), all four values would be converted to character strings.)
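To see the difference between a list and a plain vector, one can compare the element types. This is a sketch with illustrative names lst and vec. In base R, list() keeps each element's type, while c() converts everything to one common type.

```r
lst <- list(1, "2", "hello", TRUE)   # each element keeps its own type
sapply(lst, class)                   # "numeric" "character" "character" "logical"
lst[[3]]                             # "hello"

vec <- c(1, "2", "hello", TRUE)      # c() converts everything to character
class(vec)                           # "character"
```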
Factor: A factor is a vector of categorical data. Storing data as factors ensures that the modeling functions will treat such data correctly.
Example:
> data = c(1,2,2,3,1,2,3,3,1,2,3,3,1)
> fdata = factor(data)
> fdata
[1] 1 2 2 3 1 2 3 3 1 2 3 3 1
Levels: 1 2 3
The output shows the content of fdata and also the distinct values (levels) of the categorical attribute.
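The levels can also be queried directly. Here is a small sketch using base R functions on the same data; the variable counts is an illustrative name.

```r
data <- c(1, 2, 2, 3, 1, 2, 3, 3, 1, 2, 3, 3, 1)
fdata <- factor(data)

levels(fdata)           # the distinct values: "1" "2" "3"
nlevels(fdata)          # number of distinct values: 3
counts <- table(fdata)  # how often each level occurs
counts
```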
Importing Data
• read.table(path, more parameters…)
mydata <- read.table("c:/mydata.csv", header=TRUE, sep=",", row.names="id")
"c:/mydata.csv": path to the file. Note the "/" instead of "\" on MS Windows systems.
header=TRUE: the first row of the file is the header row.
sep=",": delimiter used in the file.
row.names="id": optional; the column that provides the row names.
Use help(read.table) for more info.
Also consider the read.csv() function to import comma-delimited files. For example:
read.csv("http://www2.cs.uh.edu/~zechun_cao/TA_Resources/iris.data")
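To try these parameters without a file on disk, read.table() also accepts the file content through its text argument. The sketch below uses made-up data (csv_text and its values are hypothetical) with an id column that becomes the row names.

```r
# Hypothetical file content, used here instead of a real path
csv_text <- "id,x1,x2
a,1,2.5
b,3,4.5"

mydata <- read.table(text = csv_text, header = TRUE, sep = ",", row.names = "id")

rownames(mydata)    # taken from the id column: "a" "b"
names(mydata)       # remaining columns: "x1" "x2"
mydata["b", "x2"]   # 4.5
```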
[Screenshots: the original file and the output after import]
Excel File: read.xls
Exercise:
The best way to import data in Excel format is to save the data as .csv and then use read.table() to import it. However, read.xls is often used. Since it is not part of the core R library, it has to be installed and loaded into the workspace.
Use read.xls to read an Excel file into R (read.xls is part of the gdata package).
Answer
> install.packages(pkgs="gdata")
> library(gdata)
> data <- read.xls(path)
Operations on Dataset/sub-setting
1. Select columns (variables)
2. Drop columns (variables)
3. Select Observations (rows)
4. Random Sampling (exercise)
dataset:
x1,x2,x3,class
0,2,2,A
0,3,2.5,B
0,3,3,A
1,3,3,B
1,3.5,4,c
1,3,2,A
1,4,2,c
1,4,3,A
0,1,3,B
0,1,4,A
1,2,2,c
1,2.5,1,A
Sub-setting by selecting columns
Example 1:
# select variables x1, x3
myvars <- c("x1", "x3")
mysubSet <- dataset[myvars]
mysubSet
Example 2:
# select jth variable and kth thru mth variables
newdata <- dataset[c(j,k:m)]
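Both examples can be run on a small copy of the dataset. The sketch below loads the first four rows of the slide data through read.csv(text = ...) and picks j = 1, k = 3, m = 4 as illustrative values.

```r
# First four rows of the slide dataset, loaded from text for illustration
dataset <- read.csv(text = "x1,x2,x3,class
0,2,2,A
0,3,2.5,B
0,3,3,A
1,3,3,B")

myvars <- c("x1", "x3")
mysubSet <- dataset[myvars]       # select columns by name
names(mysubSet)                   # "x1" "x3"

j <- 1; k <- 3; m <- 4            # illustrative positions
newdata <- dataset[c(j, k:m)]     # select columns by position
names(newdata)                    # "x1" "x3" "class"
```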
Drop some columns
# exclude 1st and 3rd variable
mysubSet <- dataset[c(-1,-3)]
Also, to delete a column, assign NULL to the column.
Example:
# delete variable x1
mydata$x1 <- NULL
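A short sketch of both ways of dropping columns, again on the first four rows of the slide data. Here the data frame is named dataset throughout, and the NULL assignment modifies it in place.

```r
dataset <- read.csv(text = "x1,x2,x3,class
0,2,2,A
0,3,2.5,B
0,3,3,A
1,3,3,B")

mysubSet <- dataset[c(-1, -3)]   # negative positions exclude columns 1 and 3
names(mysubSet)                  # "x2" "class"

dataset$x1 <- NULL               # removes column x1 from dataset itself
names(dataset)                   # "x2" "x3" "class"
```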
Select Observations
# first n observations (rows 1 to n, for all columns)
mysubSet <- dataset[1:n,]
# based on variable values
mysubSet <- dataset[ which(dataset$x3==2 & dataset$x2 > 2), ]
Or equivalently
# based on variable values
attach(dataset)
mysubSet <- dataset[ which(x3==2 & x2 > 2), ]
detach(dataset)
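The row selections can be sketched on the full slide dataset as follows. The value n = 3 is illustrative, and with() is shown as one possible alternative to attach()/detach(), which is an addition not found on the slides.

```r
dataset <- read.csv(text = "x1,x2,x3,class
0,2,2,A
0,3,2.5,B
0,3,3,A
1,3,3,B
1,3.5,4,c
1,3,2,A
1,4,2,c
1,4,3,A
0,1,3,B
0,1,4,A
1,2,2,c
1,2.5,1,A")

n <- 3
firstRows <- dataset[1:n, ]      # rows 1 to n, all columns

# rows where x3 is 2 and x2 is greater than 2 (rows 6 and 7 here)
mysubSet <- dataset[which(dataset$x3 == 2 & dataset$x2 > 2), ]

# same selection without attach(): with() evaluates inside the data frame
mysubSet2 <- dataset[with(dataset, which(x3 == 2 & x2 > 2)), ]
```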
Sampling
dataset = read.csv("C:/Users/paul/Desktop/R_wd/Lab/example.csv")
dataset
dataset[sample(nrow(dataset), 3), ]
Using the dataset below, write a script that selects 4 rows randomly.
Step 1: import the file.
Step 2: use srsdf to sample
x1,x2,x3,class
0,2,2,A
1,3,2.5,B
1.5,3.8,3,A
2,4,3,B
2.1,3.5,4,c
2.3,3.8,2,A
2.8,4,2,c
3,4,3,A
3.2,4.5,3,B
3.4,4.6,4,A
3.6,4.8,2,c
3.6,5,1,A
Answer
dataset = read.csv("C:/Users/paul/Desktop/R_wd/Lab/example.csv")
dataset
dataset[sample(nrow(dataset), 4), ]  # 4 rows, as the exercise asks
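A self-contained variant of the sampling step, as a sketch. It loads the exercise data from text instead of the file path, and it calls set.seed() (an addition, with an arbitrary seed value) so that the same rows are drawn on every run.

```r
dataset <- read.csv(text = "x1,x2,x3,class
0,2,2,A
1,3,2.5,B
1.5,3.8,3,A
2,4,3,B
2.1,3.5,4,c
2.3,3.8,2,A
2.8,4,2,c
3,4,3,A
3.2,4.5,3,B
3.4,4.6,4,A
3.6,4.8,2,c
3.6,5,1,A")

set.seed(42)                               # arbitrary seed, for reproducibility only
rows <- sample(nrow(dataset), 4)           # 4 distinct row numbers out of 12
picked <- dataset[rows, ]                  # the sampled rows, all columns
picked
```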
Operations on Dataset
Split data frame or matrix
split() # divide into groups by vector/factor
Example
> dataset = read.csv("C:/Users/paul/Desktop/R_wd/input/Data_TPRTI/weka/EXAMPLE.csv")
> classes <- split(dataset, dataset$class)
> classes
Observe that split() has grouped the rows of the same class together because the grouping column was specified to be the class column.
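The result of split() is a list with one data frame per class. The sketch below uses the 12-row dataset from the sub-setting slides (loaded from text rather than from the file path above, whose content is not shown) and counts the rows per group.

```r
dataset <- read.csv(text = "x1,x2,x3,class
0,2,2,A
0,3,2.5,B
0,3,3,A
1,3,3,B
1,3.5,4,c
1,3,2,A
1,4,2,c
1,4,3,A
0,1,3,B
0,1,4,A
1,2,2,c
1,2.5,1,A")

classes <- split(dataset, dataset$class)   # one data frame per class value

names(classes)           # the class labels
classes$B                # all rows whose class is B
sapply(classes, nrow)    # number of rows in each group
```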
• subset() # subset data with logical statement
The subset() function is the easiest way to select variables and observations. In the following example, we select all rows that have a value of x3==2 and x2>2. We keep the x1, x2, and class columns.
mysubSet <- subset(dataset, x3==2 & x2>2, select=c(x1, x2, class))
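Run on the 12-row dataset from the sub-setting slides, the call can be sketched as follows. The data is loaded from text for illustration; rows 6 and 7 are the only ones that satisfy both conditions.

```r
dataset <- read.csv(text = "x1,x2,x3,class
0,2,2,A
0,3,2.5,B
0,3,3,A
1,3,3,B
1,3.5,4,c
1,3,2,A
1,4,2,c
1,4,3,A
0,1,3,B
0,1,4,A
1,2,2,c
1,2.5,1,A")

# rows with x3 == 2 and x2 > 2; keep only the x1, x2 and class columns
mysubSet <- subset(dataset, x3 == 2 & x2 > 2, select = c(x1, x2, class))
mysubSet
```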
Use help() to learn about
• Merge data frames
merge() # merges two data frames d1 and d2 into one data frame
• Combine a row or column with a data frame
cbind(): add a new column to a data frame
rbind(): add a new row to a data frame
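A minimal sketch of the three functions on made-up data. The data frames d1 and d2 and their columns id, x, y and z are hypothetical. By default, merge() keeps only the rows whose key appears in both data frames.

```r
d1 <- data.frame(id = c(1, 2, 3), x = c(10, 20, 30))
d2 <- data.frame(id = c(2, 3, 4), y = c("b", "c", "d"))

merged <- merge(d1, d2, by = "id")                 # ids present in both: 2 and 3
wider  <- cbind(d1, z = d1$x * 2)                  # add a new column z
longer <- rbind(d1, data.frame(id = 4, x = 40))    # add a new row

merged
wider
longer
```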
Practice exercise
Exercise
1- Download the dataset Iris from www.cs.uh.edu/~zechun_cao/DM12F.html and import the data into your R session.
2- Find out how many classes are in the file. The output column is the last column.
3- Multiply the 3rd column by 2 and combine this new column with the data frame.
Complete the exercise
Thank you!