An Introduction – UCF, Methods in Ecology, Fall 2008
An Introduction
By Danny K. Hunt & Eric D. Stolen
Working In R (with speaker notes)

What We Will Learn
– More About Dataframes
– Slicing, Dicing and Sorting Data
– Manipulating Dataframes & Aggregating Data
– Simple Iterative Processing
– Basic Data Visualization

More About Dataframes
Dataframes Reviewed
– Are objects that consist of rows and columns
– Closely related to MS Access or Excel tables
– Specific requirements for dataframes
  – Observations are in rows
  – The response variable and explanatory variables are in columns
  – Same variable results go into the same column

More About Dataframes
Dataframes Reviewed
– Specific requirements for dataframes (continued)
  – Observations are in rows
  – The response variable and explanatory variables are in columns

Observation  Response  Treatment
Obs1         1.5       A
Obs2         1.6       B
Obs3         1.3       C
…            …         …

More About Dataframes
Dataframes Reviewed
– Specific requirements for dataframes (continued)
  – Same variable results go into the same column
  – Good dataframe conventions

Wide layout (one column per treatment):

Control  X1   X2
1.3      1.4  1.3
1.1      1.3  1.6
1.4      1.5  1.8

Dataframe layout (one row per observation):

ID  Response  Treatment
1   1.3       Control
2   1.1       Control
3   1.4       Control
4   1.4       X1
5   1.3       X1
6   1.5       X1
7   1.3       X2
8   1.6       X2
9   1.8       X2
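Moving from the first table (one column per treatment) to the second (one row per observation) can be sketched in base R with stack(). This is only an illustration using the nine values from the slide; the object names wide and long are assumptions and do not appear in the original material.

```r
# Wide layout from the slide: one column per treatment
wide <- data.frame(
  Control = c(1.3, 1.1, 1.4),
  X1      = c(1.4, 1.3, 1.5),
  X2      = c(1.3, 1.6, 1.8)
)

# stack() puts every value in one column and the source column name in another
long <- stack(wide)
names(long) <- c("Response", "Treatment")

# Add an observation ID and arrange the columns as on the slide
long$ID <- seq_len(nrow(long))
long <- long[, c("ID", "Response", "Treatment")]
print(long)
```

After this step each row is one observation, and Treatment is a single explanatory variable rather than three separate columns.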

More About Dataframes
Loading Dataframes with Data
– Use the read.table() series of functions
  – Particularly simple and efficient is the read.delim() function
– Load Snake dataset and get acquainted
    snake <- read.delim("c:\\pract_i-t\\snake_data.txt", row.names="ID")
    names(snake)
    snake[1:5,]
    summary(snake)
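Because the snake data file itself is not part of this transcript, the same steps can be tried on a small stand-in. The sketch below writes a tiny tab-delimited file to a temporary location and reads it back; the column names are borrowed from later slides, but every value, the object name snake_demo, and the temporary file are invented for illustration.

```r
# Hypothetical miniature of a tab-delimited data file (values are invented)
demo_file <- tempfile(fileext = ".txt")
writeLines(c(
  "ID\tName\tSEX\tlandc\tmcp\ttimes",
  "1\tmolly\t1\t1\t250.5\t45",
  "2\tabe\t2\t3\t120.0\t72",
  "3\tmax\t2\t2\t310.2\t58"
), demo_file)

# read.delim() expects a header row and tab separators by default;
# row.names = "ID" turns the ID column into row labels
snake_demo <- read.delim(demo_file, row.names = "ID")

names(snake_demo)     # column names (ID is no longer a data column)
snake_demo[1:2, ]     # first two rows, all columns
summary(snake_demo)   # per-column summaries
```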

More About Dataframes
Loading Dataframes with Data (continued)
– Identifying columns as categorical variables
  – During read operations, alphanumeric fields are automatically encoded as factors in R versions before 4.0.0; from R 4.0.0 on they are read as character strings unless stringsAsFactors = TRUE is set
  – Numeric fields are automatically assumed to be continuous values
  – Identify numeric fields as factors using:
      snake$landc <- factor(snake$landc)
      snake$SEX <- factor(snake$SEX)
      summary(snake)
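The effect of factor() shows most clearly in how summary() changes. The following is a sketch on invented data; snake_demo and its values are assumptions standing in for the real snake dataset.

```r
# Hypothetical stand-in for the snake data (values are invented)
snake_demo <- data.frame(
  landc = c(1, 3, 2, 1),
  SEX   = c(1, 2, 2, 1),
  mcp   = c(250.5, 120.0, 310.2, 95.4)
)

summary(snake_demo$landc)   # treated as continuous: min, quartiles, mean, max

# Recode the numeric codes as categorical variables
snake_demo$landc <- factor(snake_demo$landc)
snake_demo$SEX   <- factor(snake_demo$SEX)

summary(snake_demo$landc)   # treated as categorical: a count per level
levels(snake_demo$landc)    # the distinct codes, stored as labels
```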

Slicing, Dicing and Sorting Data
Subscripts
– Performing data extraction by indexing:
    snake[1:5,]
    snake[40,]
    snake[,1]
    snake[,c(7, 1, 2)]
    snake[-c(10, 20, 30, 40), c("Name", "SEX", "ha")]
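Each of these subscripts follows the pattern dataframe[rows, columns], where an empty position means "all". A small sketch on invented data (snake_demo, kept and the values are assumptions) makes the individual forms easier to check by eye:

```r
# Hypothetical five-row stand-in (values are invented)
snake_demo <- data.frame(
  Name = c("molly", "abe", "max", "zed", "ivy"),
  SEX  = c(1, 2, 2, 1, 1),
  ha   = c(12.1, 5.4, 20.3, 8.8, 15.0),
  stringsAsFactors = FALSE
)

snake_demo[1:3, ]        # rows 1 to 3, all columns
snake_demo[4, ]          # row 4 only
snake_demo[, 1]          # column 1, returned as a plain vector
snake_demo[, c(3, 1)]    # columns 3 and 1, in that order

# Negative row indices drop rows; column names select columns
kept <- snake_demo[-c(2, 4), c("Name", "ha")]
kept
```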

Slicing, Dicing and Sorting Data
Subscripts (continued)
– Performing data extraction by using filtering:
    snake[snake$landc == 1,]
    snake[snake$landc %in% c(1, 3),]
    snake[snake$mcp > 200,]
    snake[snake$mcp > 200 & snake$times < 60,]
    snake[grep("^m", snake$Name),]
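Filtering works because the expression in the row position evaluates to a logical vector (or, for grep(), a vector of row positions) that picks the rows to keep. The sketch below uses invented data; snake_demo and the result names are assumptions.

```r
# Hypothetical stand-in (values are invented)
snake_demo <- data.frame(
  Name  = c("molly", "abe", "max", "zed", "ivy"),
  landc = c(1, 3, 2, 1, 3),
  mcp   = c(250.5, 120.0, 310.2, 95.4, 205.0),
  times = c(45, 72, 58, 30, 61),
  stringsAsFactors = FALSE
)

snake_demo$mcp > 200   # the logical vector that does the selecting

# Rows whose landc is 1 or 3
in_1_or_3 <- snake_demo[snake_demo$landc %in% c(1, 3), ]

# Both conditions must hold (& is the element-wise AND)
big_few <- snake_demo[snake_demo$mcp > 200 & snake_demo$times < 60, ]

# grep() returns the positions of names starting with "m"
m_names <- snake_demo[grep("^m", snake_demo$Name), ]
```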

Slicing, Dicing and Sorting Data
Sorting Data
– snake[order(snake$Name),]
– snake[order(snake$landc, -snake$mcp),]
– snake[order(-snake$times),][1:10,]
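order() does not sort the data itself; it returns the row positions in sorted order, which are then used as a row subscript. A sketch on invented data (snake_demo and the result names are assumptions):

```r
# Hypothetical stand-in (values are invented)
snake_demo <- data.frame(
  Name  = c("molly", "abe", "max", "zed", "ivy"),
  landc = c(1, 3, 2, 1, 3),
  mcp   = c(250.5, 120.0, 310.2, 95.4, 205.0),
  stringsAsFactors = FALSE
)

order(snake_demo$Name)   # row positions in alphabetical order of Name

by_name <- snake_demo[order(snake_demo$Name), ]

# Ascending landc; within each landc, descending mcp
by_landc_mcp <- snake_demo[order(snake_demo$landc, -snake_demo$mcp), ]

# The two rows with the largest mcp
top2 <- snake_demo[order(-snake_demo$mcp), ][1:2, ]
```

The minus sign reverses the order only for numeric columns; for a character column, order(..., decreasing = TRUE) is the usual alternative.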

Manipulating Dataframes & Aggregating Data
Adding Columns to a Dataframe
– Close out and restart R
– Load the rapid fish dataset
    rapid <- read.delim("c:\\pract_i-t\\rapid_fish.txt", row.names="ID")
    rapid$impoundment <- factor(rapid$impoundment)
    rapid$season <- factor(rapid$season)
    rapid$open_veg <- factor(rapid$open_veg)
    rapid$sea_code <- factor(rapid$sea_code)
    rapid$imp_code <- factor(rapid$imp_code)
    rapid$cov_code <- factor(rapid$cov_code)
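When many columns need the same conversion, the repeated factor() calls can be collapsed into one step. This is an alternative sketch rather than the slides' own approach; rapid_demo, factor_cols and the values are invented for illustration.

```r
# Hypothetical stand-in for the rapid fish data (values are invented)
rapid_demo <- data.frame(
  impoundment = c(1, 1, 2, 2),
  season      = c(1, 2, 1, 2),
  open_veg    = c("open", "vegetated", "open", "vegetated"),
  count       = c(12, 0, 7, 30),
  stringsAsFactors = FALSE
)

# Columns that are really categories, listed once
factor_cols <- c("impoundment", "season", "open_veg")

# lapply() applies factor() to each listed column; the result is assigned back
rapid_demo[factor_cols] <- lapply(rapid_demo[factor_cols], factor)

str(rapid_demo)   # the listed columns are now factors, count stays numeric
```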

Manipulating Dataframes & Aggregating Data
Adding Columns to a Dataframe (cont.)
– Add "unique" column
    rapid$unique <- paste(rapid$Point, rapid$season, sep = "")
    rapid$unique <- factor(rapid$unique)
– Add log transformation of count column
    rapid$lncount <- log(rapid$count + 1)
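Assigning to a column name that does not yet exist creates the column. The sketch below repeats both additions on invented data (rapid_demo and its values are assumptions) and shows why 1 is added before taking the log.

```r
# Hypothetical stand-in (values are invented)
rapid_demo <- data.frame(
  Point  = c("A1", "A1", "B2"),
  season = c(1, 2, 1),
  count  = c(0, 9, 99),
  stringsAsFactors = FALSE
)

# Combine sampling point and season into one identifier per point-season pair
rapid_demo$unique <- factor(paste(rapid_demo$Point, rapid_demo$season, sep = ""))

# Adding 1 keeps zero counts usable, because log(0) is -Inf
rapid_demo$lncount <- log(rapid_demo$count + 1)

# log1p(x) computes log(1 + x), so it gives the same values
all.equal(rapid_demo$lncount, log1p(rapid_demo$count))

rapid_demo
```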

Manipulating Dataframes & Aggregating Data
Overview of the Rapid Fish Dataset
– names(rapid)
– rapid[1:5,]
– summary(rapid)
Filtered Summaries
– summary(rapid[rapid$open_veg == "open",])
– summary(rapid[rapid$open_veg == "vegetated",])

Manipulating Dataframes & Aggregating Data
Cross Tabulation
– Using the table() function
    table(rapid$impoundment)
    table(rapid[, c("open_veg", "impoundment")])
    table(rapid[, c("open_veg", "impoundment", "season")])
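table() counts how many rows fall into each level, or each combination of levels when given several columns. A sketch on invented data follows; rapid_demo and the impoundment labels "T10" and "T24" are made up for illustration.

```r
# Hypothetical stand-in (labels and values are invented)
rapid_demo <- data.frame(
  open_veg    = c("open", "open", "vegetated", "vegetated", "open", "vegetated"),
  impoundment = c("T10", "T10", "T10", "T24", "T24", "T24"),
  stringsAsFactors = FALSE
)

# One-way table: number of rows per impoundment
one_way <- table(rapid_demo$impoundment)
one_way

# Two-way table: rows are open_veg levels, columns are impoundments
two_way <- table(rapid_demo[, c("open_veg", "impoundment")])
two_way

# The same counts in long form, one row per combination, with a Freq column
as.data.frame(two_way)
```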

Manipulating Dataframes & Aggregating Data
Data Aggregation
– Using the aggregate() function
    aggregate(rapid$count, list(impoundment=rapid$impoundment), mean)
    aggregate(rapid$count, list(open_veg=rapid$open_veg,
        impoundment=rapid$impoundment, season=rapid$season), mean)
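aggregate() splits the first argument by the grouping list and applies the function to each group. The sketch below uses invented data (rapid_demo and its labels are assumptions) and also shows the formula interface, which is not used in the slides but gives the same numbers with a more readable result column.

```r
# Hypothetical stand-in (labels and values are invented)
rapid_demo <- data.frame(
  impoundment = c("T10", "T10", "T24", "T24"),
  open_veg    = c("open", "vegetated", "open", "vegetated"),
  count       = c(10, 20, 4, 8),
  stringsAsFactors = FALSE
)

# Mean count per impoundment; the result column is named "x" by default
by_imp <- aggregate(rapid_demo$count,
                    list(impoundment = rapid_demo$impoundment), mean)
by_imp

# Formula interface: the result column keeps the name "count"
by_imp_f <- aggregate(count ~ impoundment, data = rapid_demo, FUN = mean)
by_imp_f
```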

Manipulating Dataframes & Aggregating Data
Merging Results
– Using the merge() function
    n <- table(rapid[, c("open_veg", "impoundment", "season")])
    m <- aggregate(rapid$count, list(open_veg=rapid$open_veg,
        impoundment=rapid$impoundment, season=rapid$season), mean)
    names(m)[4] <- "mean"
    s <- aggregate(rapid$count, list(open_veg=rapid$open_veg,
        impoundment=rapid$impoundment, season=rapid$season), sd)
    names(s)[4] <- "sd"
    nm <- merge(n, m)
    names(nm)[4] <- "n"
    habimpsea <- merge(nm, s)
    habimpsea
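The same count, mean and standard deviation table can be traced on a case small enough to verify by hand. This sketch groups by a single invented variable instead of three; rapid_demo, groups and stats are assumed names, and the values are made up.

```r
# Hypothetical stand-in with one grouping variable (values are invented)
rapid_demo <- data.frame(
  open_veg = c("open", "open", "vegetated", "vegetated", "vegetated"),
  count    = c(2, 4, 10, 20, 30),
  stringsAsFactors = FALSE
)
groups <- list(open_veg = rapid_demo$open_veg)

# Sample size per group: a table converted to a dataframe has a Freq column
n <- as.data.frame(table(open_veg = rapid_demo$open_veg))
names(n)[2] <- "n"

# Mean and standard deviation per group, with the default "x" column renamed
m <- aggregate(rapid_demo$count, groups, mean)
names(m)[2] <- "mean"
s <- aggregate(rapid_demo$count, groups, sd)
names(s)[2] <- "sd"

# merge() joins on the columns the inputs share, here open_veg
stats <- merge(merge(n, m), s)
stats
```

merge() keeps only the key combinations present in both inputs, so groups with a zero count in the table drop out when joined to the aggregate() results.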

Simple Iterative Processing
Repeating a Process
– Using the for() control-flow construct
    site <- sort(unique(rapid$impoundment))
    for (i in 1:length(site)) {
        print(summary(rapid[rapid$impoundment == site[i],]))
    }
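A loop can also store one result per site instead of printing. The sketch below is a variation on the slide's loop using invented data; rapid_demo, site_means and the labels are assumptions. It uses seq_along(site), which behaves like 1:length(site) but performs no iterations when site is empty.

```r
# Hypothetical stand-in (labels and values are invented)
rapid_demo <- data.frame(
  impoundment = c("T24", "T10", "T24", "T10"),
  count       = c(4, 10, 8, 20),
  stringsAsFactors = FALSE
)

site <- sort(unique(rapid_demo$impoundment))

# Pre-allocate one slot per site and label the slots
site_means <- numeric(length(site))
names(site_means) <- site

for (i in seq_along(site)) {
  rows <- rapid_demo[rapid_demo$impoundment == site[i], ]  # rows for this site
  site_means[i] <- mean(rows$count)                        # store, don't print
}
site_means
```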

Basic Data Visualization
Visualizing Data
– Using the plot() function
    plot(as.numeric(rapid$sea_code), rapid$count)
– Using the boxplot() function
    boxplot(rapid$count~rapid$impoundment)
– Using the hist() function
    par(mfrow=c(1,2))
    hist(rapid$count)
    hist(rapid$lncount)
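The plotting calls can be tried without the fish data. The sketch below uses invented values and, as an assumption made so that it also runs in a non-interactive session, draws into a temporary PDF file rather than on screen; rapid_demo, out_file and old_par are illustrative names.

```r
# Hypothetical stand-in (labels and values are invented)
rapid_demo <- data.frame(
  impoundment = factor(c("T10", "T10", "T10", "T24", "T24", "T24")),
  count       = c(0, 3, 150, 1, 12, 40)
)
rapid_demo$lncount <- log(rapid_demo$count + 1)

out_file <- tempfile(fileext = ".pdf")
pdf(out_file)                      # assumption: plot to a file, one page per figure

# One box per impoundment; the data argument avoids repeating rapid_demo$
boxplot(count ~ impoundment, data = rapid_demo)

# Two histograms side by side: 1 row, 2 columns
old_par <- par(mfrow = c(1, 2))    # par() returns the previous settings
hist(rapid_demo$count, main = "count")
hist(rapid_demo$lncount, main = "log(count + 1)")
par(old_par)                       # restore the single-panel layout

dev.off()                          # close the file so it is written to disk
```

Comparing the two histograms side by side shows how the log transformation pulls in the long right tail typical of count data.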

The End