Introduction to R - Lecture 4: Looping Andrew Jaffe 9/27/2010

Upload: moira

Post on 15-Jan-2016

Page 1: Introduction to R - Lecture 4: Looping

Introduction to R - Lecture 4: Looping

Andrew Jaffe

9/27/2010

Page 2: Introduction to R - Lecture 4: Looping

Overview

Practice Review
The ‘for’ loop
  Rationale
  Syntax
  Application
  Getting creative…

Page 3: Introduction to R - Lecture 4: Looping

Practice overview

Compute the average dog weight, dog length, and dog food consumption for each dog type at baseline

Page 4: Introduction to R - Lecture 4: Looping

Practice Overview

mean(dog_dat$dog_wt_mo1[dog_dat$dog_type == "lab"])
mean(dog_dat$dog_wt_mo1[dog_dat$dog_type == "husky"])
mean(dog_dat$dog_wt_mo1[dog_dat$dog_type == "poodle"])
mean(dog_dat$dog_wt_mo1[dog_dat$dog_type == "retriever"])

mean(dog_dat$dog_len_mo1[dog_dat$dog_type == "lab"])
mean(dog_dat$dog_len_mo1[dog_dat$dog_type == "husky"])
mean(dog_dat$dog_len_mo1[dog_dat$dog_type == "poodle"])
mean(dog_dat$dog_len_mo1[dog_dat$dog_type == "retriever"])

mean(dog_dat$dog_food_mo1[dog_dat$dog_type == "lab"])
mean(dog_dat$dog_food_mo1[dog_dat$dog_type == "husky"])
mean(dog_dat$dog_food_mo1[dog_dat$dog_type == "poodle"])
mean(dog_dat$dog_food_mo1[dog_dat$dog_type == "retriever"])

Page 5: Introduction to R - Lecture 4: Looping

Overview

Practice Review
The ‘for’ loop
  Rationale
  Syntax
  Application
  Getting creative…

Page 6: Introduction to R - Lecture 4: Looping

Loop Rationale

Download “lec4_data.rda” from the website under Lecture 4 data

Load it into R
Remember: load(filename)
Check your workspace with ls()
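A minimal sketch of the load() step, written so it runs without the course file. The data frame toy_dat and the temporary file are stand-ins invented for this example; with the real data the call would simply be load("lec4_data.rda") from the folder where the file was saved.

```r
# Hypothetical stand-in for the course data: a tiny data frame
toy_dat <- data.frame(id = 1:3, wt = c(10.5, 12.1, 9.8))

# Save it to a temporary .rda file, then remove it from the workspace
tmp_file <- tempfile(fileext = ".rda")
save(toy_dat, file = tmp_file)
rm(toy_dat)

# load() restores objects under their original names and
# (invisibly) returns those names as a character vector
loaded_names <- load(tmp_file)
print(loaded_names)        # names of the objects that were restored
print(exists("toy_dat"))   # the object is back in the workspace
print(dim(toy_dat))        # rows and columns
```

Note that load() does not need an assignment: the objects reappear under the names they were saved with, which is why checking ls() afterwards is useful.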

Page 7: Introduction to R - Lecture 4: Looping

Loop Rationale

What are the dimensions of the dataset?

Page 8: Introduction to R - Lecture 4: Looping

Loop Rationale

What are the dimensions of the dataset?

> dim(dog_dat)
[1] 482  39

Page 9: Introduction to R - Lecture 4: Looping

Loop Rationale

What are the variable names?

Page 10: Introduction to R - Lecture 4: Looping

Loop Rationale

What are the variable names?

> names(dog_dat)
 [1] "dog_id"        "owner_id"      "dog_type"
 [4] "dog_wt_mo1"    "dog_wt_mo2"    "dog_wt_mo3"
 [7] "dog_wt_mo4"    "dog_wt_mo5"    "dog_wt_mo6"
[10] "dog_wt_mo7"    "dog_wt_mo8"    "dog_wt_mo9"
[13] "dog_wt_mo10"   "dog_wt_mo11"   "dog_wt_mo12"
[16] "dog_len_mo1"   "dog_len_mo2"   "dog_len_mo3"
[19] "dog_len_mo4"   "dog_len_mo5"   "dog_len_mo6"
[22] "dog_len_mo7"   "dog_len_mo8"   "dog_len_mo9"
[25] "dog_len_mo10"  "dog_len_mo11"  "dog_len_mo12"
[28] "dog_food_mo1"  "dog_food_mo2"  "dog_food_mo3"
[31] "dog_food_mo4"  "dog_food_mo5"  "dog_food_mo6"
[34] "dog_food_mo7"  "dog_food_mo8"  "dog_food_mo9"
[37] "dog_food_mo10" "dog_food_mo11" "dog_food_mo12"

Page 11: Introduction to R - Lecture 4: Looping

Loop Rationale

dog_wt_mo1-12: the dog’s weight at each of the 12 months

dog_len_mo1-12: the dog’s length at each of the 12 months

dog_food_mo1-12: the dog’s food consumption at each of the 12 months

Page 12: Introduction to R - Lecture 4: Looping

Loop Rationale

Now, compute the average dog weight, dog length, and dog food consumption for each dog type at EVERY visit

That would be 36*4 = 144 lines of code that are almost identical (36 monthly columns times 4 dog types)

Now let’s talk about the ‘for’ loop…

Page 13: Introduction to R - Lecture 4: Looping

Overview

Practice Review
The ‘for’ loop
  Rationale
  Syntax
  Application
  Getting creative…

Page 14: Introduction to R - Lecture 4: Looping

Syntax

for(i in 1:10) {
  print(i)
}

Curly brackets mark the start and end of a loop (or function) body.

The body of the loop is between them.

In for(i in 1:10), i is the loop variable and 1:10 is the sequence it steps through.

Page 15: Introduction to R - Lecture 4: Looping

Syntax

> for(i in 1:10) {
+   print(i)
+ }
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10

Sets i=1, and runs the loop body to the end

Sets i=2, and runs the loop body to the end

…

Sets i=10, and runs the loop body to the end

Page 16: Introduction to R - Lecture 4: Looping

Syntax

Another way to think about it: set i=1, and then just run the loop body

> i=1
> print(i)
[1] 1
> i=2
> print(i)
[1] 2

Page 17: Introduction to R - Lecture 4: Looping

Syntax

Some notes/comments:
‘i’ is a common variable for loops, but it can be anything: ‘x’, ‘names’, etc.
That variable is set to each value of the loop sequence in turn, and an existing variable with the same name gets overwritten
Run the ‘for’ loop above (with print), and type i: it should equal 10
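A short sketch of the notes above, under the assumption that any vector can serve as the loop sequence. The variable name dog and the three dog types are used purely for illustration.

```r
# The loop variable can have any name and step through any vector
for (dog in c("lab", "husky", "poodle")) {
  print(dog)
}

# After the loop finishes, the variable keeps its last value
print(dog)

# An existing variable with the same name is overwritten
i <- 100
for (i in 1:3) {
  # nothing needed here; we only care about i afterwards
}
print(i)
```

The last print shows 3 rather than 100, so it is worth picking loop variable names that do not clash with objects you still need.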

Page 18: Introduction to R - Lecture 4: Looping

Syntax

> b = 0
> for(i in 1:10) {
+   b = b + i
+ }
> b
[1] 55

> b = 0
> for(i in 1:10) {
+   b = b + i
+   print(b)
+ }
[1] 1
[1] 3
[1] 6
[1] 10
[1] 15
[1] 21
[1] 28
[1] 36
[1] 45
[1] 55

i=1: 0[b] + 1 = 1 = b

i=2: 1[b] + 2 = 3 = b

i=3: 3[b] + 3 = 6 = b

i=4: 6[b] + 4 = 10 = b

i=10: 45[b] + 10 = 55 = b

Page 19: Introduction to R - Lecture 4: Looping

Syntax

We don’t just want to print stuff usually – we want to manipulate data and save it

Procedure: create a blank vector, then fill in that vector with a ‘for’ loop

Page 20: Introduction to R - Lecture 4: Looping

Syntax

Guess what this is doing:

b = 0
b_vec = rep(0, 10)
for(i in 1:10) {
  b = b + i
  b_vec[i] = b
}
> b_vec
 [1]  1  3  6 10 15 21 28 36 45 55

We’re using the looping variable to index!

Page 21: Introduction to R - Lecture 4: Looping

Syntax

That last loop, step by step:
Set b=0 and create a blank vector of length 10
For i from 1 through 10, add i to the running sum b
  I.e. sum(1:10): 1 + 2 + … + 10 = 55
Store the running sum at each step in position i of vector b_vec
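One way to check the step-by-step description is to compare the loop's result with base R's cumsum(), which computes the same running sum. This is only a cross-check sketch; the loop itself is the one from the slide.

```r
# Fill a blank vector with the running sum of 1..10
b <- 0
b_vec <- rep(0, 10)
for (i in 1:10) {
  b <- b + i       # update the running sum
  b_vec[i] <- b    # save it in position i
}
print(b_vec)

# cumsum() gives the same running sums without a loop
print(all(b_vec == cumsum(1:10)))

# The final value of b is the total, sum(1:10)
print(b == sum(1:10))
```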

Page 22: Introduction to R - Lecture 4: Looping

Overview

Practice Review
The ‘for’ loop
  Rationale
  Syntax
  Application
  Getting creative…

Page 23: Introduction to R - Lecture 4: Looping

Application

Let’s take a step forward, and calculate the average dog weight, dog length, and dog food consumption for all dogs at every visit

Instead of looping over a vector, we will loop over a matrix/data.frame

Page 24: Introduction to R - Lecture 4: Looping

Application

Let’s try just dog weight first
We can loop over non-sequential variables (indices) in a dataset
Here, we want columns 4-15 of dog_dat, which correspond to the dog’s weights at each month

Page 25: Introduction to R - Lecture 4: Looping

Application

Looping over non-sequential elements is easy to do

However, you have to be careful when saving outputs of non-sequential elements

Page 26: Introduction to R - Lecture 4: Looping

Application

> Index = c(1,3,5,7,9)
> out = rep(NA,5)
> mat = matrix(rnorm(100), ncol = 10)
> for(i in Index) {
+   out[i] = mean(mat[,i])
+ }
> out
[1]  0.2230609         NA -0.2862340         NA
[5]  0.3940720         NA -0.1284383         NA
[9]  0.1291539

Wrong – there’s missing data! Note that we want out[1:5] to correspond to mat[,c(1,3,5,7,9)]
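The pitfall and one possible fix can be sketched side by side. The seed and the names out_bad and out_good are assumptions for this example; seq_along(Index) is base R and gives the positions 1, 2, …, length(Index).

```r
set.seed(1)  # arbitrary seed so the example is repeatable
mat <- matrix(rnorm(100), ncol = 10)
Index <- c(1, 3, 5, 7, 9)

# Pitfall: indexing the output by the column number leaves gaps
out_bad <- rep(NA, 5)
for (i in Index) {
  out_bad[i] <- mean(mat[, i])
}
print(length(out_bad))      # 9, not 5: the vector was silently extended
print(sum(is.na(out_bad)))  # 4 positions were never filled

# Fix: loop over positions 1..5 and look up the column inside the loop
out_good <- rep(NA, length(Index))
for (j in seq_along(Index)) {
  out_good[j] <- mean(mat[, Index[j]])
}
print(out_good)
```

R does not warn when an assignment goes past the end of a vector, so checking length(out) after a loop like this is a cheap safeguard.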

Page 27: Introduction to R - Lecture 4: Looping

Application

Index = 4:15
mean_wt <- rep(0, length(Index))

for(i in 1:length(Index)) {
  ind = Index[i] # column index
  mean_wt[i] = mean(dog_dat[,ind])
}

Page 28: Introduction to R - Lecture 4: Looping

Application

Here, we are defining our column indices first, and creating a blank vector of that length

We then loop over each value of that column index, take the mean of the resulting vector, and store it in the blank vector

This allows us to store the mean of the fourth column in the first position of our output vector, the fifth column in the 2nd position, 6th column in the 3rd position, etc

Page 29: Introduction to R - Lecture 4: Looping

Application

So, each time through the loop, we take the item from the i’th position of Index

The first time through the loop, i=1, and ind = Index[1] = 4

Page 30: Introduction to R - Lecture 4: Looping

Application

Why not just loop using for(i in 4:15)? Aka for(i in Index)

> Index = 4:15
> mean_wt <- rep(0, length(Index))
> for(i in Index) {
+   mean_wt[i] = mean(dog_dat[,i])
+ }
> mean_wt
 [1]  0.00000  0.00000  0.00000 49.69606 48.56680 48.91141
 [7] 50.13568 50.05124 49.54793 48.29378 46.41971 44.55975
[13] 45.02490 44.18506 45.75394

It’s too long: mean_wt now has 15 entries (the first three were never filled in), but length(Index) = 12

Page 31: Introduction to R - Lecture 4: Looping

Application

This is the same thing – if i = 4 the first time through, and you want something to be saved in position 1 of another vector:

Index = 4:15
mean_wt <- rep(0, length(Index))
for(i in Index) {
  mean_wt[(i-3)] = mean(dog_dat[,i])
}

Page 32: Introduction to R - Lecture 4: Looping

Application

I think it’s easier to define an index first, and then within the loop use each entry of that index (first way of doing it)

However, feel free to do it any way you want (however it makes the most sense to you)
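To see that the two ways agree, here is a small sketch on made-up data. The data frame toy, its column names, and the variable offset are all hypothetical; only the two loop patterns come from the slides.

```r
# Hypothetical toy data: 3 identifier columns, then 4 monthly measurements
toy <- data.frame(
  id = 1:5, owner = 11:15, type = c("a", "b", "a", "b", "a"),
  m1 = c(1, 2, 3, 4, 5), m2 = c(2, 4, 6, 8, 10),
  m3 = c(0, 0, 1, 1, 3), m4 = c(5, 5, 5, 5, 5)
)
Index <- 4:7

# Way 1: loop over positions, look up the column number inside the loop
means_a <- rep(0, length(Index))
for (i in 1:length(Index)) {
  ind <- Index[i]
  means_a[i] <- mean(toy[, ind])
}

# Way 2: loop over the column numbers, shift by the offset when saving
offset <- Index[1] - 1   # 3 here, playing the role of the (i-3) on the slide
means_b <- rep(0, length(Index))
for (i in Index) {
  means_b[i - offset] <- mean(toy[, i])
}

print(means_a)
print(identical(means_a, means_b))
```

The offset version only works when the columns are consecutive; the first way also handles an index such as c(1,3,5,7,9).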

Page 33: Introduction to R - Lecture 4: Looping

Application

Note: R has several built-in commands that do what we just did:
rowSums(), colSums()
rowMeans(), colMeans()

We basically just did this using a loop: colMeans(dog_dat[,4:15])
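A quick sketch of that equivalence on simulated data (the data frame toy and the seed are assumptions for this example, not the course dataset):

```r
set.seed(42)  # arbitrary seed
toy <- as.data.frame(matrix(rnorm(60), ncol = 6))  # 10 rows, columns V1..V6
Index <- 3:6

# Column means with a loop
loop_means <- rep(0, length(Index))
for (i in seq_along(Index)) {
  loop_means[i] <- mean(toy[, Index[i]])
}

# Column means with the built-in
builtin_means <- colMeans(toy[, Index])

# colMeans() returns a named vector, so drop the names before comparing
print(all.equal(loop_means, unname(builtin_means)))
print(names(builtin_means))
```

Besides being shorter, colMeans() keeps the column names, which the loop version does not.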

Page 34: Introduction to R - Lecture 4: Looping

Overview

Practice Review
The ‘for’ loop
  Rationale
  Syntax
  Application
  Getting creative…

Page 35: Introduction to R - Lecture 4: Looping

Creative

We still have two problems to solve:
1. Average of food, weight, and length at each visit
2. And then those averages for each dog type at each visit

Page 36: Introduction to R - Lecture 4: Looping

Creative

Index = 16:27
mean_len <- rep(0, length(Index))
for(i in 1:length(Index)) {
  ind = Index[i]
  mean_len[i] = mean(dog_dat[,ind])
}

Index = 28:39
mean_food <- rep(0, length(Index))
for(i in 1:length(Index)) {
  ind = Index[i]
  mean_food[i] = mean(dog_dat[,ind])
}

Page 37: Introduction to R - Lecture 4: Looping

Creative

> dog_means = rbind(mean_wt, mean_len, mean_food)
> colnames(dog_means) = paste("month",1:12,sep="_")
> dog_means
           month_1  month_2  month_3  month_4  month_5
mean_wt   49.69606 48.56680 48.91141 50.13568 50.05124
mean_len  20.32427 20.57220 20.68838 20.89668 20.98050
mean_food 30.01660 29.74834 28.75415 28.18942 29.50207
           month_6  month_7  month_8  month_9 month_10
mean_wt   49.54793 48.29378 46.41971 44.55975 45.02490
mean_len  21.26950 21.37178 21.50705 21.61141 21.80975
mean_food 30.22573 30.88050 29.18942 30.01079 29.87033
          month_11 month_12
mean_wt   44.18506 45.75394
mean_len  21.97842 22.27822
mean_food 29.51784 30.87614

Page 38: Introduction to R - Lecture 4: Looping

Creative

paste: concatenates vectors after converting them to character. It’s great for creating names within for loops, or for new matrices

> paste("letter",c("a","b","c"), sep=":")
[1] "letter:a" "letter:b" "letter:c"
> x = c("a", "b", "c")
> paste("letter",x, sep=":")
[1] "letter:a" "letter:b" "letter:c"
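A few more paste() patterns that may be handy with this dataset, shown as a sketch. The variable wt_cols is a name chosen for this example; paste0() and the collapse argument are part of base R.

```r
# Column names in the style of the dataset, built rather than typed
wt_cols <- paste("dog_wt_mo", 1:12, sep = "")
print(wt_cols)

# paste0() is shorthand for paste(..., sep = "")
print(identical(wt_cols, paste0("dog_wt_mo", 1:12)))

# Shorter arguments are recycled: "month" is reused for each number
print(paste("month", 1:3, sep = "_"))

# collapse joins all the pieces into one single string
print(paste(c("a", "b", "c"), collapse = "+"))
```

Names built this way can also be used to select columns, for example dog_dat[, wt_cols] in place of dog_dat[, 4:15], assuming the column names match the ones listed earlier.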

Page 39: Introduction to R - Lecture 4: Looping

Creative

Index = 4:15
mean_wt <- rep(0, length(Index))
lab = rep(0, length(Index))
for(i in 1:length(Index)) {
  ind = Index[i]
  mean_wt[i] = mean(dog_dat[,ind])
  lab[i] = paste("the ", i, "th entry is ",
    round(mean_wt[i],2),sep="")
}
> head(lab)
[1] "the 1th entry is 49.7"  "the 2th entry is 48.57"
[3] "the 3th entry is 48.91" "the 4th entry is 50.14"
[5] "the 5th entry is 50.05" "the 6th entry is 49.55"

Page 40: Introduction to R - Lecture 4: Looping

Creative

Now we get to solve #2 (using ‘for’ loops and colMeans) and store it in 3 matrices

First, make blank matrices
Then create for loops over our variables of interest

Page 41: Introduction to R - Lecture 4: Looping

Creative

dogs = unique(dog_dat$dog_type)
wt = matrix(nrow = length(dogs), ncol = 12)
for(i in 1:length(dogs)) { # 1:4
  # for each dog type...
  Index = which(dog_dat$dog_type == dogs[i])
  # specific weights for each dog type
  tmp = dog_dat[Index,4:15]
  # each row of tmp is for one dog, so average down the columns
  wt[i,] = colMeans(tmp)
}
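The same pattern on a data frame small enough to check by hand. The data frame toy and its values are invented for this sketch; the loop follows the slide.

```r
# Hypothetical toy data: one type column and two monthly weight columns
toy <- data.frame(
  dog_type = c("lab", "husky", "lab", "husky", "poodle"),
  wt_mo1 = c(50, 40, 54, 44, 20),
  wt_mo2 = c(52, 42, 56, 46, 22),
  stringsAsFactors = FALSE
)

dogs <- unique(toy$dog_type)  # "lab" "husky" "poodle"

# Blank matrix: one row per dog type, one column per month
wt <- matrix(nrow = length(dogs), ncol = 2)
rownames(wt) <- dogs
colnames(wt) <- c("wt_mo1", "wt_mo2")

for (i in seq_along(dogs)) {
  # rows for this dog type, weight columns only
  tmp <- toy[toy$dog_type == dogs[i], 2:3]
  # average over the dogs of this type, month by month
  wt[i, ] <- colMeans(tmp)
}
print(wt)

# Hand check for one cell: the two labs weigh 50 and 54 in month 1
print(wt["lab", "wt_mo1"] == mean(c(50, 54)))
```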

Page 42: Introduction to R - Lecture 4: Looping

Creative

> rownames(wt) = dogs
> colnames(wt) = paste("month",1:12,sep="_")
> wt
           month_1  month_2  month_3  month_4  month_5  month_6
lab       49.81840 48.69200 49.03360 50.26560 50.17600 49.67280
poodle    49.40090 48.27297 48.61892 49.84414 49.76126 49.25856
husky     49.26372 48.13097 48.48142 49.70088 49.61858 49.11327
retriever 50.19474 49.06466 49.40602 50.62632 50.54361 50.04135
           month_7  month_8  month_9 month_10 month_11 month_12
lab       48.41600 46.54640 44.68640 45.15040 44.30640 45.88240
poodle    47.99820 46.12613 44.26577 44.73243 43.89009 45.46306
husky     47.86195 45.98761 44.12832 44.59469 43.75221 45.31858
retriever 48.79248 46.91278 45.05263 45.51654 44.68496 46.24586

Page 43: Introduction to R - Lecture 4: Looping

Creative

# same thing for length...
len = matrix(nrow = length(dogs), ncol = 12)
rownames(len) = dogs
colnames(len) = paste("month",1:12,sep="_")
for(i in 1:length(dogs)) {
  tmp = dog_dat[dog_dat$dog_type == dogs[i],16:27]
  len[i,] = colMeans(tmp)
}

# and for food.
food = matrix(nrow = length(dogs), ncol = 12)
rownames(food) = dogs
colnames(food) = paste("month",1:12,sep="_")
for(i in 1:length(dogs)) {
  tmp = dog_dat[dog_dat$dog_type == dogs[i],28:39]
  food[i,] = colMeans(tmp)
}

Page 44: Introduction to R - Lecture 4: Looping

Creative

Note that the code for each category (weight, length, and food) is still quite similar

Next week, double ‘for’ loops and lists
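One possible way to cut down the repetition with the tools seen so far is to wrap the loop in a function and call it once per category. This is only a sketch: the function group_col_means, its arguments, and the data frame toy are hypothetical and not part of the lecture.

```r
# Hypothetical helper: mean of each column in `cols`
# for each distinct value of the grouping column `group_col`
group_col_means <- function(dat, group_col, cols) {
  groups <- unique(dat[[group_col]])
  out <- matrix(nrow = length(groups), ncol = length(cols))
  rownames(out) <- as.character(groups)
  colnames(out) <- names(dat)[cols]
  for (i in seq_along(groups)) {
    # drop = FALSE keeps a data frame even if only one column is selected
    tmp <- dat[dat[[group_col]] == groups[i], cols, drop = FALSE]
    out[i, ] <- colMeans(tmp)
  }
  out
}

# Made-up data: two weight columns and two length columns
toy <- data.frame(
  dog_type = c("lab", "husky", "lab", "husky"),
  wt_mo1 = c(50, 40, 54, 44), wt_mo2 = c(52, 42, 56, 46),
  len_mo1 = c(20, 18, 22, 20), len_mo2 = c(21, 19, 23, 21),
  stringsAsFactors = FALSE
)

wt <- group_col_means(toy, "dog_type", 2:3)
len <- group_col_means(toy, "dog_type", 4:5)
print(wt)
print(len)
```

With the course data the calls would presumably use column ranges 4:15, 16:27, and 28:39, one per category.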