20130215 reading data into r

Reading and Manipulating data in R. 2013-02-15 @HSPH. Kazuki Yoshida, M.D. MPH-CLE student. FREEDOM TO KNOW

Upload: kazuki-yoshida

Post on 26-May-2015


Page 1: 20130215 Reading data into R

Reading and Manipulating data in R

2013-02-15 @HSPH

Kazuki Yoshida, M.D. MPH-CLE student

FREEDOM TO KNOW

Page 2: 20130215 Reading data into R

Reading data in

- Usually the first task in real-life data analysis.

Page 3: 20130215 Reading data into R

Supported

- .RData (native) files: load()

- .csv files: read.csv()

- .xls/.xlsx files: gdata::read.xls() or xlsx::read.xlsx()

- .sas7bdat files: sas7bdat::read.sas7bdat()

- .dta files: foreign::read.dta()

- and more... http://cran.r-project.org/doc/manuals/R-data.html

Page 4: 20130215 Reading data into R

foreign::read.dta()

foreign: package name (packages add functions)

read.dta: function name

Functions are followed by (), in which you specify arguments.
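
A minimal sketch of the package::function notation. It uses stats::median() only because the stats package ships with every R installation; that choice is an assumption for illustration and is not from the slides.

```r
# Call a function with an explicit package prefix (no library() call needed).
m1 <- stats::median(c(3, 1, 2))

# Equivalent: attach the package first, then call the function by its bare name.
library(stats)
m2 <- median(c(3, 1, 2))

# Both calls run the same function, so the results match.
print(identical(m1, m2))
```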

Page 5: 20130215 Reading data into R

Create a folder for this group

Page 6: 20130215 Reading data into R

Open R Studio

Page 7: 20130215 Reading data into R

Make sure your working directory is correct
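
One way to check and change the working directory from the console is sketched below. tempdir() is only a stand-in for the folder created for this group; the real path is not given in the slides.

```r
# Show the current working directory.
old_dir <- getwd()
print(old_dir)

# Switch to another folder; tempdir() stands in for the group's folder here.
setwd(tempdir())
print(getwd())

# List the files R can see in the working directory.
print(list.files())

# Return to the original working directory.
setwd(old_dir)
```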

Page 10: 20130215 Reading data into R

For comma-, tab-, or space-separated text

Page 11: 20130215 Reading data into R

new.dat <- read.csv("file.csv")

new.dat: name of object to create

<-: assignment operator

read.csv: function to read .csv files

"file.csv": file name here
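
A self-contained sketch of the call above. The file contents and column names (id, age, group) are made up for illustration, and the file is written to a temporary location so the example runs anywhere.

```r
# Create a small .csv file so the example does not depend on an existing file.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,age,group",
             "1,34,a",
             "2,51,b"), csv_path)

# Read it into a data frame; the first line is used as column names by default.
new.dat <- read.csv(csv_path)

# Inspect the structure of the result.
str(new.dat)
```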

Page 12: 20130215 Reading data into R

Space separated

http://www.biostat.harvard.edu/~fitzmaur/ala2e/tlc.dat

Page 13: 20130215 Reading data into R

read.table("file.dat")

or

read.table("file.dat", header = TRUE)

http://www.biostat.harvard.edu/~fitzmaur/ala2e/tlc.dat
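
The difference between the two calls can be sketched with a toy file. The column names and values below are invented for illustration and are not taken from the data set linked above.

```r
# Write a small space-separated file whose first line holds column names.
dat_path <- tempfile(fileext = ".dat")
writeLines(c("id group week0 week1",
             "1 P 30.8 26.9",
             "2 A 26.5 14.8"), dat_path)

# Without header = TRUE the name line is read as data and columns are V1, V2, ...
no.header <- read.table(dat_path)

# With header = TRUE the first line supplies the column names.
with.header <- read.table(dat_path, header = TRUE)

print(names(no.header))
print(names(with.header))
```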

Page 14: 20130215 Reading data into R

tab-separated
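
The slide gives no code for this case. A minimal sketch, assuming a file with a header line: read.delim() and read.table() with sep = "\t" are both base R functions, and the toy contents are made up.

```r
# Write a small tab-separated file; "\t" is the tab character.
tsv_path <- tempfile(fileext = ".txt")
writeLines(c("id\tname",
             "1\tAnn Lee",
             "2\tBob Ray"), tsv_path)

# read.delim() assumes tab separation and a header line.
tabdat <- read.delim(tsv_path)

# The same file read with read.table() and the separator spelled out.
tabdat2 <- read.table(tsv_path, header = TRUE, sep = "\t")

# Values may contain spaces because only tabs separate the fields.
print(tabdat)
```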

Page 16: 20130215 Reading data into R

Excel files

Page 17: 20130215 Reading data into R

Install xlsx package

Page 18: 20130215 Reading data into R

Just click box to load

Page 19: 20130215 Reading data into R

To install/load a package

install.packages("package", dependencies = TRUE)

library(package)
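
A common pattern is to install a package only when it is missing, then load it. The helper name load_or_install is hypothetical, and "stats" is used in the demonstration only because it is always installed, so nothing is downloaded.

```r
# Hypothetical helper: install a package if it is not available, then attach it.
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, dependencies = TRUE)
  }
  # character.only = TRUE lets library() take the name as a string.
  library(pkg, character.only = TRUE)
}

# Demonstration with a package that ships with R.
load_or_install("stats")
print("package:stats" %in% search())
```

For the Excel example, the same helper would be called as load_or_install("xlsx").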

Page 20: 20130215 Reading data into R

xlsdat <- read.xlsx("file.xls", 1)

xlsdat: name of object to create

<-: assignment operator

read.xlsx: function to read .xlsx files

"file.xls": file name here

1: sheet number

Page 21: 20130215 Reading data into R
Page 22: 20130215 Reading data into R

SAS native files

library(sas7bdat)

sasdat <- read.sas7bdat("file.sas7bdat")

Page 24: 20130215 Reading data into R
Page 26: 20130215 Reading data into R

Fixed width

Page 27: 20130215 Reading data into R

fwfdat <- read.fwf("file.txt", widths = c(3, 5, ...))

Use widths = list(c(3, 5, ...), c(5, 7, ...)) for multiple rows per subject
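
A small runnable sketch of fixed-width reading. The layout (3 characters for id, 2 for age, 1 for group) and the values are assumptions made up for this example.

```r
# Each line packs three fields with no separators: id (3), age (2), group (1).
fwf_path <- tempfile(fileext = ".txt")
writeLines(c("00123A",
             "00245B"), fwf_path)

# widths gives the number of characters per field, in order.
fwfdat <- read.fwf(fwf_path,
                   widths = c(3, 2, 1),
                   col.names = c("id", "age", "group"))

print(fwfdat)
```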

Page 28: 20130215 Reading data into R

Manipulating data in R

- Objects

- Classes

- Various data objects

Page 29: 20130215 Reading data into R

Objects

- Just about everything named in R is an object

- An object is a container that

  - knows its class (e.g., "I have numbers inside!").

  - has contents (e.g., the actual numbers).

Page 30: 20130215 Reading data into R

Examples of objects

- data, which you use for analysis (various classes)

- functions, which perform analysis (function class)

- results, which come out of analysis (various classes)

Page 31: 20130215 Reading data into R

Classes of data values inside data objects

- Numeric: Continuous variables

- Factor: Categorical variables

- Logical: TRUE/FALSE binary variables

- etc...
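
The class() function reports which of these classes an object has. The variable names and values below are invented for illustration.

```r
# One toy variable for each class listed above.
age    <- c(34.5, 51.2, 47.0)              # continuous
group  <- factor(c("a", "b", "a"))         # categorical
smoker <- c(TRUE, FALSE, TRUE)             # binary

# Ask each object for its class.
print(class(age))
print(class(group))
print(class(smoker))
```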

Page 32: 20130215 Reading data into R

Class?

- An object’s class tells R how the object should be handled.

- For example, summarizing data should work differently for numbers and categories!
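
A short sketch of the same function behaving differently by class: summary() gives quantiles and the mean for numbers, and counts per level for a factor. The values are toy data.

```r
# Numeric input: summary() reports min, quartiles, median, mean, max.
num_summary <- summary(c(2, 4, 9))
print(num_summary)

# Factor input: summary() reports how many times each category occurs.
fac_summary <- summary(factor(c("a", "b", "a")))
print(fac_summary)
```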

Page 33: 20130215 Reading data into R

Data objects

- Vector (contains single class of data values)

- List (contains multiple classes of data values)

Page 34: 20130215 Reading data into R

Data objects

- Vector (contains single class of data values)

  - Array including Matrix

- List (contains multiple classes of data values)

  - Data frame

Page 35: 20130215 Reading data into R

Vector

- Smallest building block of data objects

- Single dimension

- Combination of values of same class

- vec1 <- c(2013, 2, 15, -10) # combine

- vec2 <- 1:16 # integers 1 to 16
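
Because a vector holds a single class, mixing classes in c() makes R convert everything to one common class. The mixed vector below is an added illustration, not from the slides.

```r
vec1 <- c(2013, 2, 15, -10)   # combine
vec2 <- 1:16                  # integers 1 to 16

# Mixing a number, a string, and a logical: everything becomes character.
mixed <- c(1, "a", TRUE)

print(class(vec1))
print(class(vec2))
print(class(mixed))
print(length(vec2))
```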

Page 36: 20130215 Reading data into R

Array

- Vector folded into a multidimensional structure

- 2-dimensional array is a matrix

- vec3 <- 1:16

- dim(vec3) <- c(4, 4) # 4 x 4 structure

- dim(vec3) <- c(2, 2, 4) # 2 x 2 x 4 structure

- arr1 <- array(1:60, dim = c(3,4,5))
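
The folding can be checked directly. R fills arrays column by column (the first index varies fastest), which is what the element lookups in this sketch rely on.

```r
vec3 <- 1:16
dim(vec3) <- c(4, 4)          # fold into a 4 x 4 structure

# A 2-dimensional array is a matrix.
print(is.matrix(vec3))

# Row 2, column 3: columns 1 and 2 hold 1..8, so column 3 starts at 9.
print(vec3[2, 3])

# Refold the same 16 values into 2 x 2 x 4.
dim(vec3) <- c(2, 2, 4)
print(dim(vec3))

# Build a 3 x 4 x 5 array in one step.
arr1 <- array(1:60, dim = c(3, 4, 5))
print(arr1[1, 2, 3])
```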

Page 37: 20130215 Reading data into R

List

- Combination of any values or objects

- Can contain objects of multiple classes

- eg, a list of two vectors, a matrix, three arrays

- list1 <- list(first = 1:17, second = matrix(letters, 13,2))

- list2 <- list(alpha = c(1,4,5,7), beta = c("h","s","p","h"))
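
A quick way to see what a list contains is to look at its element names and sizes; str() prints a compact overview. This sketch reuses the two lists defined above.

```r
list1 <- list(first = 1:17, second = matrix(letters, 13, 2))
list2 <- list(alpha = c(1, 4, 5, 7), beta = c("h", "s", "p", "h"))

# Names of the elements.
print(names(list1))

# Number of values inside each element (the 13 x 2 matrix holds 26).
print(lengths(list1))

# Elements of one list can have different classes.
print(class(list2$alpha))
print(class(list2$beta))

# Compact overview of the whole structure.
str(list1)
```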

Page 38: 20130215 Reading data into R

Data frame

- Special case of a list

- List of same-length vectors vertically aligned

- df1 <- data.frame(list2)

- list3 <- list(small = letters, large = LETTERS, number = 1:26)

- df2 <- data.frame(list3)
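
The sketch below checks both points: a data frame is still a list, and its columns must line up in length. The failing call at the end is an added illustration wrapped in try() so the script keeps running.

```r
list2 <- list(alpha = c(1, 4, 5, 7), beta = c("h", "s", "p", "h"))
df1 <- data.frame(list2)

list3 <- list(small = letters, large = LETTERS, number = 1:26)
df2 <- data.frame(list3)

# A data frame is a special kind of list.
print(is.list(df1))

# Rows x columns: 26 rows, one column per list element.
print(dim(df2))

# Vectors of length 3 and 2 cannot be aligned, so this fails.
bad <- try(data.frame(a = 1:3, b = 1:2), silent = TRUE)
print(inherits(bad, "try-error"))
```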

Page 39: 20130215 Reading data into R

Access by indexes

- letters[3] # 1-dimensional object

- arr1[1,2,3] # 3-dimensional object

- arr1[1, ,3] # implies 1,(all),3

- df2[ ,3] # implies (all),3

- list1[[1]] # list needs [[ ]]
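
A runnable version of index access, with the objects rebuilt so it stands alone. Column 3 is taken from df2 here because df1, as defined earlier, has only two columns. The comparison of [ ] and [[ ]] on a list is an added illustration.

```r
arr1  <- array(1:60, dim = c(3, 4, 5))
list1 <- list(first = 1:17, second = matrix(letters, 13, 2))
df2   <- data.frame(small = letters, large = LETTERS, number = 1:26)

print(letters[3])        # third element of a vector
print(arr1[1, 2, 3])     # one cell of a 3-dimensional array
print(arr1[1, , 3])      # leaving an index empty means "all" along that dimension
print(df2[, 3])          # all rows of the third column

# [[ ]] returns the element itself; [ ] returns a shorter list.
print(class(list1[[1]]))
print(class(list1[1]))
```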

Page 40: 20130215 Reading data into R

Access named elements

- list3

- list3$small

- list3[["small"]]

- df2$large

- df2[, "large"]
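
The two notations in each pair return the same thing, which this sketch checks. The column large lives in df2 (built from list3); df1 has only alpha and beta.

```r
list3 <- list(small = letters, large = LETTERS, number = 1:26)
df2   <- data.frame(list3)

# $name and [["name"]] are two ways to pull out one named element.
print(identical(list3$small, list3[["small"]]))

# For a data frame, $name and [, "name"] both return the column.
print(identical(df2$large, df2[, "large"]))

# Names available for this kind of access.
print(names(df2))
```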

Page 41: 20130215 Reading data into R