Reading data into R

Post on 10-May-2015


Reading data into R

2012-09-28 @HSPH

Kazuki Yoshida, M.D., MPH-CLE student

FREEDOM TO KNOW

Previously in this group

- Group Website: http://rpubs.com/kaz_yos/useR_at_HSPH
- Introduction to R

Menu

- What statistics is all about
- Data-reading functions in R
- Installing packages
- Reading Excel files
- Reading other files

Statistics is the study of the collection, organization, analysis, interpretation, and presentation of data.

http://en.wikipedia.org/wiki/Statistics

http://mediacrushllc.com/2012/internet-statistics-2012/

No data, No life

No statistics

http://echrblog.blogspot.com/2011/04/statistics-on-states-with-systemic-or.html

Loading data is the first step

Supported

- .RData (native): load()
- .csv: read.csv()
- .xls/.xlsx: library(gdata) or library(XLConnect)
- .sas7bdat: read.sas7bdat() via library(sas7bdat)
- .dta: read.dta() via library(foreign)
- and more...

http://cran.r-project.org/doc/manuals/R-data.html
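As a small illustration of the native format, the sketch below saves a data frame to a temporary .RData file and restores it with load(). The data frame and its column names are made up for this example.

```r
# Hypothetical example data; any R object could be saved the same way.
dat <- data.frame(id = 1:3, sbp = c(120, 135, 118))

# Save to a temporary .RData file, then remove the object from the workspace.
rdata.file <- tempfile(fileext = ".RData")
save(dat, file = rdata.file)
rm(dat)

# load() restores the object under its original name and
# returns the names of the objects it restored.
loaded.names <- load(rdata.file)
print(loaded.names)
str(dat)
```

Note that load() does not need an assignment: the dataset reappears under the name it had when it was saved.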

library() packages

http://r4stats.com/articles/popularity/

4000+ user-contributed packages

Fast development

Downside: not much can be done without packages

CRAN

Let’s try

Open R Studio

http://rstudio.org

Watch the screencast

RStudio panes: Source, Console, Plot, Workspace

My configuration (panes switched): Menu: RStudio - Preferences

Configure CRAN mirror
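The mirror can also be set from the console instead of the RStudio menu. This is a minimal sketch; the URL below is one commonly used mirror and is an assumption here, not the one chosen in the slides.

```r
# Set the CRAN mirror for the current session only.
# Replace the URL with a mirror close to you if preferred.
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Check which repository install.packages() will use.
getOption("repos")
```

Putting the options() line in a ~/.Rprofile file makes the setting apply to every new session.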

Use .CSV if possible

http://www.edrugsearch.com/edsblog/cvs-takes-on-wal-marts-generic-drug-prices-with-a-gimmicky-twist/#.UEfft0J8z0d

Comma Separated Values

read.csv("file.csv")

http://www.wondergraphs.com/img/SFO_Landings.csv

Careful: big file!

new.dat <- read.csv("file.csv")

- new.dat: name of a dataset here
- read.csv: function to read .csv files
- "file.csv": file name here

Alternatively:

new.dat <- read.csv(file.choose())

- new.dat: name of a dataset here
- read.csv: function to read .csv files
- file.choose(): function to open a file-choose dialogue
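To try read.csv() without downloading anything, the sketch below writes a tiny CSV to a temporary file and reads it back. The file contents and column names are invented for illustration.

```r
# Write a small, made-up CSV file to a temporary location.
csv.file <- tempfile(fileext = ".csv")
writeLines(c("id,group,weight",
             "1,A,60.5",
             "2,B,72.1",
             "3,A,58.0"), csv.file)

# The first line is used as column names (header = TRUE is the default).
new.dat <- read.csv(csv.file)
str(new.dat)
head(new.dat)

# Interactive alternative (opens a file-choose dialogue, so not run here):
# new.dat <- read.csv(file.choose())
```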

Space separated

http://www.biostat.harvard.edu/~fitzmaur/ala2e/tlc.dat

read.table("file.dat")

or

read.table("file.dat", header = T)

Tab separated

For comma-, tab-, or space-separated text
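A minimal sketch of read.table() on space- and tab-separated text follows. Both files are created on the fly with made-up values, so the numbers mean nothing.

```r
# Space-separated file with a header line (hypothetical values).
dat.file <- tempfile(fileext = ".dat")
writeLines(c("id trt week0 week1",
             "1 P 30.8 26.9",
             "2 A 26.5 14.8"), dat.file)

# By default read.table() splits on any whitespace;
# header = TRUE treats the first line as column names.
space.dat <- read.table(dat.file, header = TRUE)

# Tab-separated file: state the separator explicitly with sep = "\t".
tab.file <- tempfile(fileext = ".txt")
writeLines(c("id\ttrt\tweek0",
             "1\tP\t30.8",
             "2\tA\t26.5"), tab.file)
tab.dat <- read.table(tab.file, header = TRUE, sep = "\t")

str(space.dat)
str(tab.dat)
```

Without header = TRUE the first line would be read as data and the columns would be named V1, V2, and so on.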

Let’s try!

install.packages("gdata", dep = T)

library(gdata)

read.xls("file.xls")

Perl configuration necessary on Windows: http://cran.r-project.org/web/packages/gdata/INSTALL

install.packages("XLConnect", dep = T)

library(XLConnect)

readWorksheet(loadWorkbook("file.xls"), sheet = 1)

On Mac: install.packages("XLConnect", type = "source")

Define a function for simplicity

my.read.xls <- function(file) readWorksheet(loadWorkbook(file), sheet = 1)

my.read.xls("file.xls")

To install a package

install.packages("package", dep = T)

- "package": package name here
- dep: short for dependencies
- T: short for TRUE

To load a package

library(package)

- package: package name here
- the double quotes "" can be omitted
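The install-then-load pattern can be wrapped in a small helper so that a package is only downloaded when it is missing. The function name load.or.install is hypothetical and not part of R or of the original slides.

```r
# Hypothetical helper: install a package only if it is missing, then load it.
load.or.install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # dependencies = TRUE is the long form of dep = T
    install.packages(pkg, dependencies = TRUE)
  }
  # character.only = TRUE is needed because pkg holds the name as a string
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# "stats" ships with R, so nothing is downloaded in this example.
# Replace it with, for example, "gdata" to install from CRAN.
load.or.install("stats")
```

Installing a package is needed once per computer, while library() has to be called in every new R session.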

Just click the box

Install package

Load package

Read the chosen xls file into nhefs

install.packages("sas7bdat", dep = T)

library(sas7bdat)

read.sas7bdat("file.sas7bdat")

http://www.biostat.harvard.edu/~fitzmaur/ala2e/smoking.sas7bdat

install.packages("XML", dep = T)

library(XML)

readHTMLTable("http://www.drugs.com/top200_2003.html", which = 2, skip.rows = 1)

http://www.drugs.com/top200_2003.html

Fixed width

read.fwf("file.txt", widths = c(3, 5, ...))

Use widths = list(c(3,5,..), c(5,7,..)) for multiple rows per subject
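As a sketch of how the widths are applied, the example below reads a made-up fixed-width file in which the first 3 characters are an ID and the next 5 are a name. The column names are assumptions for illustration.

```r
# Made-up fixed-width file: characters 1-3 = id, characters 4-8 = name.
fwf.file <- tempfile(fileext = ".txt")
writeLines(c("101Alice",
             "102Carol"), fwf.file)

# widths gives the number of characters in each field, in order.
fwf.dat <- read.fwf(fwf.file,
                    widths = c(3, 5),
                    col.names = c("id", "name"),
                    stringsAsFactors = FALSE)
str(fwf.dat)
```

Fixed-width files carry no separators, so the widths have to come from the data dictionary that accompanies the file.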

Important functions

- install.packages("PackageName", dep = T)
- library(PackageName)
- str(dataset)
- summary(dataset)
- head(dataset)
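The three inspection functions can be tried on cars, a small dataset that ships with R, so no file needs to be read first. Using cars here is a choice made for this sketch, not something from the slides.

```r
# cars is a built-in data frame (speed and stopping distance).
str(cars)      # structure: number of rows, column names, and types
summary(cars)  # per-column summary statistics
head(cars)     # first six rows by default

# head() takes n to show a different number of rows.
first.three <- head(cars, n = 3)
```

Running these right after reading a file is a quick check that the columns were read with the expected names and types.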

Appendix: Probability Functions

        -norm              -t                 -binom      -pois         what it does
d-      dnorm              dt                 dbinom      dpois         density (mass), given x-axis value
p-      pnorm              pt                 pbinom      ppois         return probability, given x-axis value (quantile)
q-      qnorm              qt                 qbinom      qpois         return quantile (x-axis value), given probability
-test   z.test, zsum.test  t.test, tsum.test  binom.test  poisson.test  return p-value and confidence interval

z.test, zsum.test, and tsum.test require library(BSDA).
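A few calls show how the d-, p-, and q- prefixes relate to each other. Only functions from base R are used; the sample x passed to t.test() is made up.

```r
# d-: density (or probability mass) at a given x-axis value
dnorm(0)                          # standard normal density at 0
dbinom(2, size = 4, prob = 0.5)   # P(X = 2) for X ~ Binomial(4, 0.5)

# p-: cumulative probability up to a given x-axis value
pnorm(1.96)                       # P(Z <= 1.96)
ppois(3, lambda = 2)              # P(X <= 3) for X ~ Poisson(2)

# q-: the inverse of p-, returning the quantile for a given probability
q975 <- qnorm(0.975)
pnorm(q975)                       # recovers the probability passed to qnorm()
qt(0.975, df = 10)                # same idea for the t distribution

# -test: returns a p-value and a confidence interval
x <- c(5.1, 4.9, 6.2, 5.8, 6.0, 5.5)  # hypothetical measurements
res <- t.test(x, mu = 5)
res$p.value
res$conf.int
```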
