Upload: lee-pope

Post on 02-Jan-2016

Page 1: R: Packages & Data. Presented here are a number of ways to accomplish a task, some are redundant or may not represent the best way to accomplish a task

R: Packages & Data

Presented here are a number of ways to accomplish a task. Some are redundant or may not represent the best way to accomplish it. However, some “quick & dirty” commands are useful to know for when all the “better” options aren’t working.

R Packages

• What is an R package?
  – A series of programs bundled together

• Once installed, a copy of the package lives on the computer and doesn’t need to be reinstalled

• Updating R
  – Must reinstall packages
  – May lose packages that aren’t kept updated

Packages-> Install Package

Choose a Mirror Site

Choose Package

Loading Package/Contents

• To load a package
  – library(package name)

• Contents of package
  – library(help = package name)

• For additional documentation
  – http://cran.r-project.org/
  – Packages -> Package Name -> Downloads: Reference Manual

• Note: Some packages may overwrite (mask) the contents or functions of another package; when this happens, it is indicated in the console output as the package loads
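
As a minimal sketch of the loading steps above, the following uses the tools package only because it ships with R; the variable names are illustrative and not from the slides.

```r
# Attach a package that ships with R, so nothing has to be installed first.
library(tools)

# List the objects that the attached package exports.
tools_contents <- ls("package:tools")
head(tools_contents)

# library(help = "tools") shows the package index (best run interactively).

# conflicts() lists names that exist in more than one attached package,
# which are the candidates for masking.
masked_names <- conflicts(detail = FALSE)
print(masked_names)
```

Checking conflicts() after loading several packages is one way to see which functions have been masked.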

Advanced: Loading Packages

• To find out what packages are already installed on a computer
  – installed.packages()

• To check if a given package is installed
  – is.installed <- function(mypkg) is.element(mypkg, installed.packages()[,1])

• To install a package without clicking through windows
  – install.packages("Package Name")

• These last two commands are particularly helpful when writing functions for other users
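
One way these commands might be combined when writing code for other users is sketched below. The helper name load_or_install is hypothetical, and the sketch assumes a CRAN mirror has already been chosen if an installation is actually needed.

```r
# Check whether a package name appears in the installed-package table.
is.installed <- function(mypkg) is.element(mypkg, installed.packages()[, 1])

# Hypothetical helper: install the package only if it is missing, then load it.
load_or_install <- function(pkg) {
  if (!is.installed(pkg)) {
    install.packages(pkg)
  }
  # character.only = TRUE lets library() accept the name as a string.
  library(pkg, character.only = TRUE)
}

# "stats" ships with R, so this call loads it without installing anything.
load_or_install("stats")
```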

Functions within a Package

• To get help
  – ?FunctionName
  – ??Topic of Interest

• To see the source code
  – FunctionName (typed without parentheses)

• To see an example
  – example(FunctionName)
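
A short illustration of the three commands, using sd and mean from base R as stand-ins; the choice of functions and the variable name sd_source are assumptions made for the example.

```r
# Typing a function name without parentheses prints its source code.
sd

# deparse() returns the same source as a character vector, which is
# handy for searching it.
sd_source <- deparse(sd)
print(sd_source)

# ?sd opens the help page; example() runs the examples from that page.
# Both are most useful in an interactive session.
if (interactive()) {
  example(mean)
}
```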

Getting Started: Loading Files

• help(topic)
• ?topic
• help.search("topic")
• ??topic
• str()
• ls()
• dir()
• history()
• library()
• library(help=)
• rm()
• rm(list=ls())
• example()
• setwd()
• source()
• function
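
A few of the commands listed above in action, as a rough sketch. The object names are made up for the example, and tempdir() stands in for a real project folder.

```r
# Create two throwaway objects.
scratch_values <- c(1.5, 2.5, 3.5)
scratch_label <- "demo"

str(scratch_values)   # compact description of an object
ls()                  # names of objects in the workspace

# Remove one object; rm(list = ls()) would remove everything instead.
rm(scratch_label)

# setwd() returns the previous working directory, so it can be restored.
old_dir <- setwd(tempdir())
dir()                 # files in the current working directory
setwd(old_dir)
```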

Data Manipulation: Data Entry

• Types of Data
  – Numerical, categorical, logical, factors
  – mode(variable)

• Formats of Data
  – Scalar, vector/array, matrix, data frame, list

• Ways to enter data
  – Manually
  – read.csv, read.table, scan
  – library(foreign)
  – library(Hmisc)
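
To make the types and formats above concrete, here is a small sketch with manually entered data; all object names are illustrative.

```r
# One object per format named above.
entry_scalar <- 3.14
entry_vector <- c(1, 2, 3)
entry_matrix <- matrix(1:6, nrow = 2)
entry_frame  <- data.frame(id = 1:3, group = factor(c("a", "b", "a")))
entry_list   <- list(number = 1, text = "two", flag = TRUE)

mode(entry_vector)        # "numeric"
mode(entry_list$text)     # "character"
mode(entry_list$flag)     # "logical"

# A factor is stored as integer codes, so mode() reports "numeric";
# class() is what identifies it as a factor.
mode(entry_frame$group)   # "numeric"
class(entry_frame$group)  # "factor"
```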

Importing from SAS

• Option One:
  – In SAS
    • proc export DATA=file DBMS=CSV OUTFILE="destination\name.csv"; run;
  – In R
    • read.csv()

read.csv()

• Syntax
  – read.csv(file, header = TRUE, sep = ",", dec = ".", fill = TRUE, ...)
    • file: the name of the file which the data are to be read from. Each row of the table appears as one line of the file. If it does not contain an absolute path, the file name is relative to the current working directory, getwd(). file can also be a complete URL.
    • header: a logical value indicating whether the file contains the names of the variables as its first line. In read.csv the default is TRUE. In read.table, if header is missing, the value is determined from the file format: header is set to TRUE if and only if the first row contains one fewer field than the number of columns.
    • sep: the field separator character. Values on each line of the file are separated by this character. If sep = "" (the default for read.table) the separator is ‘white space’, that is one or more spaces, tabs, newlines or carriage returns.
    • dec: the character used in the file for decimal points.
    • fill: logical. If TRUE then in case the rows have unequal length, blank fields are implicitly added. See ‘Details’ in the help page.

• Additional options are available; see the documentation

• Note: If you’re desperate to read in an unusual data type, see “scan”
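
A self-contained sketch of read.csv(): it writes a tiny CSV to a temporary file first so there is something to read. The file contents and the names csv_path and scores are invented for the example.

```r
# Write a small CSV file to a temporary location.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,score,group",
             "1,2.5,a",
             "2,3.5,b",
             "3,,a"),          # the empty score field becomes NA
           csv_path)

# Read it back, spelling out the arguments described above.
scores <- read.csv(csv_path, header = TRUE, sep = ",", dec = ".", fill = TRUE)
str(scores)
```

Spelling out header, sep and dec is redundant here because these are the read.csv defaults, but it shows where to change them for other file layouts.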

.RData

• The extension .RData is a way to store objects created in R.

• Store using the command save(object1, object2, file = "Storage.RData")

• Access later using load("Storage.RData")
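
A round trip through an .RData file might look like the sketch below. The objects are placeholders, and the file is written to tempdir() rather than the working directory so the example cleans up after itself.

```r
# Two placeholder objects to store.
object1 <- 1:5
object2 <- data.frame(a = 1:2, b = c("x", "y"))

# save() takes the objects themselves (or list = c("name1", "name2")).
rdata_path <- file.path(tempdir(), "Storage.RData")
save(object1, object2, file = rdata_path)

# Remove them, then bring them back with load().
rm(object1, object2)
restored_names <- load(rdata_path)   # load() returns the restored names
print(restored_names)
```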

Advanced: Reading Data directly from SAS or STATA

• SAS Option Two:
  – In SAS
    • libname library xport "destination\name.xpt";
    • data library.data;
    • set data;
    • run;
  – In R (use forward slashes or doubled backslashes in file paths)
    • library(Hmisc)
    • data <- sasxport.get("destination/name.xpt")

• STATA
  – library(foreign)
    • Note: the package foreign can handle multiple file types, including SAS
  – data.stata <- read.dta("file.dta")
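
For the STATA route, a hedged round-trip sketch: it only runs if the foreign package is available, and it creates its own .dta file with write.dta() so no external data is needed. The data values are invented.

```r
# Only run when the foreign package is installed.
if (requireNamespace("foreign", quietly = TRUE)) {
  dta_path <- tempfile(fileext = ".dta")

  # Write a small data frame in Stata format so there is a file to read.
  foreign::write.dta(data.frame(id = 1:3, score = c(2.5, 3.5, 4.5)), dta_path)

  # Read the Stata file back into a data frame.
  data.stata <- foreign::read.dta(dta_path)
  str(data.stata)
}
```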

Data Entry

• c(…)
• seq(from,to)
• rep(x,times)
• data.frame()
• list()
• matrix()
• read.dta()
• sasxport.get()
• read.csv()
• data()
• data(R DataSet)
• help(R DataSet)
• load()
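
The manual-entry functions above can be combined as in this sketch; the values and names are invented, and mtcars is used only because it is one of the data sets bundled with R.

```r
# Build columns by hand.
ids    <- seq(1, 5)                             # 1 2 3 4 5
groups <- rep(c("a", "b"), times = c(3, 2))     # a a a b b
values <- c(2.1, 3.4, 1.8, 4.0, 2.9)

# Combine them into a data frame.
entered <- data.frame(id = ids, group = groups, value = values)
print(entered)

# data() with no arguments lists the bundled data sets;
# data(name) loads one of them into the workspace.
data(mtcars)
head(mtcars)
```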

Data Information

• mode()
• is.character()
• is.numeric()
• is.logical()
• is.factor()
• class()
• is.matrix()
• is.data.frame()
• names()
• head()
• tail()
• length()
• dim()
• nrow()
• ncol()
• is.na()
• dimnames()
• rownames()
• colnames()
• unique()
• describe()
• levels()
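
Several of these inspection functions applied to the bundled iris data set, as a quick sketch; the result names are illustrative.

```r
data(iris)

class(iris)                 # "data.frame"
iris_dim <- dim(iris)       # rows and columns
names(iris)                 # column names
first_rows <- head(iris, 3) # first three rows

is.factor(iris$Species)     # TRUE
species_levels <- levels(iris$Species)
unique(iris$Species)

# Count missing values across the whole data frame.
missing_count <- sum(is.na(iris))
```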

Data Manipulation

• It is possible to access subsets of a data item using bracketed commands (e.g. x[n])

• Options include the “everything but” form (x[-n]) and multiple selections (x[1:n] or x[c(1,2,3)])

• Logical arguments can also be used (x[x > 3 & x < 5])

• Lists use a double bracketing structure (x[[n]])

• Data frame items can be called using two formats
  – x[["name"]]
  – x$name

• Anything with row and column data uses a two-part index (x[i, j])
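
Each indexing form above, tried on small invented objects:

```r
x <- c(10, 2, 4, 8, 6)

x[2]               # second element: 2
x[-2]              # everything but the second: 10 4 8 6
x[1:3]             # first three: 10 2 4
x[c(1, 2, 3)]      # same selection written out
x[x > 3 & x < 5]   # logical selection: 4

# Lists use double brackets to pull out a single element.
demo_list <- list(a = 1:3, b = "text")
demo_list[[2]]     # "text"

# Data frames: by name, by $, or by row and column.
demo_frame <- data.frame(name = c("p", "q"), score = c(90, 85))
demo_frame[["score"]]    # 90 85
demo_frame$score         # 90 85
demo_frame[2, "score"]   # 85
```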

Data Manipulation

• as.numeric()
• as.logical()
• as.character()
• as.array()
• as.data.frame()
• as.matrix()
• factor()
• ordered()
• t()
• reshape()
• cat()
• rbind()
• cbind()
• merge()
• sort()
• order()
• library(reshape)
• rownames()<-c()
• colnames()<-c()
• na.omit()
• cut()
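
A sketch that chains a few of these functions: type conversion, merging, ordering, dropping missing rows and binning. The two data frames and the cut points are made up for illustration.

```r
left_table  <- data.frame(id = c(3, 1, 2), score = c("7", "9", NA),
                          stringsAsFactors = FALSE)
right_table <- data.frame(id = c(1, 2, 3), group = c("a", "b", "a"))

# Convert the score column from character to numeric (NA stays NA).
left_table$score <- as.numeric(left_table$score)

# Join the two tables on id, then make the row order explicit.
merged <- merge(left_table, right_table, by = "id")
merged <- merged[order(merged$id), ]

# Drop rows with missing values, then bin the scores into two bands:
# (0, 8] is "low" and (8, 10] is "high".
complete_rows <- na.omit(merged)
complete_rows$band <- cut(complete_rows$score, breaks = c(0, 8, 10),
                          labels = c("low", "high"))
print(complete_rows)
```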

Character & Time Based Data

• nchar()
• substr()
• tolower()
• toupper()
• chartr()
• grep()
• match()
• %in%
• pmatch()
• charmatch()
• sub()
• strsplit()
• paste()
• Sys.time()
• Sys.Date()
• date()
• as.Date()
• as.POSIXct()
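
The character functions above, applied to an invented string:

```r
report_name <- "Feb-Report"

name_length <- nchar(report_name)                 # number of characters
upper_month <- toupper(substr(report_name, 1, 3)) # characters 1 to 3, upper case
pieces      <- strsplit(report_name, "-")[[1]]    # split on the hyphen
renamed     <- sub("Report", "Summary", report_name)
labelled    <- paste("Q1", report_name, sep = "_")

# grep() returns the positions of the elements that match a pattern.
hits <- grep("Feb", c("Jan-Report", "Feb-Report", "Feb-Notes"))

# %in% tests membership; match() returns the position of the first match.
has_feb    <- "Feb" %in% pieces
report_pos <- match("Report", pieces)
```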

Symbol  Meaning
%d      Day of the month as a number (01-31)
%a      Abbreviated weekday (Mon)
%H      Hours as decimal number (00-23)
%I      Hours as decimal number (01-12)
%w      Weekday as decimal number (0-6, Sunday is 0)
%W      Week of the year as decimal number (00-53) using Monday as the first day of week (and typically with the first Monday of the year as day 1 of week 1). The UK convention.
%x      Date, locale-specific
%X      Time, locale-specific
%z      Signed offset from UTC in hours and minutes (e.g. -0800); %Z gives the time zone abbreviation
%j      Day of year as decimal number (001-366)
%M      Minute as decimal number (00-59)
%p      AM/PM indicator in the locale (used in conjunction with %I and not with %H)
%S      Second as decimal number (00-61), allowing for up to two leap-seconds
%U      Week of the year as decimal number (00-53), using Sunday as the first day of the week
%A      Unabbreviated weekday (Monday)
%c      Date and time, locale-specific
%m      Month as decimal number (01-12), e.g. 02
%b      Abbreviated month (Feb)
%B      Unabbreviated month (February)
%y      Two-digit year (11)
%Y      Four-digit year (2011)
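
A sketch of how these symbols are used with format() and as.Date(). The date is arbitrary, and only locale-independent symbols are shown so the results do not vary between machines.

```r
some_day <- as.Date("2011-02-14")

# Formatting: Date -> character.
day_month_year <- format(some_day, "%d/%m/%Y")   # "14/02/2011"
day_of_year    <- format(some_day, "%j")         # "045"

# Parsing: character -> Date, describing the layout with the same symbols.
parsed_day <- as.Date("14/02/11", format = "%d/%m/%y")

# Date-times work the same way through as.POSIXct().
stamp    <- as.POSIXct("2011-02-14 13:05:09", tz = "UTC")
clock_24 <- format(stamp, "%H:%M:%S")            # "13:05:09"
hour_12  <- format(stamp, "%I")                  # "01"

# Current date and time.
Sys.Date()
Sys.time()
```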

Data Export

• ftable()
• format()
• paste()
• xtable()
• write.table(data,"clipboard",sep="\t",col.names=NA)
• write.csv()
• write.foreign()
• write.dta()
• sink()
• save()
• print()
• save.image()
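
A sketch of three export routes using temporary files. Note that the "clipboard" destination shown in the list is specific to Windows, so this example writes the tab-separated output to a file instead; all names and values are invented.

```r
results <- data.frame(id = 1:3, score = c(2.5, 3.5, 4.5))

# Comma-separated file without the row-name column.
out_csv <- tempfile(fileext = ".csv")
write.csv(results, out_csv, row.names = FALSE)

# Tab-separated file; col.names = NA leaves a blank header cell above
# the row names, which lines the columns up in a spreadsheet.
out_tab <- tempfile(fileext = ".txt")
write.table(results, out_tab, sep = "\t", col.names = NA)

# sink() diverts printed output to a file until sink() is called again.
out_log <- tempfile(fileext = ".txt")
sink(out_log)
print(summary(results$score))
sink()
```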

format()

• Syntax
  – format(x, trim = FALSE, digits = NULL, nsmall = 0L, justify = c("left", "right", "centre", "none"), width = NULL, na.encode = TRUE, scientific = NA, big.mark = "", big.interval = 3L, small.mark = "", small.interval = 5L, decimal.mark = ".", zero.print = NULL, drop0trailing = FALSE, ...)
  – x: any R object
  – trim: logical; if FALSE, numbers are right-justified to a common width; if TRUE, the leading blanks for justification are suppressed
  – digits: how many significant digits should be used
  – justify: character; whether a character vector should be left-justified, right-justified, or centred

• See also
  – format.Date (methods for dates)
  – format.POSIXct (date-times)
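
A few of the format() arguments described above, tried on invented values:

```r
# digits: number of significant digits.
three_digits <- format(3.14159, digits = 3)                # "3.14"

# trim: by default numbers are padded to a common width.
padded  <- format(c(1, 10, 100))                           # "  1" " 10" "100"
trimmed <- format(c(1, 10, 100), trim = TRUE)              # "1" "10" "100"

# nsmall: minimum number of digits after the decimal point.
two_decimals <- format(2, nsmall = 2)                      # "2.00"

# justify: alignment of character vectors.
right_aligned <- format(c("a", "bb"), justify = "right")   # " a" "bb"
```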

Extra Resources

• Advanced Packages to try
  – gtools
  – reshape

• Journal of Statistical Computing
  – http://stat-computing.org/

• Journal of Statistical Software
  – http://www.jstatsoft.org/

http://journal.r-project.org/

www.rseek.org

http://r-forge.r-project.org/

http://www.statmethods.net/index.html