R: Packages & Data
Presented here are a number of ways to accomplish a task. Some are redundant or may not represent the best way to accomplish it. However, some “quick & dirty” commands are useful to know for when the “better” options aren’t working.
R Packages
• What is an R package?
– A series of programs bundled together
• Once installed, a copy of the package lives on the computer and doesn’t need to be reinstalled
• Updating R
– Must reinstall packages
– May lose packages that aren’t kept updated
Packages-> Install Package
Choose a Mirror Site
Choose Package
Loading Package/Contents
• To load a package
– library(PackageName)
• Contents of a package
– library(help = PackageName)
• For additional documentation
– http://cran.r-project.org/
– Packages -> Package Name -> Downloads: Reference Manual
• Note: Some packages may overwrite (mask) the contents or functions of another package. When this happens, it is indicated in the log
Advanced: Loading Packages
• To find out what packages are already installed on a computer
– installed.packages()
• To check if a given package is installed
– is.installed <- function(mypkg) is.element(mypkg, installed.packages()[,1])
• To install a package without clicking through windows
– install.packages("Package Name")
• These last two commands are particularly helpful when writing functions for other users
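The two commands above can be combined into a small helper for scripts shared with other users. This is only a sketch: load_or_install is a hypothetical name that does not appear in the slides, and the install step assumes a CRAN mirror is reachable.

```r
# Helper from the slide: TRUE if the named package is installed.
is.installed <- function(mypkg) is.element(mypkg, installed.packages()[, 1])

# Hypothetical helper (not from the slides): install only when missing, then load.
load_or_install <- function(mypkg) {
  if (!is.installed(mypkg)) {
    install.packages(mypkg)  # may prompt for a CRAN mirror
  }
  # character.only = TRUE because the package name is passed as a string
  library(mypkg, character.only = TRUE)
}

is.installed("stats")  # stats ships with R, so this should be TRUE
# load_or_install("gtools")  # uncomment to try it with a CRAN package
```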
Functions within a Package
• To get help
– ?FunctionName
– ??"Topic of Interest"
• To see the source code
– FunctionName (typed without parentheses)
• To see an example
– example(FunctionName)
Getting Started: Loading Files
• help(topic) • ?topic • help.search("topic") • ??topic • str() • ls() • dir() • history()
• library() • library(help=) • rm() • rm(list=ls()) • example() • setwd() • source() • function
Data Manipulation: Data Entry
• Types of Data
– Numerical, categorical, logical, factors
– mode(variable)
• Formats of Data
– Scalar, vector/array, matrix, data frame, list
• Ways to enter data
– Manually
– read.csv, read.table, scan
– library(foreign)
– library(Hmisc)
Importing from SAS
• Option One:
– In SAS
• proc export DATA=file DBMS=CSV OUTFILE="destination\name.csv"; run;
– In R
• read.csv()
read.csv()
• Syntax
– read.csv(file, header = TRUE, sep = ",", dec = ".", fill = TRUE, ...)
• file: the name of the file which the data are to be read from. Each row of the table appears as one line of the file. If it does not contain an absolute path, the file name is relative to the current working directory, getwd(). file can also be a complete URL.
• header: a logical value indicating whether the file contains the names of the variables as its first line. read.csv defaults to TRUE. In read.table, if header is missing, the value is determined from the file format: header is set to TRUE if and only if the first row contains one fewer field than the number of columns.
• sep: the field separator character. Values on each line of the file are separated by this character. If sep = "" (the default for read.table) the separator is ‘white space’, that is one or more spaces, tabs, newlines or carriage returns.
• dec: the character used in the file for decimal points.
• fill: logical. If TRUE then in case the rows have unequal length, blank fields are implicitly added. See ‘Details’ in the documentation.
• Additional options are available; see the documentation
• Note: If you’re desperate to read in an unusual data type, see scan()
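As a minimal sketch of the arguments described above, the following writes a small CSV to a temporary file and reads it back. The file contents and column names are invented for illustration.

```r
# Write a small CSV to a temporary file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,group,score",
             "1,a,3.5",
             "2,b,4.25",
             "3,a,"),           # the last row has an empty score field
           csv_path)

dat <- read.csv(csv_path, header = TRUE, sep = ",", dec = ".", fill = TRUE)

str(dat)          # shows the column types R chose
is.na(dat$score)  # the empty numeric field is read as NA
```

For a real file, replace csv_path with a path relative to getwd() or with an absolute path.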
.RData
• The extension .RData is a way to store objects created in R.
• Store using the command save(object1, object2, file = "Storage.RData")
• Access later using load("Storage.RData")
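A short round trip, as a sketch: the objects are made up, and the file is placed in a temporary directory (an assumption made here so the example does not write into the working directory).

```r
object1 <- c(1, 2, 3)
object2 <- data.frame(x = 1:2, y = c("a", "b"))

# Temporary location chosen for this example only.
rdata_path <- file.path(tempdir(), "Storage.RData")
save(object1, object2, file = rdata_path)

rm(object1, object2)        # remove the objects from the workspace
loaded <- load(rdata_path)  # restores the objects and returns their names
loaded
```

Note that the objects are passed to save() as separate arguments, not wrapped in c().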
Advanced: Reading Data directly from SAS or STATA
• SAS Option Two:
– In SAS
• libname library xport "destination\name.xpt";
• data library.data;
• set data;
• run;
– In R
• library(Hmisc)
• data <- sasxport.get("destination/name.xpt")
• STATA
– library(foreign)
– data.stata <- read.dta("file.dta")
• Note: the package foreign can handle multiple file types, including SAS
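The Stata route can be tried without an existing .dta file by writing one first. This is a sketch that assumes the foreign package is installed (it ships with most R distributions); the data frame is invented for illustration.

```r
# Check for the package before using it, since it may not be installed.
if (requireNamespace("foreign", quietly = TRUE)) {
  dta_path <- tempfile(fileext = ".dta")
  original <- data.frame(id = 1:3, score = c(3.5, 4.25, 5))

  foreign::write.dta(original, dta_path)     # create a Stata file to read
  data.stata <- foreign::read.dta(dta_path)  # same call as on the slide

  print(data.stata)
}
```

In R strings, write paths with forward slashes (or doubled backslashes), because a single backslash starts an escape sequence.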
Data Entry
• c(…) • seq(from,to) • rep(x,times) • data.frame() • list() • matrix()
• read.dta() • sasxport.get() • read.csv() • data() • data(R DataSet) • help(R DataSet) • load()
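A few of the manual entry functions listed above, as a minimal sketch with invented values:

```r
v   <- c(2, 4, 6)                       # combine values into a vector
s   <- seq(from = 1, to = 10, by = 3)   # 1 4 7 10
r   <- rep(c("a", "b"), times = 2)      # "a" "b" "a" "b"
m   <- matrix(1:6, nrow = 2)            # filled column by column
df  <- data.frame(id = 1:3, value = v)  # columns must have equal length
lst <- list(numbers = v, labels = r)    # a list can mix types and lengths
```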
Data Information
• mode() • is.character() • is.numeric() • is.logical() • is.factor() • class() • is.matrix() • is.data.frame() • names() • head() • tail()
• length() • dim() • nrow() • ncol() • is.na() • dimnames() • rownames() • colnames() • unique() • describe() • levels()
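Several of these inspection functions applied to one small data frame (the data are invented for illustration):

```r
df <- data.frame(id = 1:4,
                 group = factor(c("a", "b", "a", NA)),
                 score = c(3.5, NA, 4.25, 5))

class(df)         # "data.frame"
mode(df$score)    # "numeric"
dim(df)           # 4 rows, 3 columns
names(df)         # column names
levels(df$group)  # "a" "b"
is.na(df$score)   # FALSE TRUE FALSE FALSE
head(df, 2)       # first two rows
unique(df$group)  # distinct values, including NA
```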
Data Manipulation
• It is possible to access subsets of a data item using bracketed commands (e.g. x[n])
• Options include the “everything but” command (x[-n]) and multiple selections (x[1:n] or x[c(1,2,3)])
• Logical arguments can also be used (x[x > 3 & x < 5])
• Lists use a double bracketing structure (x[[n]])
• Data frame items can be called using two formats
– x[["name"]]
– x$name
• Anything with row and column data uses a double structure to index (x[i, j])
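Each indexing form above, shown on small invented objects:

```r
x <- c(10, 2, 4, 8, 6)

x[2]              # second element: 2
x[-2]             # everything but the second element
x[1:3]            # first three elements
x[c(1, 3, 5)]     # elements 1, 3 and 5
x[x > 3 & x < 9]  # logical selection: 4 8 6

lst <- list(a = 1:3, b = "text")
lst[[1]]          # contents of the first list item

df <- data.frame(name = c("u", "v"), value = c(1.5, 2.5))
df[["value"]]     # same column as df$value
df[2, 1]          # row 2, column 1
```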
Data Manipulation
• as.numeric() • as.logical() • as.character() • as.array() • as.data.frame() • as.matrix() • factor() • ordered() • t() • reshape()
• cat() • rbind() • cbind() • merge() • sort() • order() • library(reshape) • rownames()<-c() • colnames()<-c() • na.omit() • cut()
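A sketch of how a few of these functions combine in practice. The two data frames and the score bands are invented for illustration.

```r
scores <- data.frame(id = c(3, 1, 2), score = c(70, 90, 80))
groups <- data.frame(id = c(1, 2, 3), group = c("a", "b", "a"))

merged   <- merge(scores, groups, by = "id")  # join on the shared id column
sorted   <- merged[order(-merged$score), ]    # sort rows by descending score
stacked  <- rbind(scores, data.frame(id = 4, score = NA))  # append a row
complete <- na.omit(stacked)                  # drop rows containing NA

# cut() turns a numeric variable into labelled intervals.
bands <- cut(merged$score, breaks = c(0, 75, 100), labels = c("low", "high"))
as.character(bands)
```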
Character & Time Based Data
• nchar() • substr() • tolower() • toupper() • chartr() • grep() • match() • %in% • pmatch() • charmatch() • sub() • strsplit() • paste()
• Sys.time() • Sys.Date() • date() • as.Date() • as.POSIXct()
Symbol  Meaning
%d      Day of the month as a number (01-31)
%a      Abbreviated weekday (Mon)
%H      Hours as decimal number (00-23)
%I      Hours as decimal number (01-12)
%w      Weekday as decimal number (0-6, Sunday is 0)
%W      Week of the year as decimal number (00-53), using Monday as the first day of the week (and typically with the first Monday of the year as day 1 of week 1). The UK convention.
%x      Date, locale-specific
%X      Time, locale-specific
%z      Signed offset from UTC (e.g. -0800); %Z gives the time zone abbreviation
%j      Day of the year as decimal number (001-366)
%M      Minute as decimal number (00-59)
%p      AM/PM indicator in the locale (used in conjunction with %I and not with %H)
%S      Second as decimal number (00-61), allowing for up to two leap seconds
%U      Week of the year as decimal number (00-53), using Sunday as the first day of the week
%A      Unabbreviated weekday (Monday)
%c      Date and time, locale-specific
%m      Month as decimal number (01-12)
%b      Abbreviated month (Feb)
%B      Unabbreviated month (February)
%y      Two-digit year (11)
%Y      Four-digit year (2011)
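The character functions and format symbols above can be tried on a single string and a single date. This is a sketch with invented inputs; it sticks to numeric format codes in the checked lines because weekday and month names depend on the locale.

```r
# Character functions
s <- "R Packages and Data"
nchar(s)                         # number of characters: 19
toupper(s)                       # upper case copy
substr(s, 3, 10)                 # characters 3 to 10: "Packages"
words <- strsplit(s, " ")[[1]]   # split on spaces into a character vector
"and" %in% words                 # TRUE
sub("Data", "Dates", s)          # replace the first match
paste("a", "b", sep = "-")       # "a-b"
grep("a", c("cat", "dog", "bat"))  # positions of matches: 1 3

# Dates: the format string tells as.Date() how to read the input
d <- as.Date("02/14/2011", format = "%m/%d/%Y")
format(d, "%Y-%m-%d")            # "2011-02-14"
format(d, "%j")                  # day of the year: "045"
format(d, "%y")                  # two-digit year: "11"
format(d, "%w")                  # weekday number: "1" (a Monday)
format(d, "%A, %d %B %Y")        # names depend on the locale
as.numeric(as.Date("2011-03-01") - d)  # days between two dates: 15
```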
Data Export
• ftable() • format() • paste() • xtable() • write.table(data, "clipboard", sep = "\t", col.names = NA)
• write.csv() • write.foreign() • write.dta() • sink() • save() • print() • save.image()
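A minimal export sketch with an invented data frame. The "clipboard" target on the slide is platform-dependent, so this example writes to temporary files instead.

```r
results <- data.frame(id = 1:3, score = c(3.5, 4.25, 5))

# Comma-separated output; row.names = FALSE omits the row-number column.
out_path <- tempfile(fileext = ".csv")
write.csv(results, out_path, row.names = FALSE)
readLines(out_path)          # inspect the raw text that was written
check <- read.csv(out_path)  # read it back to confirm the round trip

# Tab-delimited output, similar to the write.table() call on the slide.
tab_path <- tempfile(fileext = ".txt")
write.table(results, tab_path, sep = "\t", row.names = FALSE, quote = FALSE)
```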
format()
• Syntax
– format(x, trim = FALSE, digits = NULL, nsmall = 0L, justify = c("left", "right", "centre", "none"), width = NULL, na.encode = TRUE, scientific = NA, big.mark = "", big.interval = 3L, small.mark = "", small.interval = 5L, decimal.mark = ".", zero.print = NULL, drop0trailing = FALSE, ...)
– x: any R object
– trim: logical. If FALSE, numbers are right-justified to a common width. If TRUE, the leading blanks for justification are suppressed.
– digits: how many significant digits should be used
– justify: character. Whether a character vector should be left-justified, right-justified, or centred
• See also
– format.Date (methods for dates)
– format.POSIXct (date-times)
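A few calls illustrating the arguments described above, as a sketch with invented values:

```r
format(3.14159, digits = 3)               # "3.14"
format(2, nsmall = 2)                     # "2.00"
format(1234567.891, big.mark = ",")       # "1,234,568"
format(c(1, 10, 100))                     # padded to a common width
format(c(1, 10, 100), trim = TRUE)        # "1" "10" "100"
format("a", width = 5)                    # "a    "
format(c("a", "bbb"), justify = "right")  # "  a" "bbb"
```

format() always returns character strings, so apply it as the last step before printing or exporting.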
Extra Resources
• Advanced Packages to try
– gtools
– reshape
• Journal of Statistical Computing
– http://stat-computing.org/
• Journal of Statistical Software
– http://www.jstatsoft.org/
http://journal.r-project.org/
www.rseek.org
http://r-forge.r-project.org/
http://www.statmethods.net/index.html