the data pirate’s R-cheatsheet

By Danny Blaker
data analyst at www.leanstartup.chat
blogger at www.dannyblaker.com

Basic Operators
+     add
-     subtract
*     multiply
/     divide
^     power of
%%    modulo
&     and

Variables
x <- value    assigns a value to the variable “x”
x             prints the variable to the console
x + y         adds 2 variables together
z <- x + y    adds 2 variables together and stores the result in a new variable (z)

Classes
3 main classes:
12            Numeric (number)
"hello"       String (class “character”)
TRUE          Logical (TRUE or FALSE)
class(x)      returns the class of x
unclass(x)    returns x with its class attribute removed
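A minimal sketch of the three classes listed above, assuming nothing beyond base R. The variable names (n, s, b, d) are illustrative only. Note that R reports strings as class "character".

```r
# Illustrative values, one per class
n <- 12
s <- "hello"
b <- TRUE

class(n)   # "numeric"
class(s)   # "character"
class(b)   # "logical"

# unclass() strips the class attribute: a Date becomes the plain number stored underneath
d <- as.Date("2016-01-20")
class(d)                    # "Date"
days_since_epoch <- unclass(d)
class(days_since_epoch)     # "numeric"
```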

HELP
?my_function      type “?” before any function name in the console to access its documentation
args(function)    lists the arguments of a function
typeof()          shows the type of a vector
length()          shows the length of a vector
range()           shows the range (minimum and maximum) of a vector
print(x)          prints x
options()         sets global options

Example
x <- 23
x

Basic Operators (continued)
==    equals
!=    does not equal
>     greater than
<     less than
>=    greater than or equal to
<=    less than or equal to
|     or
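The operators above can be tried directly in the console. This is a small sketch with made-up values for x and y, not part of the original sheet.

```r
x <- 7
y <- 2

x + y     # 9
x ^ y     # 49
x %% y    # 1, the remainder of 7 divided by 2

x == y            # FALSE
x != y            # TRUE
x >= 7 & y < 5    # TRUE: both comparisons hold
x < 5 | y < 5     # TRUE: at least one comparison holds
```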

Vectors
x <- c(1,2,3)          creates a vector “x” containing the numbers 1, 2 and 3
c()                    combines objects / elements
names()                sets the names of the elements of a vector
[]                     selects elements of a vector
vector_1 > vector_2    checks whether each element of vector_1 is greater than the corresponding element of vector_2
Vectors can be added, subtracted, multiplied and divided element by element. The result can be stored in a new variable.

c() Example
x <- c(23,54)
y <- c(12,14)
z <- x + y

[] Example
x[c(3:6)]    selects elements 3 to 6 of vector “x”
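A short sketch that ties the vector entries together. The names "first" and "second" and the vector v are assumptions made for illustration.

```r
x <- c(23, 54)
y <- c(12, 14)
z <- x + y                         # element-wise sum: 35 68

names(z) <- c("first", "second")   # label each element
z["second"]                        # select by name
z[1]                               # select by position

v <- c(10, 20, 30, 40, 50, 60, 70)
v[3:6]                             # 30 40 50 60
x > y                              # TRUE TRUE
```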

Libraries & Packages
install.packages("package_name")    installs the package you specify
library("package_name")             loads a package into the workspace
args(function)                      lists the arguments of a function
search()                            lists the packages currently attached

Matrices
matrix(1:9, byrow = TRUE, nrow = 3)    creates a matrix containing the numbers 1-9, filled by row across 3 rows
colnames()    assigns column names
rownames()    assigns row names
dimnames = list(c("row names"), c("column names"))    argument of matrix(); another way to assign row and column names
rbind()       combines data by rows
cbind()       combines data by columns
colSums()     sums of the matrix columns
rowSums()     sums of the matrix rows
x[1,1]        selects the element in row 1, column 1 of matrix “x”
Entire matrices can be multiplied or divided element by element, like a regular vector.
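A minimal sketch of the matrix functions above. The row and column names (r1..r3, a..c) and the "total" row are hypothetical.

```r
# 3 x 3 matrix filled by row, with row and column names set via dimnames
m <- matrix(1:9, byrow = TRUE, nrow = 3,
            dimnames = list(c("r1", "r2", "r3"), c("a", "b", "c")))

m[1, 1]       # 1
rowSums(m)    # r1 = 6, r2 = 15, r3 = 24
colSums(m)    # a = 12, b = 15, c = 18

# Append the column sums as an extra row named "total"
m2 <- rbind(m, total = colSums(m))
dim(m2)       # 4 3

m * 2         # every element doubled
```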


names() Example
names(x) <- y

Factors
Factors store categorical observations or variables; their levels can optionally be ordered.
factor(x)                     makes x a factor
factor(x, ordered = FALSE)    makes x an unordered factor
factor(x, ordered = TRUE, levels = c("1st", "2nd", "3rd"))    makes x a factor with ordered levels: 1st, 2nd and 3rd
levels(x) <- c("1", "2")      sets the levels of factor x to “1” and “2”
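A small sketch of unordered versus ordered factors. The ranks vector and the replacement level names are invented for illustration.

```r
ranks <- c("2nd", "1st", "3rd", "1st")

f <- factor(ranks)      # unordered; levels are sorted alphabetically
levels(f)               # "1st" "2nd" "3rd"

o <- factor(ranks, ordered = TRUE, levels = c("1st", "2nd", "3rd"))
o[1] > o[2]             # TRUE: "2nd" comes after "1st" in the level order

# Assigning to levels() renames the levels in place
levels(f) <- c("gold", "silver", "bronze")
as.character(f)         # "silver" "gold" "bronze" "gold"
```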

Data.frames
str(x)           structure of variable or data set “x”
dim(x)           quickly shows the number of observations (rows) and variables (columns)
names()          shows the top-level names of a list or data set
summary(x)       instant summary of “x”
head(x)          shows the start of data set “x”
tail(x)          shows the end of data set “x”
str(head(x))     shows the structure of the start of data set “x”
str(tail(x))     shows the structure of the end of data set “x”
subset(x, subset = column_1 > 1)    creates a subset of data frame “x” with all rows where “column_1” is greater than 1
order(x)         returns the positions that would sort x; use x[order(x)] to get the sorted values
list()           creates a list
$                selects a column of a data frame
append(x, y)     appends vector y to vector x
sort(x, decreasing = FALSE, ...)    sorts x

data.frame()    creates a data frame

data.frame Example
items <- c("parrot", "sword")
islands <- c("skull island", "treasure caverns")
pirate_brochure <- data.frame(items, islands)
creates a 2x2 data frame stored in “pirate_brochure”

data frame element selection Example
pirate_brochure[]                selects elements of data frame “pirate_brochure”
pirate_brochure[1, ]             selects row 1
pirate_brochure[ ,1]             selects column 1
pirate_brochure[1,1]             selects the observation in row 1, column 1
pirate_brochure[1:2, "items"]    selects the 1st and 2nd observations in column “items”
df$names                         selects the “names” column of data frame “df”
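The inspection and selection functions above can be combined as in the sketch below. The data frame "pirates" and its columns are hypothetical sample data.

```r
# Hypothetical sample data
pirates <- data.frame(
  name    = c("Anne", "Jack", "Mary"),
  swords  = c(2, 1, 3),
  parrots = c(0, 2, 1)
)

str(pirates)             # structure: 3 observations of 3 variables
dim(pirates)             # 3 3
pirates$swords           # the "swords" column as a vector
pirates[1, ]             # first row
pirates[1:2, "name"]     # first two entries of the "name" column

# Rows where swords is greater than 1 (Anne and Mary)
armed <- subset(pirates, subset = swords > 1)

# order() returns row positions; indexing with them sorts the data frame
sorted <- pirates[order(pirates$swords), ]
as.character(sorted$name)   # "Jack" "Anne" "Mary"
```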

dbConnect example
dbConnect(RMySQL::MySQL(), dbname = "db", host = "db.amazonaws.com", port = 0000, user = "test", password = "1234")

Basic Queries
mean()         average
sum()          sum
abs()          absolute value
sd()           standard deviation
sqrt()         square root
norm()         norm of a matrix
median()       median value
dnorm()        density of the normal distribution
pnorm()        distribution function of the normal distribution
qnorm()        quantile function of the normal distribution
rnorm()        random draws from the normal distribution
strsplit()     splits strings
identical()    checks whether two objects are exactly equal
cat()          concatenates and prints
paste0()       converts to strings and concatenates them without a separator
lm()           fits linear models
split()        divides data into groups (unsplit() reassembles them)
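A quick sketch of a few of these functions on made-up numbers and strings. Only base R and the stats package that ships with it are assumed.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

mean(x)      # 5
median(x)    # 4.5
sum(x)       # 40
sd(x)        # sample standard deviation
sqrt(16)     # 4
abs(-3)      # 3
pnorm(0)     # 0.5: half of the standard normal lies below 0

parts <- strsplit("skull island", " ")[[1]]   # "skull" "island"
labels <- paste0("pirate", "_", 1:2)          # "pirate_1" "pirate_2"
identical(c(1, 2), c(1, 2))                   # TRUE

# split() returns a list with one element per group, here "FALSE" and "TRUE"
groups <- split(x, x > 4)
```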

If statement
if (condition1) {
  expr1
} else if (condition2) {
  expr2
} else if (condition3) {
  expr3
} else {
  expr4
}

While loop
while (condition) {
  expr
}

For loop
for (var in seq) {
  expr
}

break    exits the loop
next     skips the rest of the current iteration and moves on to the next one
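A minimal sketch combining the if statement, both loop types, next and break. The counters total and count are illustrative names.

```r
total <- 0
for (i in 1:10) {
  if (i %% 2 == 0) {
    next               # skip even numbers
  } else if (i > 7) {
    break              # leave the loop at the first odd number above 7
  }
  total <- total + i   # adds 1, 3, 5 and 7
}
total   # 16

count <- 0
while (count < 3) {
  count <- count + 1
}
count   # 3
```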

Functions

Function syntax
my_fun <- function(arg1, arg2) {
  body
}

return(x)    returns x
is.na()      tests which elements are missing (TRUE or FALSE for each element); sum(is.na(x)) counts them
warning(..., call. = FALSE)    issues a warning message
stop(..., call. = TRUE)        stops execution with an error
message()    diagnostic message
any()        is at least one value TRUE?
lapply(X, FUN, ...)    iterates over X with a function and returns a list
sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)    iterates over X with a function and returns a simplified result (a vector where possible)
vapply(X, FUN, FUN.VALUE, ..., USE.NAMES = TRUE)    iterates over X with a function and returns output of a pre-specified type
replicate(n, expr, simplify = "array")    sapply for repeated evaluation of an expression
mapply(FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE)    multivariate version of sapply
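The sketch below shows one way these pieces might fit together. The function safe_mean and the list treasure are hypothetical names invented for this example.

```r
# Hypothetical helper: mean that warns about missing values and rejects non-numeric input
safe_mean <- function(x) {
  if (!is.numeric(x)) {
    stop("x must be numeric", call. = FALSE)
  }
  if (any(is.na(x))) {
    warning("dropping missing values", call. = FALSE)
  }
  return(mean(x, na.rm = TRUE))
}

is.na(c(1, NA, 3))         # FALSE TRUE FALSE
sum(is.na(c(1, NA, 3)))    # 1

treasure <- list(a = c(1, 2, 3), b = c(4, 5, 6))
as_list   <- lapply(treasure, sum)                            # list: a = 6, b = 15
as_vector <- sapply(treasure, sum)                            # named vector: a = 6, b = 15
checked   <- vapply(treasure, sum, FUN.VALUE = numeric(1))    # same, with the output type enforced
pairwise  <- mapply(function(x, y) x + y, 1:3, 4:6)           # 5 7 9
```

Calling safe_mean(c(1, NA, 3)) should return 2 and emit the warning; safe_mean("a") stops with an error.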

library example
library(ggplot2)
qplot(pirates$swords, pirates$parrots)
creates a plot with the swords & parrots columns of the “pirates” data frame

Regular Expressions
grep()     returns a vector of indices of the character strings that contain a specific pattern
grepl()    returns TRUE when a pattern is found in the corresponding character string
sub()      search for and replace (first match only)
gsub()     search for and replace (ALL matches)
^          match content at start of string
$          match content at end of string
.*         match any character zero or more times
\\         escapes a character (eg. “.”)

Regex syntax example
pirates <- c("[email protected]", "[email protected]")
replace “pirateparrot” with “piratesword” in every address:
sub("@.*\\.com$", "@piratesword.com", pirates)
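A runnable sketch of the same idea. The email addresses here are hypothetical stand-ins, since the addresses in the original example are not recoverable.

```r
# Hypothetical addresses for illustration
emails <- c("anne@pirateparrot.com", "jack@pirateparrot.com", "mary@example.org")

grep("parrot", emails)     # 1 2 (positions of matching strings)
grepl("parrot", emails)    # TRUE TRUE FALSE

# Replace everything from "@" up to a trailing ".com"; the .org address is left alone
fixed <- sub("@.*\\.com$", "@piratesword.com", emails)

sub("a", "A", "banana")    # "bAnana" (first match only)
gsub("a", "A", "banana")   # "bAnAnA" (all matches)
```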

Dates & Times
Sys.Date()    current date
Sys.time()    current date and time
%Y    4-digit year (2016)
%y    2-digit year (16)
%m    2-digit month (01)
%d    2-digit day of the month (20)
%A    weekday (Monday)
%a    abbreviated weekday (Mon)
%B    month (September)
%b    abbreviated month (Sep)
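The format codes above are used with format() and as.Date(). This sketch uses a fixed, made-up date so the results are reproducible; weekday and month names depend on the locale.

```r
d <- as.Date("2016-09-20")

format(d, "%Y")          # "2016"
format(d, "%y")          # "16"
format(d, "%m")          # "09"
format(d, "%d")          # "20"
format(d, "%d/%m/%Y")    # "20/09/2016"

# The same codes describe how to parse a date from text
parsed <- as.Date("20/09/2016", format = "%d/%m/%Y")

format(Sys.Date(), "%A %d %B %Y")   # today's date; names are locale-dependent
```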

Graphs
plot(x, y, ...)    basic scatter plot
hist()             histogram
boxplot()          box plot
density()          kernel density estimate; plot it with plot(density(x))
dotchart()         dot plot
barplot()          bar plot
lines()            adds lines to an existing plot
pie()              pie chart

List Subsetting
list1[[x]]              returns element x of list1
list1[[x]][[1]]         returns the first element inside element x of list1
list1[[x]][[1]][[2]]    returns the second element inside the first element inside element x of list1
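A small sketch of nested list subsetting. The list contents (crew, ship) are hypothetical.

```r
# Hypothetical nested list
list1 <- list(
  crew = list(c("Anne", "Jack"), "Mary"),
  ship = "Revenge"
)

list1[["ship"]]              # "Revenge"
list1$ship                   # same element, selected with $
list1[["crew"]][[1]]         # "Anne" "Jack"
list1[["crew"]][[1]][[2]]    # "Jack"
```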

purrr
map(.x, .f, ...)     applies a function to each element of a vector or list
pmap(.l, .f, ...)    maps over multiple inputs
%>%                  pipes: “x %>% f(y)” is the same as “f(x, y)”

filter(.data, ...)    keeps the rows that match a condition (requires dplyr package)

Importing Data

Utils
read.table()
read.csv()
read.delim()
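A minimal sketch of the utils readers. To stay self-contained it reads from inline text through the text argument instead of a file; the column names and values are made up.

```r
# Inline CSV text stands in for a file on disk
csv_text <- "item,island\nparrot,skull island\nsword,treasure caverns"
brochure <- read.csv(text = csv_text)

dim(brochure)      # 2 2
names(brochure)    # "item" "island"

# read.delim() expects tab-separated values by default
tsv_text <- "item\tisland\nparrot\tskull island"
one_row <- read.delim(text = tsv_text)

# With a real file, pass its path instead, e.g. read.csv("my_file.csv") (hypothetical path)
```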

readr
read_delim()
read_csv()
read_tsv()
skip       number of rows to skip from the beginning
n_max      maximum number of rows to import
fread()    fast import (requires data.table package)

excel_sheets()     prints sheet names (readxl package)
read_excel()       imports data from a spreadsheet (readxl package)
read.xls()         imports from .xls (requires gdata package)
loadWorkbook()     imports a workbook (requires XLConnect package)
getSheets()        reads sheets (XLConnect)
readWorksheet()    imports sheets (XLConnect)

read_sas("file_name.sas7bdat")    imports a SAS file (requires haven package)

Importing Data PACKAGES
DBI            base package
RMySQL         MySQL
ROracle        Oracle
RPostgreSQL    PostgreSQL
dbConnect()    connects to a database
