r - get started i - sanaitics

1

GET STARTED IN RModule I

2

What is R?• A language and environment for statistical

computing and graphics

• Wide variety of statistical & graphical techniques built in

• Free and Open Source software

• Compiles and runs on a wide variety of UNIX platforms, Windows and MacOS

3

R Environment• Most functionality through built in functions

• Basic functions available by default

• Other functions contained in packages that can be attached

• All datasets created in the session are in Memory

• Output can be used as input to other functions

4

R-CRAN• The Comprehensive R Archive Network

• A network of global web servers storing identical, up-to-date, versions of code and documentation for R

• Use the CRAN mirror nearest to you to download R setup at a faster speed

• http://cran.r-project.org/

5

SECTION 1Introduction to R and User Interface (UI)

6

The R-UI

The R console-- Type in the command -- Press enter-- See the output

7

Create Your Data Setx<-c(12,23,45)y<-c(13,21,6) Create vectors x, y, z on R

Consolez<-c("a","b","c")data1<-data.frame(x,y,z) Combine them in a

tabledata1 Type data name for output

8

Your data in Table mode

edit(data1)

Edit your table, add new values & close the window to save your updates

9

Export your updated table write.csv (data1,file.choose())

Select the path, name your file and click save

10

Data Set typesVector 1 dimension

All elements have the same data types

NumericCharacterLogicFactor

Matrix 2 dimensions

Array 2 or moredimensions

Data frame 2 dimensionsTable-like data object allowing different data types for different columns

ListCollection of data objects, each element of a list is a data object

11

Useful In-Built Functionsmin(data1$x)Use $ sign after data set name to use specific columns

max(data1$y) length(data1$x)

mean(data1$y)

levels(data1$z) Gives a list of unique categories in the data

12

Save Your ScriptOpen a new script, write commands, execute using F5 key. Save the file at a desired folder. Helps to save the script for a later use.

13

R Workspace• Objects that you create during an R session are

hold in memory, the collection of objects that you currently have is called the workspace

• This workspace is not saved on disk unless you tell R to do so

• This means that your objects are lost when you close R and not save the objects, or worse when R or your system crashes on you during a session

14

R Workspace• If you have saved a workspace image and

you start R the next time, it will restore the workspace

• So all your previously saved objects are available again

• You can also explicitly load a saved workspace i.e., that could be the workspace image of someone else

• Go the `File' menu and select `Load workspace'

15

R WorkspaceDisplay last 25 commandshistory()

Display all previous commandshistory(max.show=Inf)

Save your command history to a file. Default is ".Rhistory"savehistory(file="myfile")

Recall your command history. Default is ".Rhistory"loadhistory(file="myfile")

16

Types of Variables• Type: Logical, integer, double, complex,

character, factor or raw

• Type conversion functions have the naming convention

as.xxx

• For example, as.integer(3.2) returns the integer 3, and as.character(3.2) returns the string "3.2".

17

Factors• Tell R that a variable is nominal by making it

a factor

• The factor stores the nominal values as a vector of integers in the range [ 1... k ] (where k is the number of unique values in the nominal variable), and an internal vector of character strings (the original values) mapped to these integers

• An ordered factor is used to represent an ordinal variable

18

Factors• R will treat factors as nominal variables and

ordered factors as ordinal variables in statistical procedures and graphical analyses

• You can use options in the factor( ) and ordered( ) functions to control the mapping of integers to strings (overriding the alphabetical ordering)

19

Help in R•If you know the topic but not the exact

function

Help.search(“topic”)

•If you know the exact function

Help(functionname)

20

In-Memory Computing•Storage of information in the main RAM

of dedicated servers rather than in complicated relational databases operating on comparatively slow disk drives

•Helps business customers, including retailers, banks and utilities, to quickly detect patterns, analyze massive data volumes on the fly, and perform their operations quickly

21

Import .csv Data Filebasic_salary<-read.table(“C:/Users/Documents/R/BASIC SALARY DATA.csv", header=TRUE, sep=",")

• Command requires the file path separated by a /

• header=TRUE if there are row labels

22

Data ImportedUse Head function to get an idea about how your data looks

like

head(basic_salary)

23

Manual Importbasic_salary<-read.table(file.choose(), header=TRUE , sep=",")

Select file of interest and click open to import

24

Checks after Importdim(basic_salary)

str(basic_salary)

names(basic_salary)

25

Importing .txt File#importing a text file having dlm = “blank/space”sample1<-read.table(file.choose(),header=T)sample1

#importing a comma separated text file new<-read.table(file.choose(), header=T, sep=",")new

#another mehod for importing a comma separated text file

new1<-read.csv(file.choose(),header=T)new1

26

SECTION 2Data Management: Creating Subsets

27

Using Indices•Row level sub setting: display some

selected rows •Sub setting by putting condition on

observations•Column level sub setting: display some

selected columns •Sub setting by putting condition on

variable names•Display selected rows for selected

columns•Sub setting by putting conditions on

variables and observations

28

Row Sub-Settingbasic_salary[c(5:10), ] Display rows from 5th to 10th

29

Row Sub-Settingbasic_salary[c(1,3,5,8), ] Display only selected rows

30

Condition on Observationsbasic_salary1 <- basic_salary[basic_salary$Location ==“DELHI” & basic_salary$ba > 20000, ]

head(basic_salary1)

31

More than one value of Variablebasic_salary2 <-basic_salary[basic_salary$Location %in% c("MUMBAI"),]

head(basic_salary2)

32

Column Sub-Settingbasic_salary3<-basic_salary[ , 1:2]

head(basic_salary3)

33

Condition on Variable Namesbasic_salary4 <- basic_salary[ ,c("First_Name","Location", "ba")]

head(basic_salary4)

34

Selected Rows for Selected Columns

basic_salary5<-basic_salary[c(1,5,8,4), c(1,2)]

basic_salary5

35

Condition on Variables & Observations

basic_salary6<-basic_salary[basic_salary$Grade=="GR1" & basic_salary$ba >15000 , c("First_Name","Location", "ba") ]

head(basic_salary6)

36

Subset Function•Condition on observations

•Condition on variable names

•Condition on observations and variable names

37

Condition on Observationsbasic_salary7<-subset(basic_salary, Location=="DELHI" & ba > 20000)

head(basic_salary7)

38

Condition on Variable Namesbasic_salary8<-subset(basic_salary,select=c(Location,First_Name,ba))

head(basic_salary8)

39

Condition on Observations & Variablebasic_salary9<-subset(basic_salary,Grade=="GR1" & ba>15000,select= c(First_Name,Grade, Location))

head(basic_salary9)

40

Using Head Functionhead(basic_salary, n=5)

41

Using Tail Functiontail(basic_salary, n=5)

42

Using Split Functionbasic_salary10<-split(basic_salary,basic_salary$Location)

delhi_salary<-data.frame(basic_salary10[1])

head(delhi_salary)

43

Using Split Functionmumbai_salary<-data.frame(basic_salary10[2])

head(mumbai_salary,5)

44

Delete Casesbasic_salary11<-basic_salary[!(basic_salary$Grade=="GR1") & !(basic_salary$Location=="MUMBAI"), ]

basic_salary11

45

Delete Casesbasic_salary12<-basic_salary[ !(basic_salary$ba>14500), ]

basic_salary12

46

Recoding Based on Quartilesbrk1<-with(basic_salary,quantile(basic_salary$ba,probs<-seq(0,1,0.25)))brk1

basic_salary13<-within(basic_salary,salary_quartile<-cut(basic_salary$ba,breaks=brk1,labels=1:4,include.lowest=TRUE))

47

Recoding Based on Quartileshead(basic_salary13)

48

Recoding Based on Decilesbrk2<-with(basic_salary,quantile(basic_salary$ba, probes<-seq(0,1,0.10)))brk2

basic_salary14<-within(basic_salary,salary_deciles<- cut(basic_salary$ba,breaks=brk2,labels=1:10,include.lowest=TRUE))

49

Recoding Based on Decilestail(basic_salary14)

50

Recoding Based on User Cut Pointsbrk3<-with(basic_salary,c(min(basic_salary$ba), 15000,18000,max(basic_salary$ba)))

brk3

basic_salary15<-within(basic_salary,salary_partition<-cut(basic_salary$ba,breaks=brk3,labels=1:3,include.lowest=TRUE))

51

Recoding Based on User Cut Points basic_salary15[c(20,30),]

52

Remove Variables#using remove.vars function (need to load “gdata”

package)

Removing variable 'Name'Removing variable 'Place‘

basic_salary<-remove.vars(basic_salary,c("Last_Name", "Location"))

names(basic_salary)[1] "X" "First_Name" "Grade" "ba"

53

SECTION 3Sorting & Merging

54

Sorting Dataattach(basic_salary)bs_sorted1<-basic_salary[order(-ba), ]head(bs_sorted1)

55

Sorting by Multiple Variablesbs_sorted2<-basic_salary[order( First_Name, ba) , ] head(bs_sorted2)

56

Merging by VariablesCreate the below two datasetssal_data

bonus_data

57

Outer Joinouterjoin<-

merge(sal_data,bonus_data,by=c("Employee_ID"),all=TRUE)

outerjoin

58

Inner Joininnerjoin<-merge(sal_data,bonus_data,by=("Employee_ID"))

innerjoin

59

Left Joinleftjoin<-merge(sal_data,bonus_data, by=c("Employee_ID"),all.x=TRUE)

leftjoin

60

Right Joinright_join<-merge(sal_data,bonus_data, by="Employee_ID" , all.y=TRUE)

right_join

61

Merging Cases (append)# rbind function

Appending two datasets using rbind function requires both the datasets with exactly the same number of variables with exactly the same names

If datasets do not have the same number of variables , variables can be either dropped or created so both match.

or else

# smartbind function can be used #need to donwload gtools package to use this function

62

R-String Functionsbasic_salary$Location<-substr(basic_salary$Location,1,1)head(basic_salary)

63

Replace Functionbasic_salary$Location<-substr(basic_salary$Location,1,1)head(basic_salary)table(basic_salary$Location)

64

Contd…..

replace Functionreplace() function replaces the values in x with indices given in list by thosegiven in values. If necessary, the values in values are recycled.

65

Deleting missing cases# complete.cases, na.exclude(), na.omit()are for dealing manually with NAs in a dataset

x<-c(12,NA,13,12)y<-c(25,48,NA,NA) data<-data.frame(x,y) data

data1<-na.omit(data) data1

x y1 12 25

data[complete.cases(data),]

x y1 12 25

na.exclude(data) x y1 12 25

66

Thank you !!!!

r - get started i - sanaitics

Education