for beginners · chapter 1 introduction risanexcitinglanguagetostartyourprogrammingjourney....

28
R PROGRAMMING DATAMENTOR FOR BEGINNERS

Upload: others

Post on 19-Jul-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: FOR BEGINNERS · Chapter 1 Introduction Risanexcitinglanguagetostartyourprogrammingjourney. ThevastapplicationsandusageofRintodays world makes it a perfect starting point for any

R PROGRAMMING

DATAMENTOR

FOR

BEGINNERS

Page 2: FOR BEGINNERS · Chapter 1 Introduction Risanexcitinglanguagetostartyourprogrammingjourney. ThevastapplicationsandusageofRintodays world makes it a perfect starting point for any

R Programming For BeginnersDataMentor

Page 3: FOR BEGINNERS · Chapter 1 Introduction Risanexcitinglanguagetostartyourprogrammingjourney. ThevastapplicationsandusageofRintodays world makes it a perfect starting point for any

Contents

I Welcome 7

1 Introduction 91.1 What you will learn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91.2 Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101.3 Things to know before start learning R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101.4 Getting Help . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2 Run R on Your Computer 132.1 In Mac . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132.2 In Linux . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132.3 In Windows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142.4 Installing RStudio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

II Basics 17

Introduction 19

3 Your First R Program 203.1 Getting Started with R console . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223.2 Different ways to run R scripts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223.3 Getting help in R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

4 Variables and constants 244.1 Atomic vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244.2 Constants in R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254.3 Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264.4 R Reserved Words . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

5 Output 295.1 Using variable name . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295.2 Using print function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295.3 Using cat function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

6 Operators 30Assignment Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30Arithmetic Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31Relational Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32Logical Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326.1 Operator Precedence and Associativity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

7 Comments 357.1 Use Comments the Right Way . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

2

Page 4: FOR BEGINNERS · Chapter 1 Introduction Risanexcitinglanguagetostartyourprogrammingjourney. ThevastapplicationsandusageofRintodays world makes it a perfect starting point for any

CONTENTS 3

8 Packages 368.1 List all Packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368.2 Loading Local Packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378.3 Loading Remote Packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

9 Environment 39

III Data Structures 40

Introduction 42

10 Objects 4310.1 Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

11 Vectors 45How to Create Vector in R? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45How to access Elements of a Vector? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46How to modify a vector in R? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48How to delete a Vector? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

12 Lists 50How to create a list in R programming? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50How to access components of a list? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51How to modify a list in R? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

13 Matrix 55How to create a matrix in R programming? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55How to access Elements of a matrix? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57How to modify a matrix in R? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

14 Data Frame 62Check if a variable is a data frame or not . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62Functions of data frame . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62How to create a Data Frame in R? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63How to access Components of a Data Frame? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63How to modify a Data Frame in R? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

15 Factors 67How to create a factor in R? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67How to access compoments of a factor? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68How to modify a factor? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

IV Control Structures 70

Introduction 72

16 If…else 73if statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73if…else statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73if…else Ladder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

17 ifelse() function 77Syntax of ifelse() function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77Example: ifelse() function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

Page 5: FOR BEGINNERS · Chapter 1 Introduction Risanexcitinglanguagetostartyourprogrammingjourney. ThevastapplicationsandusageofRintodays world makes it a perfect starting point for any

4 CONTENTS

18 for loop 79Syntax of for loop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79Flowchart of for loop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79Example: for loop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

19 while loop 81Syntax of while loop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81Flowchart of while Loop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81Example of while Loop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

20 break and next 83break statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83next statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

21 repeat loop 86Syntax of repeat loop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86Flowchart of repeat loop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86Example: repeat loop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

V Functions 88

Introduction 90

22 Functions 91Syntax for Writing Functions in R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91Example of a Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91How to call a function? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92Named Arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92Default Values for Arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

23 Return Value 94Syntax of return() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9423.1 Example: return() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94Functions without return() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95Multiple Returns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

24 Function Environment & Scope 9724.1 Cascading of environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9824.2 R Programming Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98

25 Recursive Function 101Example: Recursive Function in R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

26 Infix Operator 103Example: How infix operators work in R? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103Example: User defined infix operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104Predefined infix operators in R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

27 Switch Function 105Syntax of switch() function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105Example: switch() function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105Example: switch() Function with as String Expression . . . . . . . . . . . . . . . . . . . . . . . . . 106

Page 6: FOR BEGINNERS · Chapter 1 Introduction Risanexcitinglanguagetostartyourprogrammingjourney. ThevastapplicationsandusageofRintodays world makes it a perfect starting point for any

CONTENTS 5

VI Object Oriented Programming 107

Introduction 109

28 Object & Class 110S3 Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110S4 Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111Reference Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111Comparison between S3 vs S4 vs Reference Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

29 S3 Class 112How to define S3 class and create S3 objects? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112How to use constructors to create objects? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113Methods and Generic Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113How to write your own method? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115Writing Your Own Generic Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

30 S4 Class 117How to define S4 Class? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117How to create S4 objects? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117How to access and modify slot? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118Methods and Generic Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119How to write your own method? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

31 Reference Class 122How to define a reference class? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122How to create a reference objects? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122How to access and modify fields? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123Warning Note . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123Reference Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

32 Inheritance 127Inheritance in S3 Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127Inheritance in S4 Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128Inheritance in Reference Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

VII Graphs & Charts 131

Introduction 133

33 Bar Plot 134Plotting Categorical Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135How to plot higher dimensional tables? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136How to plot barplot with matrix? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137

34 Histogram 139Example 1: Simple histogram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139Example 2: Histogram with added parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140Return Value of hist() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141Example 3: Use Histogram return values for labels using text() . . . . . . . . . . . . . . . . . . . . 142Defining the Number of Breaks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143Example 5: Histogram with non-uniform width . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143

35 Pie Chart 145Example: Simple pie chart using pie() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145

Page 7: FOR BEGINNERS · Chapter 1 Introduction Risanexcitinglanguagetostartyourprogrammingjourney. ThevastapplicationsandusageofRintodays world makes it a perfect starting point for any

6 CONTENTS

Example 2: Pie chart with additional parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146

36 Box Plot 147Return Value of boxplot() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148Multiple Boxplots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149Boxplot from Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150

37 Strip Chart 152Example 1: Strip chart of daily air quality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152Example 2: Strip chart of airquality using jitter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153Multiple Strip Charts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15437.1 Strip Chart from Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155

VIII Advanced Plotting Operations 157

Introduction 159

38 Coloring Plot 160Using Color Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161Using Hex Values as Colors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161Using RGB Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162Color Cycling in R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163Using Color Palette . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164

39 Saving Plot 166How to save plot as a bitmap image? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166How to save plot as a vector image? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168

40 Plot Function 170Adding Titles and Labeling Axes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170Changing Color and Plot Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171Overlaying Plots Using legend() function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172

41 Multiple Plots 174par() function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174More Precise Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176

42 3D Plot 178Adding Titles and Labeling Axes to Plot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179Rotational angles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179Coloring and Shading Plot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179

IX Conclusion 180

43 What Next 182Data Science . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182Statistical computing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183Advanced R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183

Page 8: FOR BEGINNERS · Chapter 1 Introduction Risanexcitinglanguagetostartyourprogrammingjourney. ThevastapplicationsandusageofRintodays world makes it a perfect starting point for any

Part I

Welcome

7

Page 9: FOR BEGINNERS · Chapter 1 Introduction Risanexcitinglanguagetostartyourprogrammingjourney. ThevastapplicationsandusageofRintodays world makes it a perfect starting point for any
Page 10: FOR BEGINNERS · Chapter 1 Introduction Risanexcitinglanguagetostartyourprogrammingjourney. ThevastapplicationsandusageofRintodays world makes it a perfect starting point for any

Chapter 1

Introduction

R is an exciting language to start your programming journey. The vast applications and usage of R in todaysworld makes it a perfect starting point for any beginner. Being different from traditional programminglanguages, it feels intuitive and naturally fits into the hands of a non-programmer.

The goal of “R Programming For Beginners” is to help you as a beginner to get comfortable with R pro-gramming while learning the basics of R. By the end of this book, you’ll have gained a strong footing on anemerging and powerful programming language under your belt.

1.1 What you will learn

Describing in one sentence, you’ll learn to program in R. However, this book is not intended to make youa master in R or programming, but rather give you a solid foundation on the concepts and building blocksthat make up R.

Depending on the system you’re working on, you’ll first learn to install R and its required tools; morespecifically, the IDE: R Studio. You’ll learn to get familiar with the working environment of R Studio andgo on to write your very first program in R.

Next, you’ll look into the basics of an R program. We’ll dissect specific portions that makes up an Rprogram and explain the working of each component. You’ll learn the mechanisms of storing and retrievingyour data, and perform basic logical and mathematical operations in R.

Third, you’ll learn ways to store and represent different forms of data (real world and otherwise) in variousdata structures present in R, like vectors, lists, matrix, data frames and factors.

Fourth, you’ll learn to change the behavior and flow of R program using control structures. Using controlstructures, you define a path for a program to take and achieve your required result.

Fifth, you’ll learn about functions. Functions are fundamental to any process and learning to create onehands you to ability to perform tasks and maintain the structure of a program. It is undoubtely one of themost powerful tools for a programmer.

Next, you’ll dig a bit deep into the Object Oriented (OO) world of programming, where every problemis based on the concepts of objects. You’ll learn various OO features and systems R provides to simplify acomplex problem in the form of objects.

Finally, you’ll learn to visualize your data in the form of Charts and Graphs. You’ll learn to create,modify and describe your data better through various plots.

9

Page 11: FOR BEGINNERS · Chapter 1 Introduction Risanexcitinglanguagetostartyourprogrammingjourney. ThevastapplicationsandusageofRintodays world makes it a perfect starting point for any

10 CHAPTER 1. INTRODUCTION

1.2 Prerequisites

There are no hard prerequisites before getting started with this book. However, it’s helpful if you are familiarwith basic statistical concepts as we’ll be talking about some through our examples. More so, we assumeyou to be willing to learn R.

This book doesn’t assume you to be familiar with any sort of programming language before.

1.3 Things to know before start learning R

1.3.1 What is R?

R is a popular beginner friendly programming language specially built for working with data. R is built ontop of the language S programming that was originally intended as a programming language that would helpthe student learn programming while playing around with data.

1.3.2 Why use R?

1. R is open source and free!

R is free to download as it is licensed under the terms of GNU General Public license.

You can look at the source to see what’s happening under the hood. There’s more, most R packagesare available under the same license so you can use them, even in commercial applications withouthaving to call your lawyer.

2. R is popular - and increasing in popularity

IEEE publishes a list of the most popular programming languages each year. R was ranked 5th in2016, up from 6th in 2015. It is a big deal for a domain-specific language like R to be more popularthan a general purpose language like C#.

This not only shows the increasing interest in R as a programming language, but also of the fields likeData Science and Machine Learning where R is commonly used.

3. R runs on all platforms

You can find distributions of R for all popular platforms - Windows, Linux and Mac.

R code that you write on one platform can easily be ported to another without any issues. Cross-platform interoperability is an important feature to have in today’s computing world - even Microsoftis making its coveted .NET platform available on all platforms after realizing the benefits of technologythat runs on all systems.

4. Learning R will increase your chances of getting a job

According to the Data Science Salary Survey conducted by O’Reilly Media in 2014, data scientists arepaid a median of $98,000 worldwide. The figure is higher in the US - around $144,000.

Of course, knowing how to write R programs won’t get you a job straight away, a data scientist hasto juggle a lot of tools to do their work. Even if you are applying for a software developer position, Rprogramming experience can make you stand out from the crowd.

5. R is being used by the biggest tech giants

Adoption by tech giants is always a sign of a programming language’s potential. Today’s companiesdon’t make their decisions on a whim. Every major decision has to be backed by concrete analysis ofdata.

Page 12: FOR BEGINNERS · Chapter 1 Introduction Risanexcitinglanguagetostartyourprogrammingjourney. ThevastapplicationsandusageofRintodays world makes it a perfect starting point for any

1.3. THINGS TO KNOW BEFORE START LEARNING R 11

6. Companies Using R

R is the right mix of simplicity and power, and companies all over the world use it to make calculateddecisions. Here are a few ways industry stalwarts are using R and contributing to the R ecosystem.

Table 1.1: Companies and their R contributions and applications.

Company Application/ContributionTwitter Monitor user experienceFord Analyse social media to support design decisions for their carsNew York Times Infographics, data journalismMicrosoft Released Microsoft R Open, an enhanced R distribution and Microsoft R

server after acquiring Revolution Analytics in 2015Human Rights DataAnalysis Group

Measure the impact of war

Google Created the R style guide for the R user community inside Google

While using R, you can rest assured that you are standing on the shoulders of giants.

1.3.3 Is R Programming an easy language to learn?

This is a difficult question to answer. Many researchers are learning R as their first language to solve theirdata analysis needs.

That’s the power of the R programming, it is simple enough to learn as you go. All you need is data and aclear intent to draw a conclusion based on analysis on that data.

However, programmers that come from a Python, PHP or Java background might find R quirky and confusingat first. The syntax that R uses is a bit different from other common programming languages.

While R does have all the capabilities of a programming language, you will not find yourself writing a lotof if conditions or loops while writing code in the R language. There are other programming constructs likevectors, lists, frames, data tables, matrices etc. that allow you to perform transformations on data in bulk.

1.3.4 Alternatives to learning R

R is not the only language that you can use for statistical computing and graphics. Some of the popularalternatives of R programming are:

Python – Popular general purpose language

Python is a very powerful high-level, object-oriented programming language with an easy-to-use and simplesyntax.

Python is extremely popular among data scientists and researchers. Most of the packages in R have equivalentlibraries in Python as well.

While R is the first choice of statisticians and mathematicians, professional programmers prefer implementingnew algorithms in a programming language they already know.

The choice between R vs Python also depends on what you are trying to accomplish with your code. If youare trying to analyze a dataset and present the findings in a research paper, then R is probably a betterchoice. But if you are writing a data analysis program that runs in a distributed system and interacts withlots of other components, it would be preferable to work with Python.

SAS (Statistical Analysis System)

Page 13: FOR BEGINNERS · Chapter 1 Introduction Risanexcitinglanguagetostartyourprogrammingjourney. ThevastapplicationsandusageofRintodays world makes it a perfect starting point for any

12 CHAPTER 1. INTRODUCTION

SAS is a powerful software that has been the first choice of private enterprise for their analytics needs fora long time. Its GUI and comprehensive documentation, coupled with reliable technical support make it avery good tool for companies.

While R is the undisputed champion in academics and research, SAS is extremely popular in commercialanalytics. But R and Python are gaining momentum in the enterprise space and companies are also tryingto move towards open-source technologies. Time will tell if SAS will continue its dominance or R/Pythonwill take over.

SPSS – Software package for statistical analysis

SPSS is another popular statistical tool. It is used most commonly in the social sciences and is consideredthe easiest to learn among enterprise statistical tools.

SPSS is loved by non-statisticians because it is similar to excel so those who are already familiar with it willfind SPSS very easy to use.

SPSS has the same downside as SAS – it is expensive. SPSS was acquired by IBM in 2009 for a reported$1.2 billion.

1.4 Getting Help

Learn and get help from others. There are tons of great R communities that will help you solve real lifeproblems and become better in R.

Some of them are:

StackOverflow Most Popular programming Q&A site on the web

R-bloggers R news and tutorials contributed by (750) R bloggers

R-help An active mailing list for R

Nabble R specific mailing list and forum

Page 14: FOR BEGINNERS · Chapter 1 Introduction Risanexcitinglanguagetostartyourprogrammingjourney. ThevastapplicationsandusageofRintodays world makes it a perfect starting point for any

Chapter 2

Run R on Your Computer

You will find the easiest way to run R programming on your system (Windows, Mac OS X or linux) in thischapter.

2.1 In Mac

1. Go to official site of R programming2. Click on the CRAN link on the left sidebar3. Select a mirror4. Click “Download R for (Mac) OS X”5. Download the latest pkg binary6. Run the file and follow the steps in the instructions to install R.

2.2 In Linux

On Ubuntu

The Advanced Packaging Tool (APT) that comes with Ubuntu uses a file called sources.list to decide whereto search for packages.

Before we can install R, we need to tell Ubuntu to look into the CRAN R repositories and also add a publickey for secure download.

1. Open the sources.list file (usually located at /etc/apt/sources.list) in a text editor and add the followingline at the end

deb https://<my.favorite.cran.mirror>/bin/linux/ubuntu <distribution>/

For instance, If you are running Ubuntu trusty and want to use the RStudio CRAN mirror, the linewould be

deb https://cran.rstudio.com/bin/linux/ubuntu trusty/

If you are lazy like all good programmers, you can do this directly from the terminalwithout having to open a text editor as

sudo sh -c 'echo "deb http://cran.rstudio.com/bin/linux/ubuntu trusty/" >> /etc/apt/sources.list'

13

Page 15: FOR BEGINNERS · Chapter 1 Introduction Risanexcitinglanguagetostartyourprogrammingjourney. ThevastapplicationsandusageofRintodays world makes it a perfect starting point for any

14 CHAPTER 2. RUN R ON YOUR COMPUTER

2. Authenticate the Ubuntu packages on CRAN

The packages for Ubuntu that are stored on CRANmirrors are all signed using a key with ID E084DAB9We download the public key from the Ubuntu keyserver using this ID and add it to our system usingthe command

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E084DAB9

3. Update the list of available packages

Since we modified the sources.list, we need to tell APT to download the packages that are availablefrom the CRAN servers by running the command.

sudo apt-get update

4. Download and install R

Almost done. Just download and install the R package by running the command:

sudo apt-get -y install r-base

5. Open up the R console and issue following command.

$ R

If there were no issues during installation. The R console should open successfully with informationabout your R installation.

RedHat-based Distributions

The process is similar for Redhat-based Linux Distributions like CentOS. Instead of modifying a file likesources.list, you can directly add the repository for EPEL(Extra Packages for Enterprise Linux) by usingthe following command.

su -c 'rpm -Uvh http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm'

You can find the url for the correct rpm file for your system here.

Now it’s just a matter of updating the list of available packages and installing R.

sudo yum updatesudo yum install R

Fedora

Installing R on fedora is a piece of cake. The Fedora repositories have the latest version of R binariesinstalled.

Just run the commands:

sudo yum updatesudo yum install R

2.3 In Windows

1. Go to official site of R programming2. Click on the CRAN link on the left sidebar3. Select a mirror4. Click “Download R for Windows”

Page 16: FOR BEGINNERS · Chapter 1 Introduction Risanexcitinglanguagetostartyourprogrammingjourney. ThevastapplicationsandusageofRintodays world makes it a perfect starting point for any

2.4. INSTALLING RSTUDIO 15

5. Click on the link that downloads the base distribution6. Run the file and follow the steps in the instructions to install R.

2.3.1 Should I Install 32 bit or 64 bit?

Most people don’t need to worry about this. Obviously the 64-bit version of R won’t work on a 32-bitmachine but both the 32-bit and 64-bit versions of R runs seamlessly on 64-bit Windows.

You might want to consider installing 32-bit version of R if your production environment is 32-bit becausesome packages might have compatibility issues and might cause the “But it works on my machine” fiasco.

2.4 Installing RStudio

RStudio is the most popular IDE for running R programs and has a free license.

Download the RStudio (Windows, Linux and Mac OS X), run the file and follow the instructions to installit.

Note: R should be installed on your system before you can run RStudio.

After you install RStudio and open it for the first time, it will ask you to choose which version of R to use.

If RStudio detects that R hasn’t been installed on your system, it will show you a warning.

If R has been installed, you’ll see the R Studio interface. In the beginning, you can only see the R consolewhere you can write one line statements in R and execute them.

However, even for trivial work, you will need to perform a sequence of steps and it is better to create an Rscript.

Page 17: FOR BEGINNERS · Chapter 1 Introduction Risanexcitinglanguagetostartyourprogrammingjourney. ThevastapplicationsandusageofRintodays world makes it a perfect starting point for any

16 CHAPTER 2. RUN R ON YOUR COMPUTER

Go to File > New File > R Script as shown in the screenshot below to create a new R script.

You can now see the R Script Editor where you can type and save R programs that span multiple lines.RStudio isn’t just a text editor but an IDE that helps you run and debug R scripts with ease.

The R Studio GUI is divided into 4 major sections as shown in the screenshot below:

Page 18: FOR BEGINNERS · Chapter 1 Introduction Risanexcitinglanguagetostartyourprogrammingjourney. ThevastapplicationsandusageofRintodays world makes it a perfect starting point for any

Part II

Basics

17

Page 19: FOR BEGINNERS · Chapter 1 Introduction Risanexcitinglanguagetostartyourprogrammingjourney. ThevastapplicationsandusageofRintodays world makes it a perfect starting point for any
Page 20: FOR BEGINNERS · Chapter 1 Introduction Risanexcitinglanguagetostartyourprogrammingjourney. ThevastapplicationsandusageofRintodays world makes it a perfect starting point for any

Introduction

Now that you have a basic idea of what is R, why choose it over others, it’s time to dig deep into it.

We’ll start off with the very basics of R. The following chapters proceeds as follows:

• In Your first R program, as the name suggests, you’ll learn to write and execute your first R program.You’ll witness the power of R with just a few lines of code.

• In Variables and constants, you’ll learn different ways to store and retrieve your valuable data in R.You’ll learn what are variables and constants, how they work, different kinds of data you can store,and reserved words you cannot use as variables and constants.

• In Output, you’ll learn to print variables and constants in R. You’ll learn to print variables usingvariable name, print() function and cat() function.

• In Operators, you’ll learn tools to perform basic operations in R, like addition, subtraction, compari-sions, and assignments.

• In Comments, you’ll learn to leave comments about your program. This will be an integral part ofyour programming which helps to better describe and document the working of your program.

• In Packages, you’ll learn to access tools that enhances the power of R. You’ll learn about both localand remote packages, and how to download them, import them and finally use them in R.

• In Environment, you’ll learn about environments in R. You’ll see how variables and constants arestored in R.

19

Page 21: FOR BEGINNERS · Chapter 1 Introduction Risanexcitinglanguagetostartyourprogrammingjourney. ThevastapplicationsandusageofRintodays world makes it a perfect starting point for any

Chapter 3

Your First R Program

R holds a reputation for getting things done with very little code. If you’re a programmer and thinking“Here comes the Hello World code”, you’re in for a surprise.

In just three lines of code, your first R program will generate 10,000 numbers in a random distribution,organize them based on frequency and create a fancy barchart.

Copy the following code into the RStudio window, press Ctrl+A(Windows) or Cmd+A(Mac) to select allthree lines and press Ctrl+Enter(Windows) or Cmd+Enter(Mac). This wil run the given R program.n <- floor(rnorm(10000, 500, 100))t <- table(n)barplot(t)

Look at the right bottom section of RStudio and you will see this beautiful bar graph showing the bell curveof a random normal distribution.

87 267 324 379 434 489 544 599 654 709 773

010

3050

Here’s what each part of the code does:

20

Page 22: FOR BEGINNERS · Chapter 1 Introduction Risanexcitinglanguagetostartyourprogrammingjourney. ThevastapplicationsandusageofRintodays world makes it a perfect starting point for any

Pages 21 -177 are omitted from the preview.

Page 23: FOR BEGINNERS · Chapter 1 Introduction Risanexcitinglanguagetostartyourprogrammingjourney. ThevastapplicationsandusageofRintodays world makes it a perfect starting point for any

Chapter 42

3D Plot

There are many functions in R programming for creating 3D plots. In this section, we will discuss on thepersp() function which can be used to create 3D surfaces in perspective view.

This function mainly takes in three variables, x, y and z where x and y are vectors (Chapters 11) definingthe location along x- and y-axis. The height of the surface (z-axis) will be in the matrix z. As an example,

Let’s plot a cone. A simple right circular cone can be obtained with the following function.cone <- function(x, y){sqrt(x^2+y^2)}

Now let’s prepare our variables.x <- y <- seq(-1, 1, length= 20)z <- outer(x, y, cone)

We used the function seq() to generate vector of equally spaced numbers.

Then, we used the outer() function to apply the function cone at every combination of x and y.

Finally, plot the 3D surface as follows.persp(x, y, z)

x

y

z

178

Page 24: FOR BEGINNERS · Chapter 1 Introduction Risanexcitinglanguagetostartyourprogrammingjourney. ThevastapplicationsandusageofRintodays world makes it a perfect starting point for any

179

Adding Titles and Labeling Axes to Plot

We can add a title to our plot with the parameter main.

Similarly, xlab, ylab and zlab can be used to label the three axes.

Rotational angles

We can define the viewing direction using parameters theta and phi.

By default theta, azimuthal direction, is 0 and phi, colatitude direction, is 15.

Coloring and Shading Plot

Coloring of the plot is done with parameter col.

Similarly, we can add shading with the parameter shade.persp(x, y, z,main="Perspective Plot of a Cone",zlab = "Height",theta = 30, phi = 15,col = "springgreen", shade = 0.5)

x

yH

eight

Perspective Plot of a Cone

Page 25: FOR BEGINNERS · Chapter 1 Introduction Risanexcitinglanguagetostartyourprogrammingjourney. ThevastapplicationsandusageofRintodays world makes it a perfect starting point for any

Part IX

Conclusion

180

Page 26: FOR BEGINNERS · Chapter 1 Introduction Risanexcitinglanguagetostartyourprogrammingjourney. ThevastapplicationsandusageofRintodays world makes it a perfect starting point for any
Page 27: FOR BEGINNERS · Chapter 1 Introduction Risanexcitinglanguagetostartyourprogrammingjourney. ThevastapplicationsandusageofRintodays world makes it a perfect starting point for any

Chapter 43

What Next

Depending on your professional background and your personal interests, there are many paths that you cantake from here.

Data Science

One of the heavy uses of R is in the field of Data Science. Almost every company bases their decisionson their data. Whether a business plans to add a new store, or a hospital analyzes the health conditionof a patient, meaningful data is key to each decision. And, drawing meaningful insights from data is DataScience.

With the advent of IoT devices creating terabytes and terabytes of data that can be used to make betterdecisions, data science is a field that has no other way to go but up. Infact, Harvard Business Review nameddata scientist the “sexiest job of the 21st century”.

We’ve created a course on Getting Started in Data Science with R to give you hands-on knowledge on DataScience. The course will give you a solid foundation to explore this amazing field. More specifically weanswer,

• What is Data?• What is Data Science?• What learn Data Science?• And most importantly, how to get started in the field?

And as you start to learn the process, you’ll find the use of R very intuitive and natural to the process.

Statistical computing

R is the most popular programming language among statisticians. In fact, it was initially built by statisticiansfor statisticians. It has a rich package repository with more than 9100 packages with every statistical functionyou can imagine.

R’s expressive syntax allows researchers – even those from non computer science backgrounds to quicklyimport, clean and analyze data from various data sources.

R also has charting capabilities, which means you can plot your data and create interesting visualizationsfrom any dataset.

182

Page 28: FOR BEGINNERS · Chapter 1 Introduction Risanexcitinglanguagetostartyourprogrammingjourney. ThevastapplicationsandusageofRintodays world makes it a perfect starting point for any

183

Machine Learning

One of another paths that is trending all around the world is Machine Learning. In simple words, machinelearning is giving machine the ability to learn on its own. It is done through various algorithms that parsesdata; the machine then learns from the data and take informed decisions on its own.

Companies like Amazon, Google, Facebook’s inner systems is driven by machine learning algorithms to theextent that it’s integrated and reflected through their core values.

R has found a lot of use in predictive analytics and machine learning. It has various package for commonML tasks like linear and non-linear regression, decision trees, linear and non-linear classification and manymore.

If you’re fascinated by machine learning, you can start exploring machine learning algorithms and applicationsonline.

Advanced R

Whichever path you choose, learning to program is inherent to each of them and writing effective code givesyou a much needed headstart to solve new problems quickly. So, it’s only logical to invest your next step inlearning advanced topics in R through various courses and tutorials.