R Programming Overview

Program Overview David Lambert 2014

Upload: dlamb3244

Posted on 22-May-2015


DESCRIPTION

Basic introduction to "R", a free and open-source statistical programming language that helps users analyze data sets by writing scripts that automate the work. The program can also be used as a free substitute for Microsoft Excel.

TRANSCRIPT

Page 1: R Programming Overview

Program Overview David Lambert

2014

Page 2: R Programming Overview

“R” Overview
•  What is “R”?
•  What does “R” look like?
•  Who uses “R”?
•  Why “R”?
•  Installing “R”
•  Where can I find “R”esources?
•  “R” in One Month?

Page 3: R Programming Overview

What is “R”? “R” is a…

•  Statistical Calculator/Language •  Means, Probability, ANOVA, F-Test, T-Test, etc.

•  Programming Language •  High-level programming platform to call, create, and organize advanced data structures.

•  Graphics •  Wide range of plotting tools to help visualize data and models.
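The sketch below illustrates the statistics, programming, and graphics roles listed above. It uses only base R and the built-in mtcars dataset. We chose mtcars for illustration; it does not come from the slides. The code computes a mean, runs a t-test, an F-test, and a one-way ANOVA, and draws a histogram.

```r
# Example data chosen for illustration: mtcars ships with base R (package "datasets")
avg_mpg <- mean(mtcars$mpg)                    # mean fuel economy

# Two-sample t-test: automatic (am = 0) vs manual (am = 1) transmissions
tt <- t.test(mpg ~ am, data = mtcars)

# F-test comparing the variances of the same two groups
ft <- var.test(mpg ~ am, data = mtcars)

# One-way ANOVA: does mpg differ by number of cylinders?
fit <- aov(mpg ~ factor(cyl), data = mtcars)
summary(fit)

print(avg_mpg)
print(tt$p.value)
print(ft$p.value)

# Graphics: a histogram of the same variable
hist(mtcars$mpg, main = "Miles per gallon", xlab = "mpg")
```

Each test returns an object of class "htest", so fields such as p.value and conf.int can be used in later code instead of being copied by hand.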

Page 4: R Programming Overview

What does “R” look like?

[Screenshot of R Studio, a widely used IDE (integrated development environment) for “R”, with panes labeled “R” Script, “R” Console, “R” Variables, and “R” Models]
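The following shows roughly what goes into the Script pane and what appears in the Console and Variables panes. The variable names and values are made up for illustration.

```r
# Lines like these are typed in the Script pane and run in the Console
x <- c(4, 8, 15, 16, 23, 42)   # a numeric vector (illustrative values)
label <- "example numbers"     # a character string
total <- sum(x)
print(total)                   # the Console shows: [1] 108
ls()                           # objects in the workspace, as listed in the Variables pane
```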

Page 5: R Programming Overview

Who uses “R”? Just a few small companies….

List of companies obtained from http://www.revolutionanalytics.com/companies-using-r, 4/14/14

Page 6: R Programming Overview

Why “R”?

“R” is quickly becoming the preferred analytic tool and package for many companies worldwide. Why?

•  “R” is free

•  “R” is open-source

o  Coders, statisticians and data scientists are constantly writing blogs, papers and new packages aimed at improving the functionality, ease of use and implementation of “R”.

•  “R” is powerful

o  “R” can work alongside most commercial statistical packages, including JMP, Mathematica, MATLAB, SPSS, STATISTICA, and SAS.

o  “R” can be integrated with other popular programming languages such as Python, Perl, Ruby, C and C++.

o  “R” can manipulate many different data types, including character, numeric, integer, complex, and logical values.

o  “R” can import or scrape data from Excel files, XML, JSON, SQL databases, HDF5 files, web pages, APIs and more (mostly through add-on packages).

http://www.econometricsbysimulation.com/2014/03/why-use-r-five-reasons.html
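To illustrate the data-type bullet, this short base-R check creates one value of each basic type and asks R for its class. The example values are arbitrary.

```r
values <- list(
  character = "hello",
  numeric   = 3.14,
  integer   = 42L,     # the L suffix makes an integer rather than a double
  complex   = 2 + 3i,
  logical   = TRUE
)
classes <- sapply(values, class)   # class of each element
print(classes)
typeof(3.14)                       # numerics are stored internally as "double"
```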

Page 7: R Programming Overview

Excel vs. “R”

Why “R”

| | Excel | “R” |
|---|---|---|
| Price | $109.99 | Free |
| Data Entry | Data must be entered manually (typically) | Data can be gathered automatically by scraping web pages, servers or files |
| Data Access | All files must be on the computer or in the cloud | Can scrape data or call files from the internet or a network |
| Efficiency | Must open whole spreadsheet(s) or workbook(s), which can bog down memory | Can easily read a whole file, part of it, or selected variables |
| Adaptability | Very low customization | Open-source and programmable |
| Integration | Limited software-to-software integration | Diverse communication between multiple coding languages |
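The Efficiency row says “R” can read part of a file instead of opening the whole thing. Here is a minimal base-R sketch of that idea using the nrows and colClasses arguments of read.csv(). The small CSV file and its columns (id, region, sales) are made up purely for illustration.

```r
# Write a small, made-up CSV file to a temporary location
path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(id = 1:5,
             region = c("N", "S", "E", "W", "N"),
             sales = c(10, 20, 30, 40, 50)),
  path, row.names = FALSE
)

# Read only the first 3 data rows and skip the "region" column;
# a colClasses entry of "NULL" tells read.csv() to drop that column
part <- read.csv(path, nrows = 3,
                 colClasses = c("integer", "NULL", "numeric"))
print(part)
```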

Page 8: R Programming Overview

http://www.r-project.org/

Download “R” Link

“R” Homepage

Installing “R”

Page 9: R Programming Overview

http://www.r-project.org/

List of servers:
1.  Scroll to your country.
2.  Click the closest location for the fastest download.

Installing “R” Mirror/Server Page

Page 10: R Programming Overview

http://ftp.osuosl.org/pub/cran/

Choose Operating System

Installing “R”

Page 11: R Programming Overview

http://ftp.osuosl.org/pub/cran/

This is the basic installer. Start by using this.

Installing “R”
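Once the installer has finished, a quick check in the Console confirms that R is running. The sketch also shows where the mirror choice from the server list appears inside R itself. The https://cloud.r-project.org mirror used here is our assumption; any CRAN mirror from the list works the same way.

```r
# Confirm that R is installed and which version is running
print(R.version.string)

# Packages are downloaded from a CRAN mirror. chooseCRANmirror() shows the
# mirror list interactively; alternatively the mirror can be set for the session.
# (cloud.r-project.org is an assumed choice; any mirror URL from the list works.)
options(repos = c(CRAN = "https://cloud.r-project.org"))
print(getOption("repos"))
```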

Page 12: R Programming Overview

“R”esources within “R”

A useful tool in “R” is its built-in help system. Calling help() on a function gives a detailed report of the available documentation.

•  help(functionName)

o  Pass any “R” function name to help(), or type “?” in front of it, and the help page for that function opens on screen.

“help(mean)” or “?mean” produces a page with these sections: Description, Usage, Arguments, Value, References, See Also, and Examples.
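The help commands described above look like this in the Console. The ?? search is shown only as a comment because it scans every installed package and can be slow.

```r
?mean                 # opens the help page for mean(); same as help(mean)
help("t.test")        # the function name can also be given as a string
example(mean)         # runs the code from the page's Examples section
args(mean)            # shows only the function's arguments
# ??regression        # searches all installed help pages (same as help.search("regression"))
```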

Page 13: R Programming Overview

Where can I find “R”esources There are many free resources for “R”

Websites

•  R Project

o  Robust resource for R packages and information. Can be technical.

o  http://www.r-project.org/

•  Quick R

o  Great website with lots of useful articles, tutorials and more.

o  http://www.statmethods.net/

•  Revolution Analytics

o  R News and Community

o  http://www.revolutionanalytics.com/

•  Stack Overflow

o  Very large and dedicated programming community. Q&A Forum with real problems being solved. Not strictly “R” but great info.

o  http://www.stackoverflow.com

•  R-Bloggers

o  News and Tutorials by “R” users in blogger format.

o  http://www.r-bloggers.com/

•  r-dir

o  “R” reference site with wonderful links to tutorials, courses, and databases for “R”.

o  http://r-dir.com/

Page 14: R Programming Overview

Articles

•  Econometrics by Simulation

o  http://www.econometricsbysimulation.com/2014/03/why-use-r-five-reasons.html (Great article for Reasons to use “R”)

Interactive

•  Code School

o  Code School offers various interactive tracks that intuitively and effectively demonstrate the use and importance of the technology being discussed. Very impressive educational structure and course materials.

o  Try “R”

o  https://www.codeschool.com/courses/try-r

Online Certification Programs

•  Coursera

o  Free online classes including video lectures, modules, exercises and large discussion boards

o  https://www.coursera.org/

§  Johns Hopkins University: Data Science Certification

§  https://www.coursera.org/specialization/jhudatascience/1?utm_medium=listingPage

•  Udacity

o  Offers fewer courses than Coursera, but still a great resource and certification opportunity for “R”

o  https://www.udacity.com/

Where can I find “R”esources

Page 15: R Programming Overview

Video Sequences

•  Youtube/Google

o  As always, there is a plethora of ‘YouTubers’ giving excellent tutorials for free. Many are very good at it.

o  https://www.youtube.com/playlist?list=PLOU2XLYxmsIK9qQfztXeybpHvru-TrqAP (Google Dev. Team, Intro to “R”)

o  https://www.youtube.com/watch?v=WJDrYUqNrHg (More in-depth video sequence.)

•  R for Statistical Programming

o  http://sentimentmining.net/StatisticsWithR/

Where can I find “R”esources

Page 16: R Programming Overview

Where can I find Datasets?

1. “R” packages have datasets in them

•  install.packages("HistData")

o  Historical data sets and analysis

2. Websites have datasets

•  Data.gov

•  Data.gov.IN

•  Data.gov.UK

o  Government data with many different datasets from different industries.

•  HealthData.gov

o  Government health data.

•  Scb.se/en_/

o  Statistics Sweden. Open government data.

•  Ropengov.github.io/

o  Open government datasets.
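Datasets that ship with R, or with a package, are available without any download step. In R code, the package is installed with install.packages("HistData"); function and package names are case-sensitive, and straight quotes are required. The sketch below uses iris from base R. It touches HistData's Galton data only if that package is already installed, so it runs either way.

```r
# Base R includes many datasets (package "datasets"); data() with no
# arguments lists them. iris is one of them:
head(iris)                    # first six rows
summary(iris$Sepal.Length)    # quick numeric summary of one column

# Add-on packages such as HistData bring their own datasets.
# install.packages("HistData")  # run once; needs internet access
if (requireNamespace("HistData", quietly = TRUE)) {
  data("Galton", package = "HistData")  # Galton's parent/child heights
  print(head(Galton))
}
```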

Page 17: R Programming Overview

“R” in One Month? Here is a guideline to help new “R” users get on their feet and ready to experiment with the software.

•  Week 1

•  Complete “Try R” Chapters 1-8

o  https://www.codeschool.com/courses/try-r

•  Install the “R” program or R Studio.

•  Install the swirl package and complete its tutorial.

•  Week 2

•  Begin Google Dev “Intro to R” Video Sequence

•  Youtube Google Dev R Video Sequence

•  Sign up for Coursera.com: Johns Hopkins - R Programming

•  https://www.coursera.org/course/rprog

•  Download file(s) from the internet and import them into “R”.

•  Learn to read files with read.table() and create a data.frame.

•  Learn about different file types: XML, CSV, XLSX.

•  Week 3

•  data.frame manipulation. Learn to extract variables, columns/rows, or values from a file into a data.frame.

•  Run simple functions such as mean(), sum(), etc. on the extracted data.

•  Week 4

•  Call information from multiple sources and analyze the data with basic functions.

•  Generate random samples from a data set.

•  Create visual graphs from data (histograms, plots, etc.).

Page 18: R Programming Overview

“R” in One Month? •  Week 1

1.  Complete “Try R” Chapters 1-8

•  https://www.codeschool.com/courses/try-r

2.  Install the “R” program or R Studio.

•  http://ftp.osuosl.org/pub/cran/

[Screenshots: Mac install and Windows install]

3.  Install the swirl package and complete its tutorial.

https://class.coursera.org/rprog-008/assignment/view?assignment_id=9

Intro to SWIRL
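Installing and starting swirl takes only a few lines. swirl is an interactive tutorial, so the sketch below only does anything inside an interactive R or R Studio session. The first run needs internet access to download the package from CRAN.

```r
# swirl runs its lessons in the Console, so only start it interactively
if (interactive()) {
  if (!requireNamespace("swirl", quietly = TRUE)) {
    install.packages("swirl")   # one-time download from CRAN (needs internet)
  }
  library(swirl)
  swirl()                       # starts the interactive lessons
}
```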

Page 19: R Programming Overview

“R” in One Month? •  Week 2

1.  Begin Google Dev “Intro to R” Video Sequence

•  Youtube Google Dev R Video Sequence

•  Follow along with exercises Video 1-7 (approx. 2 min each)

2.  Sign up for Coursera.com: Johns Hopkins - R Programming

•  https://www.coursera.org/course/rprog

•  Watch Week 1 Video Lectures

• Goals:

•  Download file(s) from the internet and import them into “R”.

•  Learn to read files with read.table() and create a data.frame.

•  Learn about different file types: XML, CSV, XLSX.
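Here is a minimal sketch of the Week 2 goal of reading a file into a data.frame with read.table(). The slides give no specific file, so the code writes a small made-up text file first. For a file on the web, download.file(url, destfile) would save a local copy before reading.

```r
# A small, made-up whitespace-separated file standing in for a downloaded one
path <- tempfile(fileext = ".txt")
writeLines(c("name height weight",
             "Ann  165    60",
             "Bob  180    82",
             "Cara 172    68"), path)

people <- read.table(path, header = TRUE)   # first line holds the column names
str(people)                                 # column names and types
class(people)                               # "data.frame"

# CSV files: read.csv(path); XLSX files need an add-on package such as readxl
```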

Page 20: R Programming Overview

“R” in One Month? •  Week 3

1.  Continue Google Dev “Intro to R” Video Sequence

•  Youtube Google Dev R Video Sequence

•  Follow along with exercises Video 8-15 (approx. 2 min each)

2.  Continue at Coursera.com: Johns Hopkins - R Programming

•  https://www.coursera.org/course/rprog

•  Watch Week 2 Video Lectures

• Goals:

•  data.frame manipulation. Learn to extract variables, columns/rows, or values from a file into a data.frame.

•  Run simple functions such as mean(), sum(), etc. on the extracted data.
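The Week 3 goals come down to a handful of indexing patterns. This sketch applies them to the built-in mtcars data.frame, which we chose for illustration. It pulls out a column, a row, a filtered subset, and a single value, then runs mean() and sum() on the results.

```r
cars <- mtcars                                    # built-in data.frame: 32 cars, 11 variables
mpg <- cars$mpg                                   # one column (variable) as a vector
first_row <- cars[1, ]                            # one row
four_cyl <- cars[cars$cyl == 4, c("mpg", "hp")]   # rows meeting a condition, two columns
one_value <- cars["Mazda RX4", "hp"]              # a single value, by row and column name

mean(four_cyl$mpg)                                # average mpg of the 4-cylinder cars
sum(cars$hp)                                      # total horsepower across all cars
```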

Page 21: R Programming Overview

“R” in One Month? •  Week 4

1.  Continue Google Dev “Intro to R” Video Sequence

•  Youtube Google Dev R Video Sequence

•  Follow along with exercises Video 16-21 (approx. 2 min each)

2.  Continue at Coursera.com: Johns Hopkins - R Programming

•  https://www.coursera.org/course/rprog

•  Watch Week 3 Video Lectures

• Goals:

•  Call information from multiple sources and analyze the data with basic functions.

•  Generate random samples from a data set.

•  Create visual graphs from data (histograms, plots, etc.).
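This sketch touches all three Week 4 goals. It combines two sources with merge(), draws a random sample with sample(), and makes a histogram and a scatter plot. The two small tables (prices and stock) are hypothetical stand-ins for real sources. The sampled data is the built-in women dataset.

```r
# Two hypothetical sources that share an "id" column
prices <- data.frame(id = 1:4, price = c(2.5, 3.0, 1.75, 4.0))
stock  <- data.frame(id = c(2, 4, 1, 3), qty = c(10, 0, 5, 8))

combined <- merge(prices, stock, by = "id")       # join the sources on id
combined$value <- combined$price * combined$qty
sum(combined$value)                               # total stock value

# Random values from a data set: resample 10 heights from the built-in women data
set.seed(42)                                      # makes the random draws reproducible
draws <- sample(women$height, size = 10, replace = TRUE)

# Visual graphs: a histogram and a scatter plot
hist(draws, main = "Resampled heights", xlab = "Height (in)")
plot(women$height, women$weight, xlab = "Height (in)", ylab = "Weight (lb)")
```

merge() matches rows by key rather than by position, so the two sources do not need to be in the same order.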

Page 22: R Programming Overview

Contact

David Lambert