R / Python Why and How to Get Started

Post on 22-May-2020

Page 1: R / PythonOld and New •R is an implementation of S (created in 1976) and was first released in 1995 with v1.0 in 2000. Hadley Wickham's dplyr package was introduced in 2014. The

R / Python: Why and How to Get Started

Page 2

What do you use?

• Use SPSS, Stata, or SAS

• with the GUI/menus

• with syntax

• Use Excel

• for data management

• for data analysis

• Use Matlab, R, or Python

Page 3

Python

• General programming language

• Create applications, run websites, interface with systems

• Has all the elements of other languages

• Created by groups of computer scientists

• Runs fast and stable for production workflows

• Designed to be simple and readable, with one obvious way to do most actions

Page 4

R

• Statistical Language

• Built to do math and work with datasets

• Can utilize some tools from other languages

• Created by statisticians

• Fast and intuitive to do analysis, slower to process

• Many statisticians have increased its capabilities

Page 5

Both

• Use Scripting

• The code/syntax is intended to be saved in a script file

• The code can be re-played to reproduce the output

• Open Source & Extensible

• Anybody can create new add-ins ("packages")

• People can NOT change the original without permission

• Free to use
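
As a small illustration of re-playing a script, here is a minimal Python sketch. The file name, seed, and sample size are assumptions made for this example and are not from the slides. Because the random seed is fixed, running the script again should reproduce the same output.

```python
# analysis.py (hypothetical file name): a script that can be re-run to reproduce its output
import random
import statistics


def run_analysis(seed=2020, n=10):
    """Draw n pseudo-random scores and return their mean."""
    rng = random.Random(seed)  # a fixed seed makes the draws repeatable
    scores = [rng.gauss(50, 10) for _ in range(n)]
    return statistics.mean(scores)


if __name__ == "__main__":
    print(run_analysis())
```

Saving this in a file and running it twice should print the same number both times; changing the seed would give a different but equally repeatable result.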

Page 6

As originally built, you:

Type instructions at a prompt…

R Console

Python Shell

Page 7

…and get output like this

R - Regression

Python - Frequency Table
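
The screenshots from this slide did not survive in the transcript. As a rough stand-in for the Python one, the sketch below builds a frequency table using only the standard library. The responses and the function name frequency_table are made up for illustration; the original slide may have used a different tool, such as pandas.

```python
from collections import Counter

# Hypothetical survey responses (made-up data for illustration)
responses = ["yes", "no", "yes", "maybe", "yes", "no"]


def frequency_table(values):
    """Return (value, count, percent) rows, most common value first."""
    counts = Counter(values)
    total = len(values)
    return [
        (value, count, round(100 * count / total, 1))
        for value, count in counts.most_common()
    ]


# Print one aligned row per distinct value
for value, count, percent in frequency_table(responses):
    print(f"{value:<6} {count:>3} {percent:>6}")
```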

Page 8

IDEs

• Script Window

• Console

• Help

• History

• Files

• Plots

• Environment / Variable explorer

• Run current selection

[Screenshots: Spyder (Python) and RStudio (R), each with the Script and Console panes labeled]

Page 9

So, why all the buzz?

• Free software that can do everything SPSS, Stata, SAS, and Excel can do.

• Massive improvements in ease of use through packages with convenience functions.

Page 10

What to Install

• R

• Install R from CRAN

• Install RStudio

• Python

• Install Anaconda

All of these are cross-platform with regular installers

Page 11

No Installation Needed

• RStudio Cloud (beta)

• https://rstudio.cloud/

• Python Anywhere

• https://www.pythonanywhere.com/

• Both require free accounts.

Page 12

Old and New

• R is an implementation of S (created in 1976) and was first released in 1995 with v1.0 in 2000. Hadley Wickham's dplyr package was introduced in 2014. The RStudio IDE was released in 2011 with v1.0 in 2016.

• Python was first released in 1991. The IPython interactive shell was released in 2001. The data management package pandas was first released by Wes McKinney in 2008; v1.0 was released in 2020. The Spyder IDE was released in 2009.

Page 13

Misconceptions

• Python is better than R

• Python can do a wider variety of computer tasks than R.

Python has Breadth, R has Depth in Data/Statistics

• People are not judging the languages themselves; they are judging the entire ecosystem.

• Python is easier than R

• Python is one of the simplest general-purpose programming languages.

R is not a general-purpose programming language.

• Since R was made by statisticians, it does some things differently from general programming languages.

Page 14

Demonstration

• Python

• R

Page 15

Functions & Packages

The building blocks of computer languages

Page 16

Have you used Functions?

word(stuff)

Page 17

Have you used Functions?

word(stuff)

=AVERAGE(V1:V5)

COMPUTE average = MEAN(v1, v2, v3, v4, v5).

egen average = rowmean(v1 v2 v3 v4 v5)

average = mean(of v1-v5);
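
The four lines above appear to be Excel, SPSS, Stata, and SAS syntax for the same row average. For comparison, a minimal Python sketch of the same word(stuff) pattern follows. The values assigned to v1 through v5 are made up for illustration.

```python
from statistics import mean

# Hypothetical values of v1..v5 for one respondent
v1, v2, v3, v4, v5 = 4, 5, 3, 4, 2

# Same pattern as the lines above: a function name, then its inputs in parentheses
average = mean([v1, v2, v3, v4, v5])
print(average)
```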

Page 18

Creating a Scale Index

ncc_score = (ncc1 + ncc2 + ncc3 + ncc4 + ncc5) / 5

ncc_score = SUM(ncc1, ncc2, ncc3, ncc4, ncc5) / 5

ncc_score = MEAN(ncc1, ncc2, ncc3, ncc4, ncc5)

ncc_score = SCALE(ncc)

ncc_score = SCALE(ncc, "sum")

"Convenience Function"

Added "Argument"
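
To make the idea concrete, here is a sketch of what a convenience function with an added argument could look like in Python. The function scale and its method argument are hypothetical, written only to mirror SCALE(ncc) and SCALE(ncc, "sum") on this slide; they are not part of any real package. The item responses are made up.

```python
from statistics import mean


def scale(items, method="mean"):
    """Combine item responses into one scale score.

    Hypothetical helper mirroring SCALE(ncc) and SCALE(ncc, "sum").
    """
    if method == "mean":
        return mean(items)
    if method == "sum":
        return sum(items)
    raise ValueError(f"unknown method: {method!r}")


ncc = [3, 4, 5, 4, 4]          # made-up responses to ncc1..ncc5
ncc_score = scale(ncc)         # default behavior: mean of the items
ncc_total = scale(ncc, "sum")  # the added argument switches to a sum
```

The default value for method is what makes the short call scale(ncc) possible; the argument only needs to be typed when the default is not wanted.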

Page 19

Function Names and Arguments

Page 20

Functions & Objects

Page 21

Packages

• Package: A group of functions installed together

• Packages may have functions with the same name!

• Install: Copy instructions to your computer

• Load/Attach/Import: Put instructions in memory
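
A short Python sketch of the points above, using two packages that ship with Python (so there is no separate install step in this example). Both define a function named sqrt, and the package prefix is what tells them apart.

```python
# Import: put two standard-library packages in memory
import math
import cmath

# Both packages have a function called sqrt; the prefix says which one is meant
real_root = math.sqrt(4)       # square root for real numbers
complex_root = cmath.sqrt(-4)  # square root that also handles negative numbers

print(real_root, complex_root)
```

Packages that do not ship with Python need to be installed once (for example with Anaconda's tools) before they can be imported.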

Page 22

Media Literacy -> Package Literacy

• Who wrote it?

• How long has it been around?

• How many other people use it?

• Where is the code?

• How good is the documentation?

• What kind of testing has been done?

• Does it give the same results?

• What do other people say about it?

Page 23

Is R / Python for You?

Page 24

Check in with yourself:

• Do functions and arguments make sense?

• Can you be detail oriented?

• Can you keep track of things that change?

• Are you good at thinking systematically?

It's okay if you answered "no". You can still use R.

Page 25

If Not Yet

• Practice with functions in software you know.

• Use Jamovi (R) and have it show the syntax.

• Practice reading syntax and identifying functions and objects.

Page 26

Which to Pick

• Start with whichever one…

• … the people around you use

• … has the functions you need

• … looks easier to read for you

• Use R if you mostly work with data tables and do statistics

• Tends to get new statistical procedures first

• Easier to read and understand

• Use Python if you often do non-statistical programming

• More and better non-tabular text-processing tools

• Integrates better with applications

Page 27

R + Python

• Use R in Python (rpy2)

• Use Python in R (reticulate)

• Use R or Python in SPSS, Stata, and SAS

• Some features in R get "ported" to Python

• Some features in Python get "ported" to R

• Use SQL in R or Python
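
As one example of using SQL from Python, the sketch below uses the sqlite3 module that ships with Python. The table name, column names, and values are made up for illustration; other databases would need their own connection package.

```python
import sqlite3

# In-memory database with a made-up table (names and values are illustrative)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (grp TEXT, value REAL)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("a", 1.0), ("a", 3.0), ("b", 10.0)],
)

# Plain SQL does the grouping; Python just receives the resulting rows
rows = conn.execute(
    "SELECT grp, AVG(value) FROM scores GROUP BY grp ORDER BY grp"
).fetchall()
conn.close()

print(rows)
```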

Page 28

Where to Start?

• Data Management

• Many, many functions

• Python: pandas

• R: tidyverse, data.table, or sqldf

• Statistical Analysis

• Formula Notation

• Python: statsmodels

• R: base R, afex/car (ANOVA), lme4 (Mixed Models), etc.

• Graphing

• Python: seaborn (uses matplotlib)

• R: ggplot2, ggformula, or lattice

Page 29

Interpreting Tutorials

Page 30

Recognizing Packages

In R:

• library(package)

• require(package)

• package::function()

In Python:

• import package as nickname

• nickname.stuff
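
A small Python sketch of how these patterns look in a tutorial. It uses standard-library packages so it runs without installing anything; the nickname st and the values are arbitrary choices for this example. It also shows the from ... import form, which is not listed above but is common in tutorials.

```python
import statistics as st          # "import package as nickname"
from collections import Counter  # imports one name directly, so no prefix is needed

values = [2, 4, 4, 6]

middle = st.median(values)  # "nickname.stuff": median comes from statistics
counts = Counter(values)    # no prefix: check the "from ... import" line at the top
```

When a name has no prefix and is not built in, the import lines at the top of the tutorial are the place to look for its package.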

Page 31

What to Look For

Functions & Methods:

Objects:

Page 32

Learning Packages & Functions

Page 33

Built-In Datasets

R

• data() # see all the datasets included

• data(name) # make it available

Python

• Some options, but none great

• Just use R's

• https://vincentarelbundock.github.io/Rdatasets/

• Both allow URLs in read_csv

Page 34

Creating Data

Use Vectors

It is very common to use vectors as variables without making them into a dataframe.

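# Two separate vectors passed straight to t.test
A <- c(1, 2, 3, 4, 5)
B <- c(7:20, 200)
t.test(A, B)

# The same kind of test using a dataframe and formula notation
group <- 1:2
value <- rnorm(20)
data <- data.frame(group, value)
t.test(value ~ group, data = data)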
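
For readers coming from the Python side, here is a rough sketch of the first example using plain lists in place of R vectors. R's t.test uses Welch's unequal-variance test by default, so this sketch computes the Welch t statistic by hand with the standard library. It returns only the statistic, not the degrees of freedom or p-value, and the function name welch_t is made up for this example.

```python
from math import sqrt
from statistics import mean, variance

A = [1, 2, 3, 4, 5]
B = list(range(7, 21)) + [200]  # same values as c(7:20, 200) in R


def welch_t(x, y):
    """Welch's t statistic for two independent samples (unequal variances)."""
    standard_error = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / standard_error


t_stat = welch_t(A, B)
print(t_stat)
```

In practice a statistics package would normally be used for this; the hand calculation is only meant to show that lists can be used directly, without first building a dataframe.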

Page 35

Jupyter Notebook

extension is .ipynb

Page 36

Our InfoGuides

• https://infoguides.gmu.edu/learn_r

• https://infoguides.gmu.edu/learn_python

• DataCamp

• Carpentries

• CodeSchool

• Coursera

Page 37

Jamovi

jamovi.org

Page 38

Syntax Mode

# Descriptives

jmv::descriptives(
    data = data,
    vars = "fate")

Comments

Packages

Functions

Data Specification

Values

Arguments