R / Python: Why and How to Get Started
What do you use?
• Use SPSS, Stata, or SAS
• with the GUI/menus
• with syntax
• Use Excel
• for data management
• for data analysis
• Use Matlab, R, or Python
Python
• General programming language
• Create applications, run websites, interface with systems
• Has all the elements of other languages
• Created by groups of computer scientists
• Runs fast and stable for production workflows
• Designed for simplicity: ideally one obvious way to do any action
R
• Statistical Language
• Built to do math and work with datasets
• Can utilize some tools from other languages
• Created by statisticians
• Fast and intuitive to do analysis, slower to process
• Many statisticians have increased its capabilities
Both
• Use Scripting
• The code/syntax is intended to be saved in a script file
• The code can be re-played to reproduce the output
• Open Source & Extensible
• Anybody can create new add-ins ("packages")
• People can NOT change the original without permission
• Free to use
As originally built, you:
Type instructions at a prompt…
R Console
Python Shell
…and get output like this
R - Regression
Python - Frequency Table
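The original slide showed screenshots here. As a rough stand-in for the Python one, here is a minimal sketch of a frequency table that could be typed at the Python shell, using only the standard library. The data and the function name frequency_table are invented for illustration.

```python
from collections import Counter

# Hypothetical survey responses, invented for illustration.
responses = ["yes", "no", "yes", "maybe", "yes", "no"]

def frequency_table(values):
    """Return (value, count, percent) rows, most frequent first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(value, n, 100 * n / total) for value, n in counts.most_common()]

table = frequency_table(responses)
for value, n, pct in table:
    # Left-align the value, right-align the count and percent.
    print(f"{value:<6} {n:>3} {pct:6.1f}%")
```

In practice a package such as pandas would usually do this in one call; the point is only that a few typed instructions produce a table.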
IDEs
• Script Window
• Console
• Help
• History
• Files
• Plots
• Environment / Variable explorer
• Run current selection
Spyder (Python): Script and Console panes
RStudio (R): Script and Console panes
So, why all the buzz?
• Free software that can do everything SPSS, Stata, SAS, and Excel can do.
• Massive improvements in ease of use through packages with convenience functions.
What to Install
• R
• Install R from CRAN
• Install RStudio
• Python
• Install Anaconda
All of these are cross-platform with regular installers
No Installation Needed
• RStudio Cloud (beta)
• https://rstudio.cloud/
• Python Anywhere
• https://www.pythonanywhere.com/
• Both require free accounts.
Old and New
• R is an implementation of S (created in 1976). It was first released in 1995, with v1.0 in 2000. Hadley Wickham's dplyr package was introduced in 2014. The RStudio IDE was released in 2011, with v1.0 in 2016.
• Python was first released in 1991. The IPython interactive shell was released in 2001. The data management package pandas was started by Wes McKinney in 2008; v1.0 was released in 2020. The Spyder IDE was released in 2009.
Misconceptions
• Python is better than R
• Python can do a wider variety of computer tasks than R.
Python has Breadth, R has Depth in Data/Statistics
• The languages themselves are not what people are judging, they are judging the entire ecosystem.
• Python is easier than R
• Python is the simplest of the programming languages.
R is not a programming language.
• Since R was made by statisticians, it does some things differently than general programming languages.
Demonstration
• Python
• R
Functions & Packages
The building blocks of computer languages
Have you used Functions?
word(stuff)
Excel: =AVERAGE(V1:V5)
SPSS: COMPUTE average = MEAN(v1, v2, v3, v4, v5).
Stata: egen average = rowmean(v1 v2 v3 v4 v5)
SAS: average = mean(of v1-v5);
Creating a Scale Index
ncc_score = (ncc1 + ncc2 + ncc3 + ncc4 + ncc5) / 5
ncc_score = SUM(ncc1, ncc2, ncc3, ncc4, ncc5) / 5
ncc_score = MEAN(ncc1, ncc2, ncc3, ncc4, ncc5)
ncc_score = SCALE(ncc)
ncc_score = SCALE(ncc, "sum")
"Convenience Function"
Added "Argument"
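To make the idea of a convenience function with an added argument concrete, here is a minimal Python sketch. The names scale and method are hypothetical, chosen to mirror the slide's SCALE(ncc) and SCALE(ncc, "sum"); they do not come from a real package, and the item values are made up.

```python
from statistics import mean

def scale(items, method="mean"):
    """Combine item scores into one scale score.

    Hypothetical helper mirroring the slide's SCALE(...) example.
    """
    if method == "mean":
        return mean(items)
    if method == "sum":
        return sum(items)
    raise ValueError(f"unknown method: {method!r}")

# One respondent's answers to ncc1..ncc5 (made-up values).
ncc = [4, 5, 3, 4, 4]

ncc_score = scale(ncc)          # default argument: mean of the items
ncc_total = scale(ncc, "sum")   # the added argument changes the behavior
print(ncc_score, ncc_total)
```

The function hides the arithmetic, and the argument lets the caller pick a variation without rewriting the formula.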
Function Names and Arguments
Functions & Objects
Packages
• Package: A group of functions installed together
• Packages may have functions with the same name!
• Install: Copy instructions to your computer
• Load/Attach/Import: Put instructions in memory
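A small Python sketch of two packages that define a function with the same name. It uses math and cmath, which ship with Python, as stand-ins; third-party packages would first need to be installed (for example with conda or pip) before they could be imported.

```python
import math    # load the math package into memory
import cmath   # a second package that also defines sqrt

# Both packages have a function named sqrt; the prefix says which one runs.
real_root = math.sqrt(9)        # real-number version
complex_root = cmath.sqrt(-9)   # complex version; math.sqrt(-9) would raise ValueError

print(real_root, complex_root)
```

Writing the package name in front of the function is what keeps same-named functions from being confused.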
Media Literacy -> Package Literacy
• Who wrote it?
• How long has it been around?
• How many other people use it?
• Where is the code?
• How good is the documentation?
• What kind of testing has been done?
• Does it give the same results?
• What do other people say about it?
Is R / Python for You
Check in with yourself:
• Do functions and arguments make sense?
• Can you be detail oriented?
• Can you keep track of things that change?
• Are you good at thinking systematically?
It's okay if you answered "no". You can still use R.
If Not Yet
• Practice with functions in software you know.
• Use Jamovi (R) and have it show the syntax.
• Practice reading syntax and identifying functions and objects.
Which to Pick
• Start with whichever one…
• … the people around you use
• … has the functions you need
• … looks easier to read for you
• Use R if you mostly work with data tables and do statistics
• Tends to get new statistical procedures first
• Easier to read and understand
• Use Python if you often do non-statistical programming
• More and better non-tabular text-processing tools
• Better integrates with applications
R + Python
• Use R in Python (rpy2)
• Use Python in R (reticulate)
• Use R or Python in SPSS, Stata, and SAS
• Some features in R get "ported" to Python
• Some features in Python get "ported" to R
• Use SQL in R or Python
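As one illustration of using SQL from Python, here is a minimal sketch with the standard-library sqlite3 module. The table name survey, its columns, and the values are invented for the example.

```python
import sqlite3

# In-memory database with a made-up table, for illustration only.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE survey (grp TEXT, score REAL)")
con.executemany(
    "INSERT INTO survey VALUES (?, ?)",
    [("a", 3.0), ("a", 5.0), ("b", 4.0)],
)

# Ordinary SQL does the grouping; Python receives the rows as tuples.
rows = con.execute(
    "SELECT grp, AVG(score) FROM survey GROUP BY grp ORDER BY grp"
).fetchall()
con.close()
print(rows)
```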
Where to Start?
• Data Management
• Many, many functions
• Python: pandas
• R: tidyverse, data.table, or sqldf
• Statistical Analysis
• Formula Notation
• Python: statsmodels
• R: base R, afex/car (ANOVA), lme4 (Mixed Models), etc.
• Graphing
• Python: seaborn (uses matplotlib)
• R: ggplot2, ggformula, or lattice
Interpreting Tutorials
Recognizing Packages
In R: • library(package)
• require(package)
• package::function()
In Python:
• import package as nickname
• nickname.stuff
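A short Python sketch of what these patterns look like in a tutorial. It uses the standard-library statistics package with the nickname st, which is an arbitrary choice for this example; the scores are made up.

```python
import statistics as st          # "import package as nickname"
from statistics import median    # import one function directly

scores = [2, 4, 4, 5, 7]         # made-up values

print(st.mean(scores))   # nickname.function(...): the prefix names the package
print(median(scores))    # no prefix: check the import lines to find the source
```

When a tutorial calls something with a short prefix, the import lines at the top of the script show which package the prefix stands for.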
What to Look For
Functions & Methods:
Objects:
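A minimal Python sketch of how functions, methods, and objects tend to appear in code. The variable names and values are invented for illustration.

```python
# Object: a named thing that holds data (here, a list).
ages = [34, 19, 52]

# Function: name(object), with the object inside the parentheses.
n = len(ages)
oldest = max(ages)

# Method: object.name(), a function that belongs to the object.
ages.sort()                      # sorts the list in place
label = "r / python".upper()     # strings have their own methods

print(n, oldest, ages, label)
```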
Learning Packages & Functions
Built-In Datasets
R
• data() #see all the datasets included
• data(name) # make it available
Python
• Some options, but none great
• Just use R's
• https://vincentarelbundock.github.io/Rdatasets/
• Both allow URLs in read_csv
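A minimal sketch of reading CSV data from a URL in Python using only the standard library. The helper names rows_from_csv_text and rows_from_url are hypothetical, and the inline sample stands in for a downloaded file so the example runs offline. With pandas installed, read_csv accepts a URL directly.

```python
import csv
import io
import urllib.request

def rows_from_csv_text(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

def rows_from_url(url):
    """Download a CSV file and parse it (needs a network connection)."""
    with urllib.request.urlopen(url) as response:
        return rows_from_csv_text(response.read().decode("utf-8"))

# Offline demonstration: inline text standing in for a downloaded file.
sample = "name,mpg\ncar_a,21.0\ncar_b,22.8\n"
rows = rows_from_csv_text(sample)
print(rows[0]["mpg"])

# With a connection, the same parser works on a CSV link copied from the
# Rdatasets index page:
# rows = rows_from_url("<csv link copied from the Rdatasets page>")
```

Note that the csv module returns every value as text; converting columns to numbers is a separate step that pandas handles automatically.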
Creating Data
Use Vectors
It is very common to use vectors as variables without making them into a dataframe.
A <- c(1,2,3,4,5)
B <- c(7:20, 200)
t.test(A,B)
group <- 1:2
value <- rnorm(20)
data <- data.frame(group, value)
t.test(value ~ group, data=data)
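For comparison, a minimal Python sketch of the first R example, using plain lists in place of vectors. R's t.test uses Welch's unequal-variance test by default, so the sketch computes Welch's t statistic by hand. The function name welch_t is hypothetical, and no p-value is computed because that needs a t distribution from a package such as scipy.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(x, y):
    """Welch's t statistic for two independent samples."""
    standard_error = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / standard_error

A = [1, 2, 3, 4, 5]
B = list(range(7, 21)) + [200]   # same values as c(7:20, 200) in R

t_stat = welch_t(A, B)
print(round(t_stat, 3))
```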
Jupyter Notebook
extension is .ipynb
Our InfoGuides
• https://infoguides.gmu.edu/learn_r
• https://infoguides.gmu.edu/learn_python
• DataCamp
• Carpentries
• CodeSchool
• Coursera
Jamovi
jamovi.org
Syntax Mode
# Descriptives
jmv::descriptives(data = data, vars = "fate")
Comments
Packages
Functions
Data Specification
Values
Arguments