
  • Data Science: Data Visualization Boot Camp

    What is R?

    Chuck Cartledge, PhD

    24 January 2020

    1/32

  • 2/32

    Intro. What is R? RStudio R Basics Hands-on Q & A Conclusion References Files

    Table of contents (1 of 1)

    1 Intro.

    2 What is R?
        The language
        Availability

    3 RStudio
        Basic how-tos (left side)
        Basic how-tos (right side)

    4 R Basics
        Types of numbers
        Variables
        Operations and functions

    5 Hands-on

    6 Q & A

    7 Conclusion

    8 References

    9 Files

  • 3/32

    What are we going to cover?

    We’re going to talk about:

    What is the language R?

    What GUI do I use to write and execute R programs?

    What are some basic variable types in R?

  • 4/32

    The language

    The official definition.

    “R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R. R provides a wide variety of statistical (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.”

    CRAN Staff [2]

  • 5/32

    Availability

    R is available for almost all major operating systems.

    Linux (and its variants)

    (Mac) OS X

    Windows

    Get the R environment and a command line interface.

    Download from: https://cloud.r-project.org/

    Source code is available for custom OSs: https://github.com/wch/r-source

  • 6/32

    Basic how-tos (left side)

    A complete IDE

    A complete, integrated R development environment.

    1 Text editor

    2 R console

    3 Variable list and contents

    4 Tabbed display for different uses

    See software overview and design document for version and download information.

  • 8/32

    Basic how-tos (left side)

    Editor

    “Smart” editor

    CTRL + O to open a file

    CTRL + S to save a file

    CTRL + A to highlight contents

    CTRL + Enter to transfer contents to Console

    Multiple files can be opened at once

  • 10/32

    Basic how-tos (left side)

    Console

    Interprets R commands

    Commands from editor, other panels, or manually entered

    Execution errors appear here

    Contents of print function appear here

  • 12/32

    Basic how-tos (right side)

    Variables

    Displays contents of selected environment (including variables)

    Displays history of console commands

    Can save and load data from data files

  • 14/32

    Basic how-tos (right side)

    Tabbed display

    Displays files in the current directory

    Displays plots from the console

    Allows packages to be added, or removed from the console

    Provides help/man pages for R functions and packages

  • 16/32

    Basic how-tos (right side)

    Starting an R script in the background

    The image shows a Windows environment.

    A *nix environment command is: Rscript backend.R &

  • 18/32

    Basic how-tos (right side)

    Basic help with functions [1]

    1 Based on subject: help.search("data input")

    2 Based on pattern matching: apropos("lm")

    3 Looking for a specific item: find("lm")

    4 About a specific item: ?lm, ??lm

    5 Example of a function: example(lm)

    6 Source code for a function: lm

    7 Demonstration of a function: demo(persp)

    8 Demonstration of a function: vignette("moveline", package="grid")

    9 Contents of a library: library(help=spatial)

    10 Install a new library: install.packages("Kfn")

    11 Which data are included in a package: data(package="ggplot2")

    12 Which data are included in all packages: data(package = .packages(all.available = TRUE))

    13 Find an overview of R packages: https://cran.r-project.org/web/views/
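A few of the help calls listed above return ordinary R objects, so their results can be stored and inspected. The sketch below is only an illustration: the variable names are made up for this example, and it assumes a default R session in which the stats and datasets packages are available.

```r
# Pattern matching: names of visible objects that contain "lm"
lmMatches <- apropos("lm")
print(head(lmMatches))

# Looking for a specific item: where "lm" lives on the search path
lmLocation <- find("lm")
print(lmLocation)

# Typing a function name without parentheses shows its source;
# args() gives a shorter summary of its parameters
print(args(lm))

# Which data are included in a package (here the built-in "datasets")
dataInfo <- data(package = "datasets")
print(head(dataInfo$results[, "Item"]))
```

The interactive forms (?lm, example(lm), demo(persp)) are easiest to try directly in the RStudio console, where the help text appears in the Help tab.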

  • 19/32

    Types of numbers

    Lots of different number types

    We’ll dive into each type shortly.

    Other things:

    Each builds on another.

    Each may have attributes.

    Each has a type.

    Each has a class.

    And, you can create your own.

  • 21/32

    Types of numbers

    Definition of types (1 of 3)

    Character: surrounded by " ("hi") or ' ('bye'). Special characters are escaped with \

    Complex: a combination of a real and an imaginary number in the form a + bi

    Data frame: a table or a two-dimensional array-like structure in which each column contains values of one variable and each row contains one set of values from each column.

    Date: number of days relative to January 1, 1970 (Unix dates)

    Diff time: represents the amount of time between pairs of dates or date-times

    Double: numbers can be specified in decimal (0.1234), scientific (1.23e4), or hexadecimal (0xcafe)

    Factor: Conceptually, factors take on a limited number of different values; such variables are often referred to as categorical variables
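The definitions above can be checked in the console. This is a minimal sketch with made-up variable names and values, intended only to show what each type looks like in practice.

```r
# Character: either quote style works; backslash escapes special characters
greeting <- "hi"
quoted   <- 'she said "bye"'
tabbed   <- "a\tb"            # \t is a single tab character
print(nchar(tabbed))

# Complex: a + bi
z <- 3 + 4i
print(Mod(z))                 # modulus of the complex number

# Date: stored as the number of days since January 1, 1970
d <- as.Date("1970-01-11")
daysSinceEpoch <- as.numeric(d)
print(daysSinceEpoch)

# Diff time: the amount of time between two dates
gap <- as.Date("2020-01-24") - as.Date("2020-01-01")
print(gap)

# Double: decimal, scientific, and hexadecimal notation
hexValue <- 0xcafe
print(typeof(hexValue))
print(hexValue)

# Factor: a limited set of values (levels)
sizes <- factor(c("small", "large", "small"))
print(levels(sizes))
```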

  • 22/32

    Types of numbers

    Definition of types (2 of 3)

    Integer: written similarly to doubles but must be followed by an upper case ell (L) (1234L, 1e4L, or 0xcafeL)

    List: objects which contain elements of different types like numbers, strings, vectors, and another list inside it

    Logical: can have only one of two values (T[RUE] or F[ALSE])

    NULL: NULL represents the null object in R. NULL is used mainly to represent the lists with zero length

    Numeric: the default computational data type

    POSIXct: Portable Operating System Interface (POSIX) a family of cross-platform standards, “ct” stands for calendar time

    POSIXlt: Portable Operating System Interface (POSIX) a family of cross-platform standards, “lt” stands for local time

    Raw: data is stored as raw bytes
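As a rough illustration of the types just defined, the following sketch creates one value of each kind. The variable names and the sample date are assumptions chosen for this example.

```r
# Integer: the trailing L marks an integer literal
count  <- 1234L
big    <- 1e4L
hexInt <- 0xcafeL
print(typeof(count))

# List: elements may have different types, including another list
mixed <- list(1, "two", c(3, 4), list(5))
print(length(mixed))

# Logical
flag <- TRUE
print(typeof(flag))

# NULL: the null object has length zero
print(length(NULL))

# Numeric: the class reported for ordinary computational values
print(class(1.5))

# POSIXct stores a date-time as a single number;
# POSIXlt stores it as named parts (hour, year, ...)
when  <- as.POSIXct("2020-01-24 12:00:00", tz = "UTC")
parts <- as.POSIXlt(when)
print(parts$hour)
print(parts$year + 1900)      # POSIXlt years count from 1900

# Raw: bytes
bytes <- charToRaw("R")
print(bytes)
```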

  • 23/32

    Types of numbers

    Definition of types (3 of 3)

    Scalar: an individual value (actually a vector of length 1)

    Tibble: are a modern take on data frames. They keep the features that have stood the test of time, and drop the features that used to be convenient but are now frustrating

    Vector: a basic data structure in R. It contains elements of the same type.
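A short sketch of the last three definitions. The names are illustrative, and the tibble part only runs if the tibble package happens to be installed, which is an assumption about the local setup.

```r
# Scalar: really a vector of length 1
scalar <- 42
print(is.vector(scalar))
print(length(scalar))

# Vector: all elements share one type, so mixed input is coerced
mixedInput <- c(1, "a", TRUE)
print(typeof(mixedInput))
print(mixedInput)

# Tibble: only attempted when the tibble package is available
if (requireNamespace("tibble", quietly = TRUE)) {
  tb <- tibble::tibble(x = 1:3, y = c("a", "b", "c"))
  print(class(tb))
}
```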

  • 24/32

    Variables

    Variable types (part 1 of 2) [3]

    1 Variable names:

    Names are case sensitive

    Names cannot begin with numbers or special symbols

    Names cannot have internal spaces

    2 Scalars (simple values): variable
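The naming rules and scalar assignment can be tried out as follows. This is a sketch, not the slide's original example; the variable names are invented, and make.names() is used only to show how base R repairs names that break the rules.

```r
# Scalars: assign a simple value to a name with <-
myValue <- 3
MyValue <- 4          # a different variable: names are case sensitive
print(myValue)
print(MyValue)

# make.names() turns an invalid name into a valid one
fixedDigit <- make.names("2fast")      # cannot begin with a number
fixedSpace <- make.names("has space")  # cannot contain a space
print(fixedDigit)
print(fixedSpace)
```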

  • 25/32

    Variables

    Variable types (part 2 of 2) [3]

    1 Data frames (each column must have the same number of values): L3
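To make the data frame rule concrete, here is a small sketch with invented data. It builds a valid data frame and then shows that columns of lengths 3 and 2 are rejected.

```r
# A data frame with two columns of three values each (made-up data)
scores <- data.frame(
  student = c("Ann", "Bob", "Cy"),
  score   = c(90, 85, 77),
  stringsAsFactors = FALSE
)
print(scores)
print(mean(scores$score))

# Columns of lengths 3 and 2 cannot be combined; capture the error
# instead of stopping the script
result <- tryCatch(
  data.frame(a = 1:3, b = 1:2),
  error = function(e) "error"
)
print(result)
```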

  • 26/32

    Operations and functions

    Operation

    The basic data type is a vector.

    It is easy to create a vector, one way is as a sequence

    x
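One possible way to create vectors as sequences is sketched below. It is not the slide's original code, and the values are arbitrary.

```r
# A sequence of the integers 1 through 10
x <- 1:10
print(x)

# seq() gives control over the step size
y <- seq(from = 0, to = 1, by = 0.25)
print(y)

# Operations apply to every element of the vector at once
squares <- x^2
print(squares)
print(sum(x))
```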

  • 27/32

    Operations and functions

    Functions are supported

    1 Have the same naming conventions as variables

    2 Have three parts:

        1 Optional pass parameters (named, evaluated, unnamed)

        2 Text of the function

        3 The environment where and while the function executes

    3 The last value evaluated is returned.

    4 Statements grouped by “curly braces” or semicolons.

    functionName
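The points above can be illustrated with a small function. This is a sketch with a hypothetical function name, hypotenuse, chosen for this example; it is not taken from the slide.

```r
# Statements are grouped by curly braces; the last value evaluated
# (the sqrt) is what the function returns
hypotenuse <- function(a, b = a) {
  sumOfSquares <- a^2 + b^2
  sqrt(sumOfSquares)
}

# Parameters can be passed by position or by name
print(hypotenuse(3, 4))
print(hypotenuse(b = 4, a = 3))

# The three parts of a function
print(names(formals(hypotenuse)))   # the parameters
print(body(hypotenuse))             # the text of the function
print(environment(hypotenuse))      # the environment where it was defined
```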

  • 28/32

    Some simple exercises to get familiar with R and RStudio

    1 Create a variable and assign it the value 3

    2 Print your variable

    3 Create a function that takes one parameter and returns the square of that value

    4 Use your function to compute the square of 45

    5 Print the value of the passed parameter inside the function

    6 Open the file library.R and explain what the function dumpObject does
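One possible approach to exercises 1 through 5 is sketched below; other answers are equally valid, and the names myVar and square are arbitrary. Exercise 6 depends on the contents of library.R, which is not reproduced here.

```r
# Exercises 1 and 2: create a variable and print it
myVar <- 3
print(myVar)

# Exercises 3 and 5: a one-parameter function that prints its
# parameter and returns the square of it
square <- function(value) {
  print(value)
  value^2
}

# Exercise 4: use the function
result <- square(45)
print(result)
```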

  • 29/32

    Q & A time.

    Q: Do you know what the death rate around here is?

    A: One per person.

  • 30/32

    What have we covered?

    Covered a little bit of R’s background

    Looked at RStudio, a cross-platform GUI for working with R

    Looked at some R basics (variable types and functions)

    Next: what is data visualization anyway?

  • 31/32

    References (1 of 1)

    [1] Michael J. Crawley, The R Book, John Wiley & Sons, 2012.

    [2] CRAN Staff, What is R?, https://www.r-project.org/about.html, 2017.

    [3] Simon Walkowiak, Big Data Analytics with R, Packt Publishing Ltd., 2016.

  • 32/32

    Files of interest

    1 Software installation

  • Software in Support of the Old Dominion University College of Continuing Education and Professional Development Big Data: Data Visualization Boot Camp

    Chuck Cartledge

    November 24, 2019

    Contents

    1 Introduction

    2 Discussion

    3 Conclusion

    A Software on each workstation

    B Software installation checkout

    C Files

    1 Introduction

    A work in progress for software needed and used in the support of the Old Dominion University (ODU) College of Continuing Education and Professional Development (CEPD) Big Data: Data Visualization boot camp.

    2 Discussion

    Software will be needed on each virtual machine for the boot camp. This draft report contains a list of needed software, R scripts to install necessary libraries, and simple R scripts to test the installation (see Section B).

  • 3 Conclusion

    After installing all the software identified in this report on their personal computers, the student will be able to replicate all boot camp activities.

    A Software on each workstation

    This section contains the assumptions about the operating system environment, and software load out for each workstation.

    1. Operating system: Windows 7

    2. Software

    (a) R

    • Version: 3.3.2
    • Available from: https://cran.r-project.org/bin/windows/base/

    (b) R Packages: An install script is available to programmatically download the needed libraries (see Section B). The list of libraries/packages includes:

    • bitops
    • cluster.datasets
    • clusterSim
    • colorspace
    • colourlovers
    • dplyr
    • ellipse
    • gcookbook
    • geosphere
    • getopt
    • ggmap
    • ggplot2
    • ggpubr
    • gnm
    • grDevices
    • grid
    • gridBase
    • gridExtra
    • httr
    • jpeg
    • kernlab
    • KernSmooth
    • knitr
    • magrittr
    • mapdata
    • maps
    • methods
    • modeest
    • mvtnorm
    • NISTunits
    • oec
    • OpenStreetMap
    • pdftools
    • plotrix
    • plyr
    • png
    • purrr
    • RColorBrewer
    • RCurl
    • readr
    • readxl
    • reshape
    • rgdal
    • rgl
    • rglwidget
    • rJava
    • rjson
    • scales
    • sf
    • sp
    • sphereplot
    • tidyr
    • tm
    • USAboundraries
    • UScensus2000tract
    • utils
    • vcd
    • vcdExtra
    • xlsx
    • xlsxjars
    • XML

    (c) RStudio

    • Version: 0.99.903
    • Available from: https://www.rstudio.com/products/rstudio/download/

    (d) wget

    • Version: 1.*
    • Available from: https://eternallybored.org/misc/wget/

    The PATH environment variable should be updated to include the location of the R interpreter.

  • B Software installation checkout

    There is an extensive list of software to be installed to support the boot camp. After the software is installed, it is necessary to configure the software and test that it is installed correctly. A number of detailed procedural files and R scripts are included in this document (see Section C) to facilitate the installation checkout. The R script files can be run in RStudio, or any other R environment that supports setting the current working directory.

    The checkout is:

    1. Associate the file extension “.R” with the RStudio program.

    2. Set the current RStudio working directory to the location of installLibraries.R and run the installLibraries.R script. There should be no errors.
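For readers curious how an install script of this kind might work, the sketch below checks which packages are missing and installs only those. It is not the boot camp's installLibraries.R: the function names findMissingPackages and installMissingPackages, the dryRun parameter, and the shortened package list are all assumptions made for illustration.

```r
# Hypothetical helper: which of the needed packages are not installed?
findMissingPackages <- function(needed) {
  installed <- rownames(installed.packages())
  setdiff(needed, installed)
}

# Hypothetical helper: install anything that is missing.
# With dryRun = TRUE nothing is installed; the missing names are
# only returned so they can be reviewed first.
installMissingPackages <- function(needed, dryRun = TRUE,
                                   repos = "https://cloud.r-project.org/") {
  toInstall <- findMissingPackages(needed)
  if (!dryRun && length(toInstall) > 0) {
    install.packages(toInstall, repos = repos)
  }
  invisible(toInstall)
}

# A few names taken from the package list in Section A
needed <- c("ggplot2", "dplyr", "maps")
toInstall <- installMissingPackages(needed, dryRun = TRUE)
print(toInstall)
```

Running the real script from RStudio would typically mean calling setwd() with the folder that holds installLibraries.R and then source("installLibraries.R").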


  • C Files

    A collection of miscellaneous files mentioned in the report.

    • installLibraries.R – an R script to install all necessary libraries/packages from “the cloud”

    A complete collection of files (presentations, data, scripts, etc.) can be downloaded from the boot camp web site using this *nix command:

    wget -np -r https://www.cs.odu.edu/~ccartled/Teaching/2019-Spring/DataVisualization/

    or, this Windows command:

    wget -r -np -nH --cut-dirs=3 -R index.* https://www.cs.odu.edu/~ccartled/Teaching/2019-Spring/DataVisualization/

    The Windows version of wget sometimes leaves “trashy” files behind, like “index.html@C=D;O=A” and so on. These files are not part of the boot camp web page, and can be removed or ignored. None of the boot camp scripts use, or process these files. The *nix version of wget does not leave trashy files.

    These commands are also located in: https://www.cs.odu.edu/~ccartled/Teaching/2020-Spring/DataVisualization/Errata/wget.txt


    rm(list=ls())

    getNeededPackageList