
R and RStudio Lab Session 1 – February 2016 green = # comment black = command blue = output

(mac) 1. Teaching\biep640w\2016\0. docu\4. R docs\R session 1 Spring 2016 by carol.docx Page 1 of 21

R and RStudio

Lab Session 1 Introductions & Basics

February 2016

1. Keep a History of your Work (R Script and R Markdown)
2. Enter Data Directly into R
3. Import Excel Data (".xlsx") into R
4. Import Stata Data (".dta") into R
   a) From a folder on your computer
   b) From the internet
5. Describe Your Data Set Structure
6. List Your Data
7. Describe Your Data – Numerical Descriptions
8. Describe Your Data – Graphical Descriptions
9. One and Two Sample Inference

Please note I do a lot of comments! You will see many of my commands that begin with the pound sign (#). They’re also in green. While recommended, you don’t actually have to type these comments.


1. Keep a History of your R Work (R Script and R Markdown Files)

What is an R Script File? Short answer: it is a record of your R commands (but not the output). Stata users: this is the equivalent of a Stata do-file.

Tip! – Consider maintaining an R script file as you do your work in R. Later, when you are more savvy, use R Markdown. We'll learn how to do this in a subsequent lab.

Suggestion for Users who are Very New to R Studio

1. Save your R commands: create and maintain an R Script file of the R commands in your session; and
2. Create a log of your entire session (commands and output): upon completion of your R session, do a "select-all" in the console window (lower left), then copy and paste your session into a Word document for editing, etc.

Suggestion for Users Familiar with R Studio

Use R Markdown! Stay tuned. We'll learn R Markdown in our 2nd lab session.

This introductory lab session assumes that you are very new to R Studio.
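For readers who would rather not copy and paste from the console, base R's sink() offers a rough alternative for the "log" suggestion. The sketch below is only an illustration: the file name is hypothetical (a temporary file is used here), and note that sink() records the printed output only, not the commands themselves.

```r
# Hypothetical log file; tempfile() is used so that nothing is overwritten
log_file <- tempfile(fileext = ".txt")

sink(log_file)                      # start diverting printed output to the file
weight <- c(161.3, 120.1, 223.2, 124.0)
print(mean(weight))                 # this output goes to the log, not the console
sink()                              # stop diverting; output returns to the console

# Read the log back in to see what was recorded
log_lines <- readLines(log_file)
log_lines
```

The commands themselves still belong in your R Script file; the log only holds what R printed.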


How to Open a New R Script File

Step 1. Launch R Studio
Step 2. From the top menu bar: FILE > NEW FILE > R SCRIPT
Step 3. Save this new, empty R script file under a name of your choosing: FILE > SAVE AS

Example – Rlabsession1.R
Tip – Take care to note where you are saving!

How to Work with Your R Script File (Cycle: “script” and “run”, “script and run”)

Step 1. With your cursor in the R Script file, type your command or set of commands ("script")
Step 2. To execute your commands, click on the RUN icon at the right ("run")
Cycle …

Example – Rlabsession1.R
Tip – This way, I can execute blocks of several R commands all at once (highlight the lines first, then click RUN).


2. Enter Data Directly into R

R is all about "objects", so data in R is an "object" too. Most commands assign something to an object:

objectname <- the stuff that you are assigning to objectname

Direct Entry of Data for a Single Variable using the command c(value, value, value, etc)

# Enter data on weight for n=4 into column vector called weight
# objectname <- c(value,value,value,value)
weight <- c(161.3,120.1,223.2,124.0)
# List (echo)
# objectname
weight
[1] 161.3 120.1 223.2 124.0

Direct Entry of Data for Multiple Variables using c(value, value, etc) and data.frame(column, column, etc)

# Enter data on weight for n=4 into column vector called weight
# Repeat for age
# Repeat for study id
weight <- c(161.3,120.1,223.2,124.0)
age <- c(54,36,78,62)
studyid <- c(1,2,3,4)
# "Bind" columns of data into a single data frame
# objectname <- data.frame(column,column,column)
labdata1 <- data.frame(studyid,age,weight)
# List (echo)
# objectname
labdata1
  studyid age weight
1       1  54  161.3
2       2  36  120.1
3       3  78  223.2
4       4  62  124.0
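Since everything in R is an object, it can help to ask R what kind of object you have just created. The short sketch below re-enters the same values and uses class() and the $ operator; it is an illustration only, not an additional required step.

```r
# Re-enter the handout's values so this block runs on its own
weight  <- c(161.3, 120.1, 223.2, 124.0)
age     <- c(54, 36, 78, 62)
studyid <- c(1, 2, 3, 4)
labdata1 <- data.frame(studyid, age, weight)

class(weight)      # what kind of object is a single variable?
class(labdata1)    # what kind of object is the bound data set?

# Pull one column back out of the data frame by name
labdata1$age
```

The $ operator returns the column as a plain vector, the same kind of object that c( ) created.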


3. Import Excel Data (“.xlsx”) into R

Importing Excel Data into R Requires 2 or 3 Steps

First. (This is a "one time" step) Install the package openxlsx
Second. Attach (load) the package to your R Environment
Third. Import Excel data

1. How to Install the Package openxlsx

install.packages("openxlsx")

Example - [screen capture of my installation of openxlsx]


2. Attach (Load) the package openxlsx to your R Environment

library(openxlsx)

3. Import Excel Data using function read.xlsx( )

# Import Excel data (R_lab1.xlsx) into R object (excel_import)
# objectname <- read.xlsx("full path/R_lab1.xlsx")
excel_import <- read.xlsx("/Users/cbigelow/Desktop/R_lab1.xlsx")
# List (echo)
# objectname
excel_import
  id   dob gender weight
1  1 46107   male  161.3
2  2 20615 female  120.1
3  3 19815   male  223.2
4  4 18936 female  124.0
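In the listing, dob appears as five-digit numbers rather than as dates. Assuming (this is an assumption, not something the handout states) that these are Excel serial day numbers, a minimal base R sketch for turning them into R dates is shown below; the origin "1899-12-30" is the usual offset for Windows-style Excel serial numbers.

```r
# dob values copied from the listing; assumed to be Excel serial day numbers
dob <- c(46107, 20615, 19815, 18936)

# Convert day counts to Date objects, counting from Excel's usual origin
dob_date <- as.Date(dob, origin = "1899-12-30")

class(dob_date)
dob_date
```

If the converted dates look implausible for your own file, check how the column was stored in Excel before relying on this conversion.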


4. Import Stata Data (“.dta”) into R

No surprises here. The steps are similar to those for importing Excel data into R.

Importing Stata Data into R Requires 2 or 3 Steps

First. (Again, this is a "one time" step) Install the package foreign
Second. Attach (load) the package to your R Environment
Third. Import Stata data

1. How to Install the Package foreign

install.packages("foreign")

2. Attach (Load) the package foreign to your R Environment

library(foreign)

4a. Import Stata Data From a Folder on Your Computer using function read.dta( )

# Import Stata data (larvae.dta) into R object (larvae_data)
# objectname <- read.dta("full path/larvae.dta")
larvae_data <- read.dta("/Users/cbigelow/Desktop/larvae.dta")
# List (echo)
# objectname
larvae_data
   id     y    x1    x2
1   1 2.836 0.150 0.425
2   2 2.966 0.214 0.439
3   3 2.687 0.487 0.301
--- some rows skipped ---
13 13 2.385 0.942 0.141
14 14 2.452 1.090 0.289
15 15 2.351 1.194 0.193

4b. Import Stata Data From the Internet using function read.dta( )

# Import Stata data (week02.dta) into R object (week02_data)
# url <- "full url"
url <- "http://people.umass.edu/biep640w/datasets/week02.dta"
week02_data <- read.dta(file=url)


5. Describe Your Data Set Structure

# Observations, # Variables

Display variable names: colnames( )
# Example
colnames(larvae_data)

Display number of rows (observations): nrow( )
# Example
nrow(larvae_data)
[1] 15

Display number of columns (variables): ncol( )
# Example
ncol(larvae_data)
[1] 4

Display dimensions of data frame (rows x columns) = (# observations x # variables): dim( )
# Example
dim(larvae_data)
[1] 15 4

Detailed Structure of Data: command str(dataframe)

# str(dataframe)
str(larvae_data)
'data.frame': 15 obs. of 4 variables:
 $ id: num 1 2 3 4 5 6 7 8 9 10 ...
 $ y : num 2.84 2.97 2.69 2.68 2.83 ...
 $ x1: num 0.15 0.214 0.487 0.509 0.57 ...
 $ x2: num 0.425 0.439 0.301 0.325 0.371 ...
 - attr(*, "datalabel")= chr "PubHlth 640 Unit 2 Regression - Larvae data"
 - attr(*, "time.stamp")= chr "10 Feb 2013 16:36"
 - attr(*, "formats")= chr "%9.0g" "%9.0g" "%9.0g" "%9.0g"
 - attr(*, "types")= int 254 254 254 254
 - attr(*, "val.labels")= chr "" "" "" ""
 - attr(*, "var.labels")= chr "larva id" "log10(survival)" "log10(dose)" "log10(weight)"
 - attr(*, "expansion.fields")=List of 2
  ..$ : chr "_dta" "note1" "\"Week 3 homework assignment exercises 2 and 3\""
  ..$ : chr "_dta" "note0" "1"
 - attr(*, "version")= int 12
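If you do not have larvae.dta at hand, the same structure commands can be tried on a small data frame typed in directly. The sketch below uses the four-person data set from the direct-entry section; it illustrates the commands and is not output from the larvae data.

```r
# Small data frame entered by hand (values from the direct-entry section)
labdata1 <- data.frame(
  studyid = c(1, 2, 3, 4),
  age     = c(54, 36, 78, 62),
  weight  = c(161.3, 120.1, 223.2, 124.0)
)

colnames(labdata1)   # variable names
nrow(labdata1)       # number of observations
ncol(labdata1)       # number of variables
dim(labdata1)        # rows x columns in one call
str(labdata1)        # detailed structure
```

Note that dim( ) simply returns nrow( ) and ncol( ) together, in that order.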


6. List Your Data

Use Subscripts to Select a Subset of Your Data

Just like spreadsheets in Excel or Stata data sets, rows are observations and columns are variables. Selecting a subset of your data requires subscripts, which can be thought of as addresses. Subscripts appear in square brackets and are of the form:

dataframe[1st subscript is for rows, 2nd subscript is for columns] = dataframe[observation selection, variable selection]

List First 6 observations using head( )
# Example
head(larvae_data)

List Last 6 observations using tail( )
# Example
tail(larvae_data)

List Entire Data Set (be careful!): Two ways: 1) Just type its name; or 2) View(dataframe).
# Example
larvae_data
# Example
View(larvae_data)

List Selected Observations: For observations 8, 9, 10, list the data for all variables
# Example
larvae_data[8:10,]
   id     y    x1    x2
8   8 2.602 0.781 0.406
9   9 2.556 0.739 0.364
10 10 2.441 0.832 0.156

List Selected Variables: For all observations, list the data for only the 2nd variable
# Example
larvae_data[,2]
 [1] 2.836 2.966 2.687 2.679 2.827 2.442 2.421 2.602 2.556 2.441 2.420 2.439 2.385 2.452 2.351

List Selected Variables on Selected Observations: For observations 8, 9, 10 only, list the data for only the 2nd variable
# Example
larvae_data[8:10,2]
[1] 2.602 2.556 2.441
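The [rows, columns] addressing can also be practiced on a small data frame that you type in yourself. The sketch below reuses the four-person data from the direct-entry section. Selecting by column name and the drop=FALSE option are additions for illustration; they are standard base R but are not used elsewhere in this handout.

```r
# Small data frame entered by hand (values from the direct-entry section)
labdata1 <- data.frame(
  studyid = c(1, 2, 3, 4),
  age     = c(54, 36, 78, 62),
  weight  = c(161.3, 120.1, 223.2, 124.0)
)

rows_2_3    <- labdata1[2:3, ]                # observations 2 and 3, all variables
age_vector  <- labdata1[, 2]                  # all observations, 2nd variable (a vector)
age_by_name <- labdata1[, "age"]              # same column, selected by name
age_frame   <- labdata1[, 2, drop = FALSE]    # same column, kept as a data frame
weight_2_3  <- labdata1[2:3, "weight"]        # observations 2 and 3, weight only

rows_2_3
age_vector
weight_2_3
```

Leaving a subscript blank means "all": a blank before the comma selects every row, a blank after it selects every column.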


7. Describe Your Data - Numerical Descriptions

# We will use some packages
install.packages("gmodels")
install.packages("psych")
install.packages("mosaic")
install.packages("readr")
library(gmodels)
library(psych)
library(mosaic)
library(readr)
# Load from the internet the dataset ivf.dta
library(foreign)
url <- "http://www.pauldickman.com/survival/ivf.dta"
ivfdata <- read.dta(file=url)
# Quick check of dataset structure using command str()
str(ivfdata)
'data.frame': 641 obs. of 6 variables:
 $ id     : num 1 2 3 4 5 6 7 8 9 10 ...
 $ matage : int 33 34 34 30 35 37 31 31 33 33 ...
 $ hyp    : int 0 0 0 0 0 0 0 1 1 0 ...
 $ gestwks: num 37.7 39.2 35.7 39.3 38.4 ...
 $ sex    : Factor w/ 2 levels "male","female": 2 2 2 1 2 1 1 2 1 2 ...
 $ bweight: int 2410 2977 2100 3270 2620 3260 3750 1450 3200 3675 ...
 - attr(*, "datalabel")= chr "In Vitro Fertilization data"
 - attr(*, "time.stamp")= chr "27 Aug 2001 13:11"
 - attr(*, "formats")= chr "%9.0g" "%8.0g" "%8.0g" "%9.0g" ...
 - attr(*, "types")= int 102 98 98 102 98 105
 - attr(*, "val.labels")= chr "" "" "" "" ...
 - attr(*, "var.labels")= chr "identity number" "maternal age (years)" "hypertension (1=yes, 0=no)" "gestational age (weeks)" ...
 - attr(*, "version")= int 7
 - attr(*, "label.table")=List of 1
  ..$ sex: Named int 1 2
  .. ..- attr(*, "names")= chr "male" "female"


# ONE discrete variable: sex
options(digits=2)
# Obtain frequencies, FREQ
FREQ <- table(ivfdata$sex)
# Obtain cumulative frequencies, CUMFREQ
CUMFREQ <- cumsum(FREQ)
# Obtain relative frequencies, RELFREQ
RELFREQ <- FREQ/length(ivfdata$sex)
# Obtain cumulative relative frequencies, CUMRELFREQ
CUMRELFREQ <- cumsum(RELFREQ)
# Create table for descriptives, TABLE_sex
TABLE_sex <- cbind(FREQ,RELFREQ,CUMFREQ,CUMRELFREQ)
colnames(TABLE_sex) <- c("Freq", "Rel Freq","Cum Freq", "Cum Rel Freq")
# Display descriptives
TABLE_sex
       Freq Rel Freq Cum Freq Cum Rel Freq
male    326     0.51      326         0.51
female  315     0.49      641         1.00

# TWO discrete variables (sex and hyp)
# Obtain frequencies
table(ivfdata$sex)
table(ivfdata$hyp)
sexhyp <- table(ivfdata$sex,ivfdata$hyp)
# Obtain row percents
prop.table(sexhyp,1)
            0    1
  male   0.84 0.16
  female 0.88 0.12
# Obtain column percents
prop.table(sexhyp,2)
            0    1
  male   0.50 0.58
  female 0.50 0.42


# Obtain total percents
prop.table(sexhyp)
             0     1
  male   0.427 0.081
  female 0.433 0.058

# TWO discrete variables (sex and hyp) with xtab using
# command CrossTable() from package gmodels
# CrossTable(rowvariable,columnvariable, digits=2, expected=TRUE,dnn=c("rowvariablename", "columnvariablename"))
CrossTable(ivfdata$sex,ivfdata$hyp, digits=2, expected=TRUE,dnn=c("Sex", "Hypertension"))

   Cell Contents
|-------------------------|
|                       N |
|              Expected N |
| Chi-square contribution |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

Total Observations in Table: 639

             | Hypertension
         Sex |         0 |         1 | Row Total |
-------------|-----------|-----------|-----------|
        male |       273 |        52 |       325 |
             |    279.73 |     45.27 |           |
             |      0.16 |      1.00 |           |
             |      0.84 |      0.16 |      0.51 |
             |      0.50 |      0.58 |           |
             |      0.43 |      0.08 |           |
-------------|-----------|-----------|-----------|
      female |       277 |        37 |       314 |
             |    270.27 |     43.73 |           |
             |      0.17 |      1.04 |           |
             |      0.88 |      0.12 |      0.49 |
             |      0.50 |      0.42 |           |
             |      0.43 |      0.06 |           |
-------------|-----------|-----------|-----------|
Column Total |       550 |        89 |       639 |
             |      0.86 |      0.14 |           |
-------------|-----------|-----------|-----------|

Statistics for All Table Factors

Pearson's Chi-squared test
------------------------------------------------------------
Chi^2 = 2.4     d.f. = 1     p = 0.12

Pearson's Chi-squared test with Yates' continuity correction
------------------------------------------------------------
Chi^2 = 2     d.f. = 1     p = 0.15
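To see what the second argument of prop.table() does, it can help to rebuild the 2x2 table of counts by hand and compare the three kinds of percents. The sketch below types in the four cell counts reported in the CrossTable output (so it runs without downloading ivf.dta); the object names are made up for illustration.

```r
# Cell counts copied from the CrossTable output: rows = sex, columns = hyp
sexhyp_counts <- matrix(
  c(273, 277,    # hyp = 0: male, female
     52,  37),   # hyp = 1: male, female
  nrow = 2,
  dimnames = list(Sex = c("male", "female"), Hypertension = c("0", "1"))
)

row_pct   <- prop.table(sexhyp_counts, 1)   # each row sums to 1
col_pct   <- prop.table(sexhyp_counts, 2)   # each column sums to 1
total_pct <- prop.table(sexhyp_counts)      # all four cells sum to 1

round(row_pct, 2)
round(col_pct, 2)
round(total_pct, 3)
```

A margin of 1 divides by row totals, a margin of 2 divides by column totals, and no margin divides by the grand total.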


# ONE continuous variable (bweight)
summary(ivfdata$bweight)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    630    2850    3200    3130    3550    4650
# favstats() is from package mosaic
favstats(ivfdata$bweight)
 min   Q1 median   Q3  max mean  sd   n missing
 630 2850   3200 3550 4650 3129 653 641       0

# ONE continuous variable (bweight), over groups (sex)
# using command describeBy() from package psych
describeBy(ivfdata$bweight,ivfdata$sex)
group: male
  vars   n mean  sd median trimmed mad min  max range  skew kurtosis se
1    1 326 3211 666   3290    3256 526 700 4650  3950 -0.88      1.6 37
---------------------------------------------------------------------------------------------------------
group: female
  vars   n mean  sd median trimmed mad min  max range skew kurtosis se
1    1 315 3044 629   3120    3108 445 630 4416  3786 -1.1        2 35
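Group summaries can also be obtained without extra packages. The sketch below is a base R alternative using tapply(); it is not the handout's method, and it uses the four weight and gender values from the Excel listing so that it runs on its own.

```r
# Values copied from the excel_import listing
gender <- c("male", "female", "male", "female")
weight <- c(161.3, 120.1, 223.2, 124.0)

# tapply(variable, grouping variable, function) applies the function within each group
group_means <- tapply(weight, gender, mean)
group_sds   <- tapply(weight, gender, sd)
group_n     <- tapply(weight, gender, length)

cbind(n = group_n, mean = group_means, sd = group_sds)
```

The groups are listed in alphabetical order (female, then male) because gender is a character vector here.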


8. Describe Your Data – Graphical Descriptions

# ONE discrete variable (hyp) - bar graph
# Create "object" (freqhyp) containing the frequencies to be plotted
freqhyp <- table(ivfdata$hyp)
# Bar Graph of frequencies with option main="title"
barplot(freqhyp,main="Bar Chart of hyp: Hypertension")


# ONE Continuous Variable (matage) – Box Plot
boxplot(ivfdata$matage, main="Box Plot of matage: Maternal Age")

# ONE Continuous Variable (matage), over groups (sex) – Box Plot
# boxplot(continuousvariable~groupingvariable,options)
boxplot(ivfdata$matage~ivfdata$sex,main="Maternal Age, by Sex")


# ONE Continuous Variable (matage) - Histogram
hist(ivfdata$matage, main="Histogram of matage: Maternal Age")


# XY Scatterplot Y=gestwks X=matage
# plot(xvariable,yvariable)
plot(ivfdata$matage,ivfdata$gestwks,main="Scatterplot",xlab="Maternal Age(yrs)", ylab="Weeks Gestation")
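The plotting commands in this section draw to the RStudio Plots pane. If you would like a copy of a graph saved alongside your script, one option is to wrap the plotting command between pdf() and dev.off(). The sketch below is an illustration under assumptions: the file name is hypothetical (a temporary file), and the hyp counts are typed in from the cross-tabulation totals rather than computed from ivfdata.

```r
# Counts of hyp taken from the cross-tabulation totals (0 = no, 1 = yes)
freqhyp <- c("0" = 550, "1" = 89)

# Hypothetical file name; tempfile() avoids overwriting anything
plot_file <- tempfile(fileext = ".pdf")

pdf(plot_file)                                        # open a PDF graphics device
barplot(freqhyp, main = "Bar Chart of hyp: Hypertension")
dev.off()                                             # close the device so the file is written

file.exists(plot_file)
```

The same pattern works with boxplot(), hist(), and plot(); only the command between pdf() and dev.off() changes.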


9. One and Two Sample Inference

# ONE CONTINUOUS VARIABLE - One sample t-test of gestwks (NULL: mu=40)
t.test(ivfdata$gestwks,alternative="two.sided",mu=40)

        One Sample t-test

data: ivfdata$gestwks
t = -14, df = 640, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 40
95 percent confidence interval:
 39 39
sample estimates:
mean of x
       39

t.test(ivfdata$gestwks,alternative="greater",mu=40)

        One Sample t-test

data: ivfdata$gestwks
t = -14, df = 640, p-value = 1
alternative hypothesis: true mean is greater than 40
95 percent confidence interval:
  39 Inf
sample estimates:
mean of x
       39

t.test(ivfdata$gestwks,alternative="less",mu=40)

        One Sample t-test

data: ivfdata$gestwks
t = -14, df = 640, p-value < 2.2e-16
alternative hypothesis: true mean is less than 40
95 percent confidence interval:
 -Inf   39
sample estimates:
mean of x
       39
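To connect the t.test() output to the formula t = (mean - mu) / (sd / sqrt(n)), the sketch below computes the statistic by hand and compares it with what t.test() reports. It uses the four weights from the direct-entry section so that it runs without ivf.dta; the null value of 150 is a made-up number chosen only for illustration.

```r
weight <- c(161.3, 120.1, 223.2, 124.0)   # values from the direct-entry section
mu0    <- 150                             # hypothetical null value, for illustration only

n         <- length(weight)
t_by_hand <- (mean(weight) - mu0) / (sd(weight) / sqrt(n))
p_by_hand <- 2 * pt(-abs(t_by_hand), df = n - 1)   # two-sided p-value

result <- t.test(weight, alternative = "two.sided", mu = mu0)

# Side-by-side comparison of the hand calculation and t.test()
c(t_by_hand = t_by_hand, t_from_test = unname(result$statistic))
c(p_by_hand = p_by_hand, p_from_test = result$p.value)
```

For the one-sided alternatives, the statistic is the same and only the p-value changes: pt(t, df) for "less" and 1 - pt(t, df) for "greater".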


# ONE CONTINUOUS VARIABLE, 2 groups: Two (independent groups) Sample t-test
t.test(ivfdata$gestwks~ivfdata$sex)

        Welch Two Sample t-test

data: ivfdata$gestwks by ivfdata$sex
t = 0.14, df = 627, p-value = 0.8864
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.34  0.39
sample estimates:
  mean in group male mean in group female
                  39                   39

# ONE 0/1 DISCRETE VARIABLE: TEST of Proportion
# (Null: proportion of sex=2 "female" babies=.5)
options(digits=2)
# Obtain frequencies, FREQ
FREQ <- table(ivfdata$sex)
# Obtain relative frequencies, RELFREQ
RELFREQ <- FREQ/length(ivfdata$sex)
# Create table to display with test
TABLE_test <- cbind(FREQ,RELFREQ)
colnames(TABLE_test) <- c("#", "%")
TABLE_test
         #    %
male   326 0.51
female 315 0.49
# prop.test(#events,#trials,null p)
prop.test(315,641,.5)

        1-sample proportions test with continuity correction

data: 315 out of 641
X-squared = 0.16, df = 1, p-value = 0.6929
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.45 0.53
sample estimates:
   p
0.49
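The X-squared value reported by prop.test() for one proportion can be reproduced by hand, which also shows what the continuity correction does. The sketch below uses the same 315 events out of 641 trials; the object names are made up for illustration.

```r
x  <- 315    # number of female babies
n  <- 641    # number of babies
p0 <- 0.5    # null proportion

expected <- n * p0

# With continuity correction: shrink the observed-minus-expected gap by 0.5
x2_corrected   <- (abs(x - expected) - 0.5)^2 / (n * p0 * (1 - p0))
# Without continuity correction
x2_uncorrected <- (x - expected)^2 / (n * p0 * (1 - p0))

# p-values from the chi-square distribution with 1 degree of freedom
p_corrected   <- pchisq(x2_corrected,   df = 1, lower.tail = FALSE)
p_uncorrected <- pchisq(x2_uncorrected, df = 1, lower.tail = FALSE)

round(c(x2_corrected, x2_uncorrected), 2)
round(c(p_corrected, p_uncorrected), 4)
```

The uncorrected version corresponds to calling prop.test() with the option correct=FALSE.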


# prop.test(#events,#trials,null p) WITHOUT continuity correction
prop.test(315,641,.5, correct=FALSE)

        1-sample proportions test without continuity correction

data: 315 out of 641
X-squared = 0.19, df = 1, p-value = 0.6639
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.45 0.53
sample estimates:
   p
0.49

# TWO 0/1 Discrete Variables: 2 Sample Test of Equality of Proportions
hypsex <- table(ivfdata$hyp,ivfdata$sex)
hypsex
    male female
  0  273    277
  1   52     37
chisq.test(hypsex)

# More Detailed TWO Sample Test of Equality of Proportions
# using command CrossTable() in package gmodels
CrossTable(ivfdata$hyp,ivfdata$sex, digits=2, expected=TRUE,dnn=c("Hypertension", "Sex"))

   Cell Contents
|-------------------------|
|                       N |
|              Expected N |
| Chi-square contribution |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|


Total Observations in Table: 639

             | Sex
Hypertension |      male |    female | Row Total |
-------------|-----------|-----------|-----------|
           0 |       273 |       277 |       550 |
             |    279.73 |    270.27 |           |
             |      0.16 |      0.17 |           |
             |      0.50 |      0.50 |      0.86 |
             |      0.84 |      0.88 |           |
             |      0.43 |      0.43 |           |
-------------|-----------|-----------|-----------|
           1 |        52 |        37 |        89 |
             |     45.27 |     43.73 |           |
             |      1.00 |      1.04 |           |
             |      0.58 |      0.42 |      0.14 |
             |      0.16 |      0.12 |           |
             |      0.08 |      0.06 |           |
-------------|-----------|-----------|-----------|
Column Total |       325 |       314 |       639 |
             |      0.51 |      0.49 |           |
-------------|-----------|-----------|-----------|

Statistics for All Table Factors

Pearson's Chi-squared test
------------------------------------------------------------
Chi^2 = 2.4     d.f. = 1     p = 0.12

Pearson's Chi-squared test with Yates' continuity correction
------------------------------------------------------------
Chi^2 = 2     d.f. = 1     p = 0.15

chisq.test(hypsex, correct=FALSE)

        Pearson's Chi-squared test

data: hypsex
X-squared = 2.4, df = 1, p-value = 0.1238
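The "Expected N" and "Chi-square contribution" rows in the CrossTable output can be reproduced by hand from the four cell counts. The sketch below types in those counts (so it runs without ivf.dta) and compares the hand-computed statistic with chisq.test(); the object names other than hypsex are made up for illustration.

```r
# Cell counts from the hypsex table: rows = hyp (0, 1), columns = sex
hypsex <- matrix(
  c(273, 52,     # male:   hyp = 0, hyp = 1
    277, 37),    # female: hyp = 0, hyp = 1
  nrow = 2,
  dimnames = list(Hypertension = c("0", "1"), Sex = c("male", "female"))
)

# Expected count in each cell = (row total x column total) / grand total
expected <- outer(rowSums(hypsex), colSums(hypsex)) / sum(hypsex)

# Each cell's contribution = (observed - expected)^2 / expected
contribution <- (hypsex - expected)^2 / expected

x2_by_hand <- sum(contribution)
p_by_hand  <- pchisq(x2_by_hand, df = 1, lower.tail = FALSE)

round(expected, 2)
round(contribution, 2)
round(c(chi_square = x2_by_hand, p_value = p_by_hand), 4)

# Same test from R, without the continuity correction
chisq.test(hypsex, correct = FALSE)
```

For a 2x2 table the degrees of freedom are (2-1) x (2-1) = 1, which is why df = 1 is used for the p-value.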