Statistical Analysis with R
18 August 2015

Questionnaires, variables organization, descriptive analysis, graphs, statistical tests

Upload: howard-little

Post on 30-Dec-2015


R

Statistical package
4th generation programming language
  extensible through functions and extensions
Environment for statistical computing and graphics
  statistical and graphical techniques
  extensible through packages
Competitors: SPSS, Matlab

Variables

Scale or numeric variables
  time, age, weight, distance in kilometers, length, number of children, GDP
Nominal or categorical variables
  country of residence, sex, degree course
Ordinal variables
  education level, rankings, Likert scale
  in statistical analysis they are often treated as nominal or scale variables
Questionnaire overview

Missing values

NA: means "not available"; you insert it manually whenever a datum is missing
NaN: means "not a number"; it appears whenever a calculation cannot be done for this datum
Statistical analyses skip them (base R functions such as mean() do so only with na.rm=TRUE)
Any arithmetic operation with them gives NA or NaN
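A minimal sketch of how missing values behave at the R prompt. The vector `age` is made up for illustration.

```r
age <- c(25, NA, 31, 40)         # one datum is missing

is.na(age)                       # marks the missing case
mean(age)                        # NA: the missing value propagates
mean_age <- mean(age, na.rm = TRUE)  # skips the missing case

not_a_number <- 0 / 0            # NaN: the calculation cannot be done
still_missing <- NA + 1          # NA: arithmetic with NA stays NA
```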

Portable R

Portable R
  Download it from my website, already preconfigured, or from http://rportable.sourceforge.net
  Uncompress it on your computer's hard disk or on a USB pendrive
or install R on your computer
  Download it from www.r-project.org
  Install it on your computer
  Try desperately to set the language to English

Installing packages

To install R commander
  Packages > Install Package(s)... > CRAN Mirror > Rcmdr
  wait for the installation of Rcmdr and additional packages
To load R commander
  Packages > Load Package... > Rcmdr
  to the warning on missing packages answer Yes
  answer to download them from CRAN
Learn to load an R package
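The same two menu steps can also be typed as commands. This is a sketch only: the helper name `load_rcmdr` is hypothetical, and the call is left commented out because it downloads packages and opens a window.

```r
# Hypothetical helper: install Rcmdr if it is missing, then load it.
load_rcmdr <- function() {
  if (!requireNamespace("Rcmdr", quietly = TRUE)) {
    install.packages("Rcmdr")  # downloads from the chosen CRAN mirror
  }
  library(Rcmdr)               # loading the package starts R commander
}

# load_rcmdr()   # uncomment to run
```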

Running R commander

Whenever you want to run it
  Packages > Load Package... > Rcmdr
  File > Change Working directory
R commander has problems navigating through your directory tree
  Choose an easy-to-find directory, such as your Desktop or the place where you keep your R exercises.

Files to save

R commander windows
  script, contains the written instructions
    R commander > File > Save Script as…
  output, contains the output
    R commander > File > Save Output as…
    or paste them into a text file
Workspace
  contains the data structure
  File > Save Workspace…
  R commander > File > Save R workspace As…
  File > Load Workspace…
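Saving and reloading data can also be done with commands. The sketch below uses base R's `save()` and `load()`; the data set `survey` and the file name are made up, and a temporary directory stands in for your working directory.

```r
survey <- data.frame(id = 1:3, age = c(25, 31, 40))  # made-up data set

ws_file <- file.path(tempdir(), "exercise.RData")    # stand-in location
save(survey, file = ws_file)   # write the object to disk

rm(survey)                     # simulate closing R
load(ws_file)                  # the object is back under the same name
```

`save.image(file)` does the same for every object in the workspace at once.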

data.frame or dataset

database table suited for statistical analysis
case names are optional

Building a new dataset

R commander > Data > New data set…
  Insert all variables first
  Only afterwards insert data and build a codebook
    use numbers for nominal and ordinal variables
Convert nominal and ordinal variables to factor
  R commander > Data > Manage variables in active data set > Convert numeric variables to factor
Convert ordinal variables to ordered
  Submit the 3 lines of code with ordered instead of factor
ls.str() and str(dataset)
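As a rough illustration of the conversion step, the sketch below turns numeric codes into a factor and an ordered variable by hand. The data set `survey`, its codes and its labels are all made up; the code R commander generates may look different.

```r
# Made-up codebook: sex 1 = male, 2 = female; education 1 < 2 < 3
survey <- data.frame(sex = c(1, 2, 2, 1), education = c(3, 1, 2, 3))

survey$sex <- factor(survey$sex, levels = c(1, 2),
                     labels = c("male", "female"))
survey$education <- ordered(survey$education, levels = c(1, 2, 3),
                            labels = c("primary", "secondary", "university"))

str(survey)   # shows which variables are Factor and which are Ord.factor
```

Only the ordered variable supports comparisons such as `survey$education > "primary"`.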

Importing dataset

R commander > Data
  import from a package
    Data in packages
  import from a text file
    Import Data > from text file, clipboard or URL…
  import from Excel (hoping that it works)
    Import Data > from Excel, Access or dBase data set…
  export to a text file
    Active data set > Export active data set…
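A small command-line sketch of the same round trip: export to a text file, import it back, and load a data set from a package. The data set `survey` is made up and a temporary file stands in for a real path.

```r
survey <- data.frame(age = c(25, 31, 40), sex = c("F", "M", "F"))

txt_file <- tempfile(fileext = ".txt")   # stand-in for your own file
write.table(survey, txt_file, sep = "\t", row.names = FALSE, quote = FALSE)

imported <- read.table(txt_file, header = TRUE, sep = "\t",
                       stringsAsFactors = TRUE)  # text columns become factors

data(CO2)   # a data set shipped with R, in the datasets package
```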

Importing dataset from SPSS

written here just in case you'll ever need it; converting to a text file is better and easier!
R commander > Data > Import Data > from SPSS data set…
  Pay attention to value labels and factors
  date importing is wrong! Fix it with
    library(chron)
    var <- as.chron(ISOdate(1582, 10, 14) + var)

Univariate descriptive analysis

Statistics > Summaries
  For scale variables
    Numerical summaries
  For ordinal and nominal variables
    Frequency distributions
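Typed by hand, the two kinds of summary could look like the sketch below. It uses the CO2 data set that ships with R; the commands R commander itself writes may differ.

```r
# Scale variable: numerical summaries
summary(CO2$uptake)
mean(CO2$uptake)
sd(CO2$uptake)

# Nominal variable: frequency distribution, as counts and as percentages
freq <- table(CO2$Type)
pct  <- 100 * prop.table(freq)
```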

Graphs for one nominal variable
Column plot

Graphs for one nominal variable
Pie chart
Radar graph
[Figures; labels: Mon to Sun, scale 0 to 500]

Graphs for one nominal variable
Bar plot
Line plot
[Figures; labels: Apr to Sep with scale 0.0% to 1.4%; JP, US, EPO with scale 0 to 4.5]

Graphs for one nominal variable
Area plot
3D variants
[Figure; labels: Apr to Sep, scale 0.0% to 1.4%]

Graphs for one nominal variable

R commander > Graphs
  Color palette…
  Bar graph…
  Pie chart…
To change colors, add the option col=c(numbers of the colors from the palette) to the text command, then select the text command and submit it
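A minimal sketch of the `col=` option on a bar graph and a pie chart, using the CO2 data set that ships with R. The color numbers 2 and 4 are arbitrary picks from the current palette.

```r
if (!interactive()) pdf(NULL)   # run as a script: draw to a null device

counts <- table(CO2$Type)
palette()                        # the colors behind the numbers 1, 2, 3, ...

bar_mids <- barplot(counts, col = c(2, 4))  # one color number per column
pie(counts, col = c(2, 4))
```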

Graphs for one scale variable
Building a histogram: grouping into bins
[Figure: histogram; x axis from $1,000 to $5,000]

Graphs for one scale variable
Choosing the bins carefully
[Figures: two histograms with different bins; x axis from $1,000 to $5,000]

Graphs for one scale variable
Boxplot

Median is the black line
Central 50% is in the rectangle
In R's default boxplot the whiskers reach the most extreme values lying within 1.5 times the rectangle's height from its edges
More extreme values are drawn as symbols
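The numbers behind a boxplot can be inspected with base R's `boxplot.stats()`. The sketch below uses the CO2 data set that ships with R.

```r
uptake <- CO2$uptake
box <- boxplot.stats(uptake)   # default whisker coefficient is 1.5

box$stats  # lower whisker, box bottom, median, box top, upper whisker
box$out    # values beyond the whiskers, drawn as symbols

box_height <- box$stats[4] - box$stats[2]
```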

One scale variable case by case

Only for a scale variable with few cases
Use any appropriate nominal variable graph

Graphs for one scale variable

R commander > Graphs
  Histogram…
  Boxplot…
  Index plot…
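A sketch of the same three graphs from the command line, with the bins of the histogram chosen explicitly. It uses the CO2 data set that ships with R; the bin width of 10 is an arbitrary choice.

```r
if (!interactive()) pdf(NULL)   # run as a script: draw to a null device

h <- hist(CO2$uptake, breaks = seq(0, 50, by = 10))  # five bins of width 10
h$counts                        # number of cases in each bin

boxplot(CO2$uptake)
plot(CO2$uptake, type = "h")    # index plot: one vertical line per case
```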

Bivariate analysis: nominal vs nominal

Statistics > Contingency table > Two-way table…
  Percentages
  Understand clearly when to use row percentages and when to use column percentages
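Row and column percentages answer different questions, as this sketch with ten made-up cases shows. The variables `phone` and `app` are invented for illustration.

```r
phone <- c(rep("iPhone", 6), rep("Android", 4))
app   <- c("Games", "Games", "Games", "Games", "News", "News",
           "Games", "News", "News", "News")

tab <- table(phone, app)

row_pct <- prop.table(tab, 1)  # each row sums to 1: apps within a phone
col_pct <- prop.table(tab, 2)  # each column sums to 1: phones within an app
```

Here 4 of the 6 iPhone users chose Games (row percentage), while 4 of the 5 Games users have an iPhone (column percentage).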

Graphs for nominal vs nominal
Side by side
Stacked
[Figures: column plots of app categories (Entertainment, Games, Lifestyle, News, Social networking) for iPhone and Android; scale 0 to 20]

Graphs for nominal vs nominal
Appropriate 3D variants
[Figure: 3D column plot of the same categories for iPhone and Android]

Graphs for nominal vs nominal
A rare example of a useful stacked area chart

Graphs for nominal vs nominal

No graph available from the R commander menus as far as I know (from the command line, barplot() on a two-way table works)
How to export your graphics into Word
  right-click > copy as bitmap
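From the command line, base R's `barplot()` can draw both the side-by-side and the stacked variant from a two-way table. The counts below are made up.

```r
if (!interactive()) pdf(NULL)   # run as a script: draw to a null device

# Made-up counts: rows are phones, columns are app categories
apps <- matrix(c(12, 18, 5, 9), nrow = 2,
               dimnames = list(phone = c("iPhone", "Android"),
                               category = c("Games", "News")))

side_mids    <- barplot(apps, beside = TRUE,  legend.text = TRUE)  # side by side
stacked_mids <- barplot(apps, beside = FALSE, legend.text = TRUE)  # stacked
```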

Bivariate analysis: scale vs nominal

Statistics > Summaries
  Numerical summaries > Summarize by groups…
  Table of statistics…
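Summaries by groups can be sketched in base R with `tapply()` and `aggregate()`. The example uses the CO2 data set that ships with R.

```r
# One statistic of the scale variable for each level of the nominal variable
group_means <- tapply(CO2$uptake, CO2$Type, mean)
group_sds   <- tapply(CO2$uptake, CO2$Type, sd)

# The same idea as a small table, here with the median
group_medians <- aggregate(uptake ~ Type, data = CO2, FUN = median)
```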

Graphs for scale vs nominal

Boxplot side by side
Histogram one above the other

Graphs for two variables

R commander > Graphs > Boxplot… > Plot by groups…

Bivariate analysis: scale vs scale

Statistics > Summaries > Correlation matrix
  Pearson linear correlation
  Spearman rank correlation
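The difference between the two coefficients shows up on a relation that always increases but is not a straight line. The vectors `x` and `y` are made up.

```r
x <- 1:10
y <- x^3          # always increasing, but not linear

pearson  <- cor(x, y, method = "pearson")   # measures linear association
spearman <- cor(x, y, method = "spearman")  # measures agreement of the ranks
```

Spearman's coefficient is exactly 1 here because the ranks agree perfectly, while Pearson's stays below 1.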

Scale versus scale
Scatterplot

Scale versus scale
Mathematical graph
Regression line

Graphs for two variables

R commander > Graphs
  Scatterplot…
    Remove all the unnecessary options
  Line graph… (mathematical graph)
    X variable must have values in order
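A command-line sketch of a scatterplot with its regression line, followed by a line graph whose x values are sorted first. The CO2 data set ships with R; the vectors `x` and `y` are made up.

```r
if (!interactive()) pdf(NULL)   # run as a script: draw to a null device

plot(uptake ~ conc, data = CO2)          # scatterplot
fit <- lm(uptake ~ conc, data = CO2)     # least-squares regression line
abline(fit, col = "red")

# Line graph: put the x values in order before joining the points
x <- c(3, 1, 2)
y <- x^2
ord <- order(x)
plot(x[ord], y[ord], type = "l")
```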

Multivariate analysis

Statistics
  three nominal
    Contingency table > Multi-way table
  three scale
    Summaries > Correlation matrix
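Both multivariate summaries can be sketched with the mtcars data set that ships with R. Treating `cyl`, `gear` and `am` as nominal variables is an assumption made only for this example.

```r
# Three nominal variables: multi-way contingency table
tab3 <- xtabs(~ cyl + gear + am, data = mtcars)
ftable(tab3)                    # flattened layout, easier to read

# Three scale variables: correlation matrix
cor_matrix <- cor(mtcars[, c("mpg", "wt", "hp")])
```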

Graphs for three scale variables
Surface plot

Graphs for three scale variables
Bubble chart
www.gapminder.org

Graphs for two scale and one nominal variables

R commander > Graphs > Scatterplot… > Plot by groups…

Restrict data set

R commander > Data > Active Data Set
  Subset active data set…
    Used to restrict the data set to some cases
    Use labels and not numbers for nominal variables!
  Remove cases with missing data…
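A minimal sketch of both operations in base R. The CO2 data set ships with R; the data set `with_gaps` is made up.

```r
# Restrict to some cases: the condition uses the label, not a number
quebec <- subset(CO2, Type == "Quebec")

# Remove cases with missing data
with_gaps <- data.frame(age = c(25, NA, 40), sex = c("F", "M", NA))
complete  <- na.omit(with_gaps)   # keeps only rows without any NA
```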

Recode

Used to create or modify factor/ordered variables
R commander > Data > Manage variables in active data set > Recode variables…
  "Bolzano" = "here"
  c("Munich","Hannover","Bonn") = "Germany"
    Do not use "Munich","Hannover","Bonn" = "Germany" as suggested by the help
  else = "Others"
  For numerical variables we may also use 8:27 = "high", together with lo and hi
Massive recoding
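The effect of these recode directives can be reproduced in plain base R, as a stand-in for the command that R commander generates. The vector `city` is made up.

```r
city <- c("Bolzano", "Munich", "Bonn", "Paris", "Hannover")

# "Bolzano" = "here"; the three German cities = "Germany"; else = "Others"
area <- ifelse(city == "Bolzano", "here",
        ifelse(city %in% c("Munich", "Hannover", "Bonn"), "Germany",
               "Others"))
area <- factor(area)   # the recoded variable is a factor
```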

Binning

Used to group scale variables into ordered categories (but it produces a factor)
R commander > Data > Manage variables in active data set > Bin numeric variable…
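Binning by hand with base R's `cut()` gives an ordered result directly when asked. The break points and labels below are arbitrary choices for the CO2 data set that ships with R.

```r
uptake_bin <- cut(CO2$uptake,
                  breaks = c(0, 20, 35, 50),           # arbitrary bin limits
                  labels = c("low", "medium", "high"),
                  ordered_result = TRUE)               # ordered, not plain factor

table(uptake_bin)   # how many cases fall in each bin
```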

Compute

Used to create a new variable through math operations
R commander > Data > Manage variables in active data set > Compute new variable…
  newvector <- with(dataset, formula)
  CO2$myname <- with(CO2, uptake*7-sqrt(conc))
it is identical to
  CO2$myname <- CO2$uptake*7-sqrt(CO2$conc)
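A quick check that the two ways of writing the formula give the same variable. The copy `co2` and the name `myname2` are introduced here only to keep the original CO2 data set untouched.

```r
co2 <- CO2   # working copy of the data set that ships with R

co2$myname  <- with(co2, uptake * 7 - sqrt(conc))     # form written by Compute
co2$myname2 <- co2$uptake * 7 - sqrt(co2$conc)        # form typed by hand

same <- identical(co2$myname, co2$myname2)
```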

Computing (line command)

The instruction produced by Compute
  CO2$myname <- with(CO2, uptake*7-sqrt(conc))
can be easily typed directly by you! Or you can type
  CO2$myname <- CO2$uptake*7-sqrt(CO2$conc)
Variables' names must be preceded by the dataset's name and $
<- means take things from the right and put them on the left

Computing (line command)

If you do not specify dataset$, the variable will be created outside the dataset, with only one case (unless otherwise specified)
  print(variable) to look at it
Variable assignment
  variable <- value or formula
  value or formula -> variable
Operators: + - * / **

Computing (line command)

A variable with many cases outside a dataset is called a "vector"
  vector <- c(list of items) to create it manually
  vector[index] to access a specific element of the vector
  vector[from:to] to access a sequence of elements of the vector
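A short sketch of assignment, operators and vectors outside a dataset. All names and values are made up.

```r
total <- 3 + 4 * 2      # variable with a single case
2 ** 3 -> cube          # -> assigns to the right; ** is the power operator
print(total)

ages <- c(25, 31, 40, 52)   # a vector created manually
second <- ages[2]           # one specific element
middle <- ages[2:3]         # a sequence of elements
```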

Statistical tests

Example: we want to study the age of Internet users, checking whether the average age is 35 years or not
The only information we have are the observations on a sample of 100 users, which are: 25; 26; 27; 28; 29; 30; 31; 30; 33; 34; 35; 36; 37; 38; 30; 30; 41; 42; 43; 44; 45; 46; 47; 48; 49; 50; 51; 52; 20; 54; 55; 56; 57; 20; 20; 20; 30; 31; 32; 33; 34; 35; 36; 37; 38; 39; 40; 41; 42; 43; 44; 45; 46; 47; 48; 49; 50; 20; 21; 22; 23; 24; 25; 26; 27; 28; 29; 30; 31; 32; 33; 34; 35; 36; 37; 38; 39; 40; 35; 36; 37; 35; 36; 37; 35; 36; 37; 35; 36; 37; 35; 36; 37; 35; 36; 37; 35; 36; 37; 35.

Statistical tests

Test's hypotheses:
  H0: average age in the population is 35
  H1: average age in the population is not 35
We calculate the average age in the sample, 36.2, which is an estimate of the population's average age. We compare this result with the 35 of the H0 hypothesis and we find a difference of +1.2.
We ask ourselves whether this difference is:
  large, implying that the population's average age is not 35 and thus H0 must be rejected
  small, so that it can be caused by random fluctuation in the sample choice and therefore H0 must be accepted (that is, not rejected).

Statistical tests

In order to answer, the test provides us with a significance: the probability of finding a difference at least as large as the observed one when H0 is true
  In this example significance is 16%
If significance is large, we accept H0
  this implies that we do not know
If significance is small, we reject H0
  this implies that we are almost sure that H0 is false
Significance is also called p-value
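The example can be reproduced with base R's `t.test()`. The vector `ages` holds the 100 observations of the sample, written compactly with ranges and `rep()` instead of one by one.

```r
# The 100 observations of the example, in the same order
ages <- c(25:31, 30, 33:38, 30, 30, 41:52, 20, 54:57, 20, 20, 20,
          30:50, 20:40, rep(35:37, 7), 35)

length(ages)                      # 100 users
mean(ages)                        # 36.2, the sample average

result <- t.test(ages, mu = 35)   # H0: population average is 35 (two-sided)
result$p.value                    # about 0.16, the significance
```

With a significance of about 16%, H0 is not rejected at the usual 5% threshold.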

Typical univariate analysis techniques

nominal
  Numerical description: Frequencies (one-dimensional contingency table)
  Graphical description: Column plot, Pie chart
  Parametric test: ---
  Non-parametric test: Chi-square for a one-dimensional contingency table
scale
  Numerical description: Descriptive statistics
  Graphical description: Histogram, Boxplot
  Parametric test: Student's t for one variable
  Non-parametric test: Sign test

Tests for one scale variable

Student's t test for one variable
  H0: average in the population = m
  Statistics > Means > Single-sample t-test
Sign test
  H0: median in the population = m
  Not available in R commander
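Since the sign test has no menu entry, one possible way to run it by hand is through base R's `binom.test()`: under H0 each case is equally likely to fall above or below m. The sample `x` and the value `m` are made up.

```r
x <- c(25, 30, 36, 38, 40, 41, 45, 52)   # made-up sample
m <- 35                                   # hypothesized median

above   <- sum(x > m)    # cases above m
not_tie <- sum(x != m)   # cases equal to m are dropped

sign_test <- binom.test(above, not_tie, p = 0.5)
sign_test$p.value
```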

Tests for one nominal variable

Chi-square test for a one-dimensional contingency table
  H0: classification follows a predetermined distribution
  Statistics > Summaries > Frequency distributions… > Chi-square
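A sketch of the same test with base R's `chisq.test()`. The observed counts and the predetermined distribution are made up.

```r
observed <- c(A = 20, B = 30, C = 50)    # made-up frequencies
expected_share <- c(0.25, 0.25, 0.50)    # the predetermined distribution

gof <- chisq.test(observed, p = expected_share)
gof$expected    # 25, 25, 50
gof$statistic   # (20-25)^2/25 + (30-25)^2/25 + 0 = 2
gof$p.value
```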

Typical bivariate analysis techniques

nominal vs nominal
  Numerical description: 2D contingency table
  Graphical description: Clustered or stacked or 3D column plot
  Parametric test: ---
  Non-parametric test: Chi-square for a 2D contingency table
binary nominal vs scale
  Numerical description: Descriptive statistics by groups
  Graphical description: Boxplots or histograms by groups
  Parametric test: Student's t for two populations
  Non-parametric test: Mann-Whitney
non-binary nominal vs scale
  Numerical description: Descriptive statistics by groups
  Graphical description: Boxplots or histograms by groups
  Parametric test: One-way analysis of variance (ANOVA)
  Non-parametric test: Kruskal-Wallis
scale vs scale
  Numerical description: Pearson's or Spearman's correlation
  Graphical description: Scatterplot
  Parametric test: Pearson's correlation, Student's t for paired data
  Non-parametric test: Spearman's correlation, Wilcoxon signed rank test

Tests for two nominal variables

Chi-square test for a two-dimensional contingency table
  H0: classification of two variables is independent
  Statistics > Contingency table > Two-way table… > Statistics > Chi-square test of independence
  Warning: you should have no expected frequency less than 5
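A sketch of the test and of the check on expected frequencies, with made-up counts. The continuity correction is switched off here so that the statistic is easy to verify by hand.

```r
# Made-up two-way table
counts <- matrix(c(30, 20, 20, 30), nrow = 2,
                 dimnames = list(sex = c("F", "M"), answer = c("yes", "no")))

indep <- chisq.test(counts, correct = FALSE)
indep$expected              # all 25 here
all(indep$expected >= 5)    # the warning on the slide
indep$statistic             # 4 * (5^2 / 25) = 4
indep$p.value
```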

Test for binary nominal vs scale

Student's t test for two populations
  H0: average group 1 = average group 2
  Statistics > Means > Independent samples t-test
  Warning: scale variable should be normally distributed in each of the two groups
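A minimal sketch with base R's `t.test()` and the CO2 data set that ships with R, comparing uptake between the two levels of Type.

```r
# Scale variable on the left, binary nominal variable on the right
two_groups <- t.test(uptake ~ Type, data = CO2)

two_groups$estimate   # the two group averages
two_groups$p.value    # small: H0 of equal averages is rejected
```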

Non-parametric test for binary nominal vs scale

Mann-Whitney (Wilcoxon rank-sum)
  It tests the ranks
  H0: position group 1 = position group 2
  Statistics > Nonparametric tests > Two-sample Wilcoxon test
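The same comparison on ranks, sketched with base R's `wilcox.test()` on the CO2 data set that ships with R. `exact = FALSE` asks for the normal approximation, an assumption made here because the data contain ties.

```r
rank_test <- wilcox.test(uptake ~ Type, data = CO2, exact = FALSE)
rank_test$p.value   # small: the two groups are positioned differently
```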

Test for non-binary nominal vs scale

ANOVA (ANalysis Of VAriance)
  H0: average is the same for all groups
  Statistics > Means > One-way ANOVA
  Test rejects if just one population's average is different from the others
  Warning: scale variable should be normally distributed for each group
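A sketch of one-way ANOVA with base R's `aov()`, using the iris data set that ships with R because its Species variable has three groups.

```r
fit <- aov(Sepal.Length ~ Species, data = iris)

anova_table <- summary(fit)[[1]]          # the ANOVA table as a data frame
p_value <- anova_table[["Pr(>F)"]][1]     # significance of the Species row
```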

Non-parametric test for non-binary nominal vs scale

Kruskal-Wallis
  It tests the ranks
  H0: position is the same for all groups
  Statistics > Nonparametric tests > Kruskal-Wallis test
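The rank-based counterpart, sketched with base R's `kruskal.test()` on the iris data set that ships with R.

```r
kw <- kruskal.test(Sepal.Length ~ Species, data = iris)
kw$p.value   # small: at least one group is positioned differently
```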

Tests for two scale variables

Pearson's and Spearman's correlation tests
  H0: correlation = 0
  Statistics > Summaries > Correlation test
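Both tests are available through base R's `cor.test()`. The sketch uses the CO2 data set that ships with R; `exact = FALSE` is an assumption made for the Spearman version because the data contain ties.

```r
pearson_test  <- cor.test(CO2$conc, CO2$uptake, method = "pearson")
spearman_test <- cor.test(CO2$conc, CO2$uptake, method = "spearman",
                          exact = FALSE)

pearson_test$estimate   # the correlation coefficient
pearson_test$p.value    # significance of H0: correlation = 0
```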

Tests for difference of two scale variables

When using tests on the differences between variables
Student's t test for paired data
  H0: average (var 1 - var 2) = 0
  Statistics > Means > Paired t test
  Warning: distribution of the difference of the scale variables must be normal
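The paired test is a one-sample test on the differences, as this sketch with the sleep data set that ships with R shows. The names `drug1` and `drug2` are chosen here for the two measurements on the same ten people.

```r
drug1 <- sleep$extra[sleep$group == 1]
drug2 <- sleep$extra[sleep$group == 2]

paired <- t.test(drug1, drug2, paired = TRUE)

# Same result from a single-sample test on the differences
on_diff <- t.test(drug1 - drug2, mu = 0)
```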

Nonparametric test for two scale paired variables

Wilcoxon signed-rank test
  It tests the ranks
  H0: var 1 - var 2 is positioned around 0
  Statistics > Nonparametric tests > Paired-samples Wilcoxon test
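A sketch of the rank-based paired test with base R's `wilcox.test()` on the sleep data set that ships with R. The names `drug1` and `drug2` are chosen here, and `exact = FALSE` is an assumption made because the differences contain ties and a zero.

```r
drug1 <- sleep$extra[sleep$group == 1]
drug2 <- sleep$extra[sleep$group == 2]

signed_rank <- wilcox.test(drug1, drug2, paired = TRUE, exact = FALSE)
signed_rank$p.value   # small: the differences are not positioned around 0
```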

Is a variable normally distributed?

Histogram with normal curve
  Find out average a and standard deviation s
  Build a histogram with appropriate binning
  close it, add prob=TRUE and rebuild it
  do not close it!
  curve(dnorm(x, mean=a, sd=s), col="blue", lwd=2, add=TRUE, yaxt="n")
Q-Q plot (data must be on the line)
  Graphs > Quantile-comparison Plot
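The whole sequence can be sketched from the command line as follows, using the CO2 data set that ships with R. `qqnorm()` and `qqline()` are base R stand-ins for the quantile-comparison plot of the menu.

```r
if (!interactive()) pdf(NULL)   # run as a script: draw to a null device

uptake <- CO2$uptake
a <- mean(uptake)               # average
s <- sd(uptake)                 # standard deviation

hist(uptake, prob = TRUE)       # densities instead of counts on the y axis
curve(dnorm(x, mean = a, sd = s), col = "blue", lwd = 2, add = TRUE)

qqnorm(uptake)                  # Q-Q plot against the normal distribution
qqline(uptake)                  # reference line the data should follow
```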

Is a variable normally distributed?

Skewness
  negative: tail left, positive: tail right
Excess kurtosis
  negative: flat, 0: normal, positive: too pointy
  Statistics > Summaries > Numerical summaries > Options
Shapiro-Wilk normality test
  H0: variable comes from a normal distribution
  Statistics > Summaries > Shapiro-Wilk test of normality
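A rough sketch of the three checks on a made-up sample with a long right tail. The skewness and kurtosis formulas below are one simple moment-based variant and are an assumption; R commander's numerical summaries may use a slightly different estimator.

```r
x <- c(1, 1, 2, 2, 3, 10)          # made-up data, tail on the right

z <- (x - mean(x)) / sd(x)         # standardized values
skewness        <- mean(z^3)       # positive: tail right
excess_kurtosis <- mean(z^4) - 3   # 0 would match a normal distribution

sw <- shapiro.test(x)              # H0: x comes from a normal distribution
sw$p.value
```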