
Statistics Outreach Center Short Course

SPSS ANOVA/Regression

Wednesday, February 19, 2014

6:00 – 8:00 pm, N106 LC

Topics Covered:

Analysis of Variance

o One-Way ANOVA

o Two-Way ANOVA

o ANCOVA

o MANOVA

Regression Analysis

o Regression

o Logistic Regression

Data Management Syntax

Syntax for Common Analyses

Helpful Links

Overview

This course is designed for users with some SPSS experience. The first sections introduce users to ANOVA and regression analyses. The remaining sections describe some data management issues, commonly used inferential statistics syntax, and other related topics. A sample dataset, Employee data.sav, is used for all examples in this tutorial. The dataset can be downloaded from the short course webpage of the Statistics Outreach Center (http://www.education.uiowa.edu/centers/soc/shortcourses.aspx).

Getting Started

To open SPSS, go to the Start icon on your Windows computer. You should find SPSS under the Programs menu item. SPSS is not actually installed on these computers; we are accessing it through the Virtual Desktop (for more information, go to http://helpdesk.its.uiowa.edu/virtualdesktop).

If SPSS isn’t listed under Programs, you may need to access it through the Virtual Desktop website (https://virtualdesktop.uiowa.edu/Citrix/VirtualDesktop/auth/login.aspx).

When using the Virtual Desktop to access SPSS, you can only open and save files from your University of Iowa personal drive (the H: drive) or from a data source (e.g., flash drive) you have connected prior to opening SPSS.

When using the Virtual Desktop, a dialog box may appear asking for read/write access. If you want to open and save files, you need to agree to give Citrix full access.

When SPSS opens, it will present you with a “What would you like to do?” dialog box.

For now, click the Cancel button.


Section 1: Analysis of Variance

1.1. One-Way ANOVA

This section covers comparing group means in SPSS for designs with one or more independent and dependent variables. If you have one categorical independent variable and one interval-level dependent variable, the One-Way ANOVA procedure is appropriate.

Analyze > Compare Means > One-Way ANOVA...

One-Way ANOVA: Used to test if the population means of two or more groups are equal.

H0: μ1 = μ2 = μ3 = … = μk
H1: At least one μi ≠ μj

Example: Does the population mean for current salary differ by employment category?

To conduct the one-way ANOVA, first select the independent and dependent variables to produce the following dialog box:

The above options will produce the following output (some output is omitted):

Test of Homogeneity of Variances

Current Salary

Levene Statistic   df1   df2   Sig.
59.733             2     471   .000

ANOVA

Current Salary

                 Sum of Squares   df    Mean Square    F         Sig.
Between Groups   8.9E+010         2     4.472E+010     434.481   .000
Within Groups    4.8E+010         471   102925714.5
Total            1.4E+011         473

In the above example, the null hypothesis is that the three employment categories do not differ in their mean salaries. The F statistic has a value of 434.481 with an associated significance level of .000 (technically, the p-value is less than .001). Because this value is below .05, the hypothesis of no difference among the three groups is rejected at the .05 significance level. Accordingly, we conclude that the three employment categories (Clerical, Custodial, and Manager) differ in their salaries.

To find out which pairs of means differ significantly, we need to request follow-up tests using the Post Hoc option. Click Post Hoc, then select the post hoc test or tests of interest. We will use Fisher’s Least Significant Difference (LSD) test.

The results indicate that people in job category 3 (Manager) are paid significantly more than people in job categories 1 and 2 (Clerical and Custodial), and that there is no significant difference between categories 1 and 2.

Multiple Comparisons

salary

LSD

                                                                         95% Confidence Interval
(I) jobcat   (J) jobcat   Mean Difference (I-J)   Std. Error   Sig.   Lower Bound    Upper Bound
1            2            $-3,100.349             $2,023.760   .126   $-7,077.06     $876.37
             3            $-36,139.258*           $1,228.352   .000   $-38,552.99    $-33,725.53
2            1            $3,100.349              $2,023.760   .126   $-876.37       $7,077.06
             3            $-33,038.909*           $2,244.409   .000   $-37,449.20    $-28,628.62
3            1            $36,139.258*            $1,228.352   .000   $33,725.53     $38,552.99
             2            $33,038.909*            $2,244.409   .000   $28,628.62     $37,449.20

*. The mean difference is significant at the 0.05 level.
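
The one-way ANOVA in this section can also be run from a syntax window. The following is a sketch of roughly what the Paste button generates for these choices. It assumes that Employee data.sav is the active dataset and that the homogeneity of variance test and the LSD post hoc test were the options selected in the dialog boxes.

```
* One-way ANOVA of current salary by employment category.
* Assumes Employee data.sav is the active dataset.
ONEWAY salary BY jobcat
  /STATISTICS HOMOGENEITY
  /MISSING ANALYSIS
  /POSTHOC=LSD ALPHA(0.05).
```

Replacing LSD with another keyword on the POSTHOC subcommand (for example, TUKEY) requests a different post hoc test.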

1.2. Two-Way ANOVA

When there is more than one independent variable, the analysis is done by selecting General Linear Model (GLM) procedures in the Analyze menu. If the analysis involves independent groups and one dependent variable, choose

Analyze > General Linear Model > Univariate...

Example: Is current salary dependent on minority and employment category?

For this example, the Dependent Variable is Current Salary (salary) and the Fixed Factors are Minority (minority) and Employment Category (jobcat).

You can plot the means in order to get a visual understanding of the results. If you select plots, the screen will appear as follows. Add jobcat to the Horizontal Axis box and add minority to the Separate Lines box.

Between-Subjects Factors

                                Value Label   N
Employment Category       1     Clerical      363
                          2     Custodial     27
                          3     Manager       84
Minority Classification   0     No            370
                          1     Yes           104

Tests of Between-Subjects Effects

Dependent Variable: Current Salary

Source              Type III Sum of Squares   df    Mean Square    F          Sig.
Corrected Model     9.034E+010a               5     1.807E+010     177.742    .000
Intercept           1.537E+011                1     1.537E+011     1511.773   .000
jobcat              2.596E+010                2     1.298E+010     127.699    .000
minority            237964814                 1     237964814.4    2.341      .127
jobcat * minority   788578413                 2     394289206.5    3.879      .021
Error               4.757E+010                468   101655279.9
Total               6.995E+011                474
Corrected Total     1.379E+011                473

a. R Squared = .655 (Adjusted R Squared = .651)

[Figure: Estimated Marginal Means of Current Salary. Profile plot with Employment Category (Clerical, Custodial, Manager) on the horizontal axis, Estimated Marginal Means ($20,000 to $80,000) on the vertical axis, and separate lines for Minority Classification (No, Yes).]

Assuming alpha = .05, the jobcat main effect and the jobcat by minority interaction are significant. The change in the simple main effect of one variable over levels of the other is most easily seen in the graph of the interaction. If the lines describing the simple main effects are not parallel, then a possibility of an interaction exists. The presence of an interaction was confirmed by the significant interaction in the summary table.
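
For reference, the two-way ANOVA and its profile plot can be requested with syntax along the following lines. This is a sketch of what the Univariate dialog typically pastes, assuming Employee data.sav is the active dataset and the default Type III sums of squares are kept.

```
* Two-way ANOVA of salary by job category and minority classification.
* The PLOT subcommand puts jobcat on the horizontal axis.
* Separate lines are drawn for each level of minority.
UNIANOVA salary BY jobcat minority
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /PLOT=PROFILE(jobcat*minority)
  /CRITERIA=ALPHA(.05)
  /DESIGN=jobcat minority jobcat*minority.
```

The DESIGN subcommand lists both main effects and the jobcat by minority interaction, which correspond to the rows of the Tests of Between-Subjects Effects table.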

1.3. ANCOVA

ANCOVA (analysis of covariance) is an extension of ANOVA. It examines whether group means (defined by a categorical independent variable) differ on a dependent variable after statistically controlling for one or more continuous variables (covariates). The analysis is done by selecting General Linear Model (GLM) procedures in the Analyze menu. If the analysis involves independent groups, choose

Analyze > General Linear Model > Univariate...

Example: Does salary differ for males and females after controlling for previous experience?

For this example, the Dependent Variable is Current Salary (salary) and the Fixed Factor is Gender (gender) and the covariate is Previous Experience (prevexp).

Under Options, we can display adjusted means for each group, which in this case is defined by gender.

Note that if you have more than two groups, you can compare them by selecting Contrasts rather than post hoc analyses.

The output shows that there is a significant difference in salary between males and females after controlling for previous experience, F(1, 471) = 137.020, p < .001. The second table gives the mean salary for each group adjusted for the covariate. Compared with the descriptive statistics, the adjusted means have changed slightly but are still significantly different.
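
The ANCOVA described above could be run with syntax similar to the following. This is a sketch under the assumption that Employee data.sav is the active dataset; the EMMEANS subcommand requests the adjusted means for gender, evaluated at the mean of the covariate, and PRINT=DESCRIPTIVE requests the unadjusted descriptive statistics for comparison.

```
* ANCOVA: salary by gender, controlling for previous experience.
UNIANOVA salary BY gender WITH prevexp
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /EMMEANS=TABLES(gender) WITH(prevexp=MEAN)
  /PRINT=DESCRIPTIVE
  /CRITERIA=ALPHA(.05)
  /DESIGN=prevexp gender.
```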

1.4. MANOVA

MANOVA (multivariate analysis of variance) is an extension of ANOVA in which there are two or more dependent variables and one categorical independent variable. The analysis is done by selecting General Linear Model (GLM) procedures in the Analyze menu. If the analysis involves independent groups, choose

Analyze > General Linear Model > Multivariate...

Example: We want to know whether groups differ on a set of variables. In this case, do the three job categories differ on job characteristics (salary, beginning salary, and jobtime)? These three variables are our dependent variables, and jobcat is the fixed factor.

Results: Wilks’ Lambda indicates that there is a significant difference in job characteristics based on job category, F(6, 938) = 117.402, p < .001, Wilks’ Λ = .326.

The tests in the table labeled Tests of Between-Subjects Effects are univariate ANOVAs, and therefore an alpha correction (such as Bonferroni) needs to be made. We can see from this table that job category has a significant effect on salary and beginning salary but not on jobtime.
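
As a sketch, the MANOVA above corresponds to GLM syntax like the following. It assumes Employee data.sav is the active dataset and that beginning salary is stored in the variable salbegin, the name used in the data management syntax later in this handout.

```
* MANOVA: three job characteristics by job category.
* The multivariate tests table includes Wilks' Lambda.
GLM salary salbegin jobtime BY jobcat
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /PRINT=DESCRIPTIVE
  /CRITERIA=ALPHA(.05)
  /DESIGN=jobcat.
```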

Section 2: Regression Analysis

2.1. Regression

Regression models can be used to predict or explain values on a (dependent) variable based on information from other (independent) variables.

Overall Model Fit (F-Test): Used to test if the regression model is “better” than using only the mean of the dependent variable.

H0: Y = β0
H1: Y = β0 + β1X1 + … + βkXk

Test for a Single βk: Used to test if βk differs from zero.

H0: βk = 0
H1: βk ≠ 0

Example: What is the regression model for using “educational level” and “years experience” to predict salary?

Everything we need to create a linear regression model is located in the following menu:

Analyze > Regression > Linear…

The variable we are trying to “predict” is the Current Salary variable, which goes in the Dependent box. Educational Level and Previous Experience go in the Independent(s) box.

There are many options available in the linear regression dialog box. We’ll look at just one: plotting the predicted values against the standardized residuals allows you to examine whether the errors seem to be random and whether homogeneity of variance appears to be a reasonable assumption.

Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .664a   .441       .439                $12,788.694

a. Predictors: (Constant), Previous Experience (months), Educational Level (years)

ANOVAb

Model            Sum of Squares   df    Mean Square   F         Sig.
1   Regression   6.088E10         2     3.044E10      186.132   .000a
    Residual     7.703E10         471   1.636E8
    Total        1.379E11         473

a. Predictors: (Constant), Previous Experience (months), Educational Level (years)

b. Dependent Variable: Current Salary

Coefficientsa

                                       Unstandardized Coefficients   Standardized Coefficients
Model                                  B            Std. Error       Beta                        t        Sig.
1   (Constant)                         -20978.304   3087.258                                     -6.795   .000
    Educational Level (years)          4020.343     210.650          .679                        19.085   .000
    Previous Experience (months)       12.071       5.810            .074                        2.078    .038

a. Dependent Variable: Current Salary

So, what does the output tell us?

R = .664 and R² = .441, which means that about 44% of the variance in salary can be “accounted for” by information about educational level and previous experience.

An F-statistic of 186.132 (p-value < 0.001) indicates that a regression model containing educational level and previous experience is “better” than a model without any predictor variables (using the mean salary as an estimate for anyone).

The regression equation is:

Salary = -20978 + 4020 * Education level + 12 * Previous Experience

The t-statistic for each βi is large enough in magnitude to reject the null hypothesis that βi = 0 at α = .05.

The plot doesn’t look very random, so we may need to reconsider our analysis. This is common for a variable like salary and indicates that we may want to transform the variable or choose a different analysis.
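
The regression above, including the residual plot, can be reproduced with syntax similar to the following. This is a sketch that assumes Employee data.sav is the active dataset; the SCATTERPLOT subcommand plots the standardized residuals (*ZRESID) on the vertical axis against the standardized predicted values (*ZPRED) on the horizontal axis.

```
* Linear regression of salary on education and previous experience.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /DEPENDENT salary
  /METHOD=ENTER educ prevexp
  /SCATTERPLOT=(*ZRESID ,*ZPRED).
```

A roughly even, patternless band of points in this plot would be consistent with random errors and homogeneity of variance.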

2.2. Logistic Regression

Logistic regression predicts a binary outcome variable from one or more predictor variables.

Analyze > Regression > Binary Logistic…

Example: Can we predict an individual’s gender based on salary, jobtime, and previous experience?

Here gender (either male or female) is our binary dependent variable and salary, jobtime, and prevexp are the covariates.

What the output tells us:

The test of the overall model is significant, χ² = 180.206, p < .001.

Both salary and previous experience are significant predictors of gender. Based on the model in which we predict gender from salary, jobtime, and prevexp we are correctly classifying 75% of individuals. Practically speaking if we wanted to create an efficient prediction model we would not include variables that aren’t significant predictors. Let’s see what happens if we remove jobtime from the model.

Our correct classification percentage is still about 75 based on just salary and prevexp in our model because as we saw previously jobtime is not a significant predictor of gender.
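
The two logistic regression models compared above can be sketched in syntax as follows. This assumes Employee data.sav is the active dataset. It also assumes that the procedure accepts gender in the form it is stored in the data file (coded "m" and "f"); if your version of SPSS does not accept it, recode gender into a numeric 0/1 variable first.

```
* Full model: salary, jobtime, and prevexp as predictors of gender.
LOGISTIC REGRESSION VARIABLES gender
  /METHOD=ENTER salary jobtime prevexp.

* Reduced model: jobtime removed.
LOGISTIC REGRESSION VARIABLES gender
  /METHOD=ENTER salary prevexp.
```

The Classification Table in each set of output reports the percentage of cases correctly classified, which is the figure compared in the text.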

Section 3: Data Management Syntax

* Count the number of missing values on salary for each case.
Compute missing = nmiss(salary).
Execute.

Listwise deletion excludes all cases that have missing values for any variable in the analysis. Pairwise deletion uses all cases that have valid responses for the variables in each particular statistic being calculated. The default depends on the procedure: Correlations, for example, uses pairwise deletion by default, while Regression uses listwise deletion. Where pairwise is the default, add the /missing = listwise subcommand to the analysis for listwise deletion.

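* Add value labels to existing variables.
Add value labels gender "m" "Male" "f" "Female"
  /minority 0 "No" 1 "Yes"
  /item1 to item20 1 "Not at all true of me" 7 "Very true of me".
Execute.

* Sort cases by previous experience in ascending order.
Sort cases by prevexp (A).

* Compute the change from beginning salary to current salary.
Compute salchange = salary - salbegin.
Execute.

* Sum three items, requiring valid values on all three.
* isum is a placeholder name for the new variable.
Compute isum = sum.3(i3, i6, i8).
Execute.

* Reverse-score a 7-point item into a new variable.
Recode item1 (1=7) (2=6) (3=5) (4=4) (5=3) (6=2) (7=1) into Ritem1.
Execute.

* Correlate salary and education separately for each gender.
Sort cases by gender.
Temporary.
Split file by gender.
Corr var = salary educ.

* Reliability analysis for a 20-item scale.
Reliability var = item1 to item20
  /scale(SC) = all
  /statistics descriptive scale corr cov
  /summary = total.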

Section 4: Syntax for Common Analyses

Frequencies

Freq var = educ
  /statistics
  /percentiles = 25 75
  /format = notable.

Descriptives

Desc var = salary
  /statistics = mean stddev
  /sort = mean (D)
  /save.

Chi-square

Crosstabs tables = salary by gender
  /statistics = chisq phi
  /cells = count sresid expected.

T-test

* gender is stored as text, so the group values are quoted.
T-test groups = gender ('m' 'f')
  /variables = salary.

Correlation

Corr var = educ salary
  /missing = listwise.

Regression

Reg
  /dependent salary
  /method = enter educ jobtime.

ANOVA

Glm salary by jobcat
  /posthoc = jobcat (tukey)
  /emmeans = tables (jobcat).

Graphs

Graph
  /bar = jobcat
  /title = 'Frequencies of Different Job Categories'.

Graph
  /bar(grouped) = jobcat by gender
  /title = 'Gender Differences in Job Categories'.

Graph
  /line = educ by jobcat
  /title = 'Distribution of educ by job category'.

Graph
  /scatterplot = salary with educ.

Graph
  /histogram(normal) = salary.

Section 5: Helpful Links

PowerPoint of various statistical analyses in SPSS: What statistical analysis should I use?
http://www.staff.ncl.ac.uk/mike.cox/III/spss13.ppt

Website of annotated output and code/syntax for various analyses in SPSS, Stata, SAS, and Mplus:
http://www.ats.ucla.edu/stat/AnnotatedOutput/
