stats workshop

Upload: gura1999

Post on 05-Apr-2018


  • 7/31/2019 Stats Workshop

    1/11

    SPSS Workshop

    Utilizing and implementing SPSS in our OC-Math statistics classes

    My Name Is: .

    This section covers starting SPSS and entering and modifying data.

    Create a folder on the desktop under your name. Then go to

    Start, Programs, SPSS, PASW Statistics 18 (wait about one minute for it to load).

    Name the variables in SPSS before entering data. Click the Variable View tab (bottom).

    Enter a name for variable 1, then for variable 2, ...

    Advice regarding Variable View

    1. In Variable View, variable names can NOT contain spaces.

    2. Preferably, enter quantitative data.

    3. When you enter integer values, select 0 in Decimals.

    4. You can also enter labels. Here you can type spaces, etc. A label is the text
       that is displayed when the mouse is on that variable. Example: if you enter
       AgeMarr for the name of the variable, you can type "Age When Got Married" under Labels.

    5. You can assign values to a variable, e.g. 1=female, 2=male.

    Start typing, or copy data from Excel or from another document.

    Copying from Excel: mark the data and press Ctrl+C, then go to SPSS and press Ctrl+V.

    Saving: click File, Save, then browse to your folder and type example1.sav

    This section covers basic statistics.

    Mean, Mode, Standard Deviation

    (Go to the data window.) Analyze, Descriptive Statistics, Frequencies

    Select your variables and move them into the Variable(s): pane on the right.

    Click Statistics and, in the new window (Frequencies: Statistics), select

    Mean, Median, Mode and Standard Deviation, then click Continue.

    Click Charts, select None.

    Click OK.

    In the Output window: go to the left pane, click inside Frequency Table, and observe.
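
As a cross-check outside SPSS, the same four summary values can be computed with Python's standard library. This is only a sketch: the `ages` list is made-up data, and `stdev` is the sample standard deviation (n - 1 denominator), which is assumed here to be the quantity the Frequencies output reports.

```python
import statistics

# Hypothetical sample (made-up values, e.g. ages at marriage).
ages = [22, 25, 25, 27, 30, 31, 25, 29]

mean = statistics.mean(ages)      # arithmetic average
median = statistics.median(ages)  # middle value of the sorted data
mode = statistics.mode(ages)      # most frequent value
std_dev = statistics.stdev(ages)  # sample standard deviation (n - 1)

print(mean, median, mode, round(std_dev, 3))
```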

    Bar, Histogram or Pie Charts

    (Go to the data window.) First, do Analyze, Descriptive Statistics, Frequencies

    Select your variables and move them into the Variable(s): pane on the right.

    Click Charts, Bar Charts, click Continue, then click OK.

    Now do Analyze, Descriptive Statistics, Frequencies

    Select your variables and move them into the Variable(s): pane on the right.

    Click Charts, Histogram, Show Normal Curve on Histogram, click Continue, then click OK.

    Finally, do Analyze, Descriptive Statistics, Frequencies

    Charts, Pie Charts, Percentages, Continue, OK

    In the Output window: go to the left pane, click inside Frequencies, find your charts and observe.

    This section covers advanced statistics.

    Relationship: Correlation

    (Go to the data window.) Analyze, Correlate, Bivariate

    In the Bivariate Correlations dialog window, select 2 to 5 variables.

    Make sure Pearson, Two-tailed and Flag Signif... are checked, then click OK.

    Output window: in the left pane, click Correlations, then Correlations.
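
To see what the Pearson value measures, here is a minimal sketch of the r formula in plain Python. The `gpa` and `salary` lists are hypothetical illustration data, and `pearson_r` is a helper name introduced here, not an SPSS function.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariation of x and y scaled by their spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: GPA and starting salary (in thousands).
gpa = [2.0, 2.5, 3.0, 3.5, 4.0]
salary = [30, 35, 41, 44, 52]
r = pearson_r(gpa, salary)
print(round(r, 3))
```

A value of r near +1 or -1 means the points lie close to a straight line; a value near 0 means little linear relationship.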

    Relationship: Crosstabs

    (Go to the data window.) Analyze, Descriptive Statistics, Crosstabs

    Move variables from left to right, one into Columns and one into Rows.

    Select Statistics, check the Chi-square box, click Continue, then click OK.

    Output window: in the left pane, almost at the bottom, click Crosstabs, then Chi-Square Tests.
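
The chi-square value for a crosstab compares each observed cell count with the count expected if rows and columns were unrelated. The sketch below computes the plain Pearson chi-square statistic for a made-up 2x2 table; `chi_square_statistic` is a hypothetical helper, and the p-value (Sig.) is left to SPSS.

```python
def chi_square_statistic(table):
    """Pearson chi-square for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2

# Hypothetical counts: rows = gender, columns = answer (yes / no).
table = [[10, 20],
         [20, 10]]
chi2 = chi_square_statistic(table)
print(round(chi2, 3))
```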

    Differences between two groups: running the t-test for independent groups

    (Data window.) Analyze, Compare Means, Independent-Samples T...

    In Test Variable select one variable; in Grouping Variable select another. The grouping variable splits the cases into two groups, and the test looks for a difference between these two groups.

    Define Groups, Cut point (the value depends on your data), Continue, then OK

    (Alternatively, define the groups by their values, e.g. 1 for female, 0 for male.)

    Output window: in the left pane, click T-Test, then Independent Samples Test.
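
The two rows of the independent-samples output correspond to two versions of the t statistic. A minimal sketch of both, using made-up scores (the helper names are hypothetical, and the Sig. values are left to SPSS):

```python
import math
import statistics

def t_equal_variances(a, b):
    """t statistic with a pooled variance (the 'equal variances assumed' case)."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))

def t_unequal_variances(a, b):
    """Welch t statistic (the 'equal variances not assumed' case)."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va / na + vb / nb)

# Hypothetical scores for two groups.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 3, 4, 5, 6]
t_pooled = t_equal_variances(group_a, group_b)
t_welch = t_unequal_variances(group_a, group_b)
print(t_pooled, t_welch)
```

When both groups have the same size and the same variance, as in this made-up example, the two statistics coincide.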

    Linear Regression (example: how is salary related to GPA?)

    Analyze, Regression, Linear

    Select the dependent and the independent variable.

    Continue, then OK.

    Look at the Coefficients table, under Model. Column B has a top and a bottom coefficient.

    Coefficient a = bottom value in column B (the slope); coefficient b = top value in column B (the constant), so that y=ax+b.
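
For reference, the two coefficients can be reproduced by hand with the least-squares formulas. The sketch below keeps the worksheet's naming (a = slope, b = constant, y = ax + b); `fit_line` and the data are assumptions made for illustration.

```python
def fit_line(x, y):
    """Least-squares line y = a*x + b; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx            # slope (bottom value in column B)
    b = mean_y - a * mean_x  # constant (top value in column B)
    return a, b

# Hypothetical data: GPA (independent) and salary in thousands (dependent).
gpa = [2.0, 2.5, 3.0, 3.5, 4.0]
salary = [30, 35, 41, 44, 52]
a, b = fit_line(gpa, salary)
print(round(a, 2), round(b, 2))
```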

    Linear Regression: GRAPH (example: how is salary related to GPA?) y=ax+b

    Graphs, (Legacy Dialogs), Scatter/Dot, Simple, Define

    Select the dependent and the independent variable.

    Click OK. Double-click the plot (be patient), then Options, then Reference Line from Equation.

    Using the coefficients obtained in the linear regression, under Custom Equation enter

    y=(coefficient a you got before)*x + (coefficient b you got before),

    Checkbox: Attach label to. Close.

    Also close the Chart Editor by closing the window at the top right corner.

    Linear Regression: Residuals (Find Outliers)

    Analyze, Regression, Linear

    Select the dependent and the independent variable.

    Click Plots, then choose ZRESID for the Y axis and ZPRED for the X axis.

    Continue, then OK.

    Double-click the plot, then Options, then Reference Line from Equation. Enter

    y=2.5, then Apply, Close. Also close the Chart Editor.
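
The outlier check can be sketched numerically: fit the line, divide each residual by the standard error of the estimate, and flag points beyond the 2.5 reference line. This sketch assumes that definition of a standardized residual and flags both +2.5 and -2.5; the data are made up, with one value deliberately shifted.

```python
import math

def standardized_residuals(x, y):
    """Residuals of the least-squares line divided by the standard error of the estimate."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    # Standard error of the estimate: sqrt(SSE / (n - 2)) with one predictor.
    std_error = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))
    return [r / std_error for r in residuals]

# Hypothetical data on the line y = 2x + 1, with one point pushed up by 10.
x = list(range(10))
y = [2 * xi + 1 for xi in x]
y[5] += 10

z = standardized_residuals(x, y)
outliers = [i for i, zi in enumerate(z) if abs(zi) > 2.5]
print(outliers)
```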

    Testing Normality

    Analyze, Descriptive Statistics, Q-Q Plots, choose a variable,

    select Normal as the test distribution, then OK.
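
A Q-Q plot pairs each sorted data value with the value a normal distribution would put at the same rank. The sketch below builds those pairs; the plotting position (i - 0.5)/n is an assumption (SPSS may use a different formula), and `sample` is made-up data.

```python
from statistics import NormalDist, mean, stdev

def normal_qq_points(data):
    """Return (theoretical normal quantile, observed value) pairs."""
    xs = sorted(data)
    n = len(xs)
    # Normal distribution with the sample's own mean and standard deviation.
    dist = NormalDist(mean(xs), stdev(xs))
    return [(dist.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]

# Hypothetical measurements.
sample = [4.1, 5.0, 5.2, 5.9, 6.3, 7.0, 7.4]
points = normal_qq_points(sample)
for expected, observed in points:
    print(round(expected, 2), observed)
```

If the data are close to normal, the two numbers in each pair are close, which is what "dots near the straight line" means on the plot.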

    Kolmogorov-Smirnov Test: test if data are normal, or test randomness

    Analyze, Nonparametric Tests, One Sample (you may choose a variable)

    Check Automatically compare observed data to hypothesized.

    (You could also have chosen Test sequence for randomness to see whether the sequence of
    values defined by some cutting point is random.)

    If you do not see the Automatically compare option, select Descriptive and also Exclude cases test-by-test.

    Run

    Output window: in the left pane, click Nonparametric Tests, then Model Viewer.
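
The Kolmogorov-Smirnov statistic is the largest gap between the data's cumulative proportions and the normal curve's cumulative probabilities. A minimal sketch, assuming the normal distribution is fitted with the sample mean and standard deviation (the helper name and data are hypothetical, and the Sig. value is left to SPSS):

```python
from statistics import NormalDist, mean, stdev

def ks_statistic_normal(data):
    """Largest distance between the empirical CDF and a fitted normal CDF."""
    xs = sorted(data)
    n = len(xs)
    dist = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = dist.cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x; check both sides.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d

# Hypothetical, strongly skewed data: nine small values and one large one.
skewed = [1, 1, 1, 1, 1, 1, 1, 1, 1, 100]
d_skewed = ks_statistic_normal(skewed)
print(round(d_skewed, 3))
```

A larger statistic means the data sit further from the normal curve.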

    Binomial Test: when normality fails, test if data are binomial

    Analyze, Nonparametric Tests, Binomial

    Select a variable.

    Cut point: enter a value (about the mean).

    Test Proportion: enter a value (about 50%).

    Options: Descriptive and also Exclude cases test-by-test.

    OK

    Output window: in the left pane, click NPar Tests, then Binomial Test.
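
The idea behind the binomial test can be sketched as follows: count how many values fall at or below the cut point, then ask how surprising that count is under the proposed proportion. The two-sided rule used here (add up all outcomes no more likely than the observed one) is an assumption and may differ from how SPSS computes its Sig. value; the data are made up.

```python
from math import comb

def binomial_two_sided_p(successes, n, proportion):
    """Exact two-sided p-value: total probability of outcomes no likelier than the observed one."""
    probs = [comb(n, k) * proportion ** k * (1 - proportion) ** (n - k)
             for k in range(n + 1)]
    observed = probs[successes]
    # Small tolerance so that equally likely outcomes are counted despite rounding.
    return min(1.0, sum(p for p in probs if p <= observed * (1 + 1e-9)))

# Hypothetical scores, split at a cut point of 6.
values = [3, 7, 8, 2, 9, 6, 5, 8, 7, 9]
cut_point = 6
at_or_below = sum(1 for v in values if v <= cut_point)
p_value = binomial_two_sided_p(at_or_below, len(values), 0.5)
print(at_or_below, round(p_value, 4))
```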

    Hypothesis Testing: running a t-test (is the GPA = 3.5?)

    Analyze, Compare Means, One-Sample T Test

    In Test Variable select one variable.

    Enter the value to test in Test Value (typically a hypothesized mean), then OK.

    Output window: in the left pane, click T-Test, then One-Sample Test.
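
The t value in the one-sample output is the distance between the sample mean and the test value, measured in standard errors. A minimal sketch with made-up GPAs (`one_sample_t` is a hypothetical helper; the Sig. value is left to SPSS):

```python
import math
import statistics

def one_sample_t(data, test_value):
    """t = (sample mean - test value) / (sample std dev / sqrt(n))."""
    n = len(data)
    standard_error = statistics.stdev(data) / math.sqrt(n)
    return (statistics.mean(data) - test_value) / standard_error

# Hypothetical GPAs, tested against a hypothesized mean of 3.5.
gpas = [3.0, 3.2, 3.4, 3.6, 3.8]
t = one_sample_t(gpas, 3.5)
print(round(t, 4))
```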

    Chi-Square (example: expecting 10% Hispanic, 10% Asian, 70% White, 10% African American)

    Analyze, Nonparametric Tests, Chi-Square

    Select a variable.

    Expected Values: enter a value, then Add. Repeat for each category.

    Options: Descriptive and also Exclude cases test-by-test.

    OK

    Output window: in the left pane, click NPar Tests, then Chi-Square Test.
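
The chi-square value for this test compares the observed category counts with the counts implied by the expected percentages. A sketch using the 10/10/70/10 split from the example and made-up observed counts (the helper name is hypothetical):

```python
def chi_square_goodness_of_fit(observed, expected_weights):
    """Chi-square statistic for observed counts against expected proportions."""
    total = sum(observed)
    weight_sum = sum(expected_weights)
    chi2 = 0.0
    for count, weight in zip(observed, expected_weights):
        # Scale each expected weight to the observed sample size.
        expected = total * weight / weight_sum
        chi2 += (count - expected) ** 2 / expected
    return chi2

# Hypothetical counts for the four categories, in the order of the example.
observed = [12, 8, 70, 10]
expected_weights = [10, 10, 70, 10]
chi2_fit = chi_square_goodness_of_fit(observed, expected_weights)
print(round(chi2_fit, 3))
```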

    One-Way ANOVA (in AgeGender: 1=MaleOld, 2=MaleYoung, 3=FemaleOld, 4=FemaleYoung)

    (For example, AgeGender is the Factor and Salary is in the Dependent List.)

    Analyze, Compare Means, One-Way ANOVA

    Select an independent variable (Factor), in the given example AgeGender. Drag a variable
    into the Dependent List, for example Salary.

    Post Hoc: select LSD, then Continue.

    Options: Descriptive, Means plot, Continue, OK

    Output window: in the left pane, click Oneway, then ANOVA.

    H0 (null hypothesis): the differences in score between the groups are 0, i.e. there are no significant differences.
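
The F value in the ANOVA table is the ratio of the variation between group means to the variation within groups. A minimal sketch with three made-up groups (`one_way_anova_f` is a hypothetical helper; the Sig. value is left to SPSS):

```python
def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_values = [v for group in groups for v in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical scores for three groups.
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_value = one_way_anova_f(groups)
print(f_value)
```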

    Explanations about how to read the output from SPSS.

    A large standard deviation indicates that the data points are far from

    the mean and a small standard deviation indicates that they are clustered closely

    around the mean. It is a measure of dispersion.

    For crosstabs, look at the Sig. of the Chi-Square value:

    A) If Sig < 0.05, then the variables are significantly related.

    B) If Sig > 0.05, then the variables are NOT significantly related.

    For the correlation, the important value is the Pearson (r) value (in absolute value):

    1) r values greater than .50 indicate a strong correlation;

    2) r values around .30 indicate a moderate correlation;

    3) r values less than .20 indicate a weak correlation.

    For the independent-samples t-test, first look at the Sig. of Levene's Test for Equality of Variances:

    1) If Sig. < 0.05, use the output in the Equal variances NOT assumed row.

    2) If Sig. > 0.05, use the output in the Equal variances assumed row.

    Then we move to the Sig. (2-tailed):

    A) If Sig < 0.05 we conclude that there was a significant difference between
    the two means, i.e. that the difference is real and not due to randomness.

    B) If Sig > 0.05 we conclude that there was no significant difference between
    the two means.

    For the paired t-test, we look at the Sig. (2-tailed):

    A) If Sig < 0.05 we conclude that there was a significant difference between
    the two means, i.e. that the difference is real and not due to randomness.

    B) If Sig > 0.05 we conclude that there was no significant difference between the two means.

    Kolmogorov-Smirnov Test: we look at the Sig. (2-tailed).

    A) If Sig < 0.05 we conclude the data are NOT normal.

    B) If Sig > 0.05 we conclude there is insufficient evidence to reject normality (we then proceed with the assumption that the data are normal).

    P-Value: we look at the Sig. (2-tailed). The p-value is the same as the Asymp. Sig.

    A) If p-value (or Sig) < 0.05 we reject the null hypothesis.

    B) If p-value (or Sig) > 0.05 we conclude there is insufficient evidence to reject
    the null hypothesis (we then proceed as if the null hypothesis were true;
    strictly, we fail to reject it rather than prove it).
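
The same decision rule applies to every Sig. value in this worksheet, so it can be written once. A tiny sketch (`decide` is a hypothetical helper; 0.05 is the cutoff used throughout this handout):

```python
def decide(p_value, alpha=0.05):
    """Apply the p-value rule: reject only when p is below the cutoff."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "insufficient evidence to reject the null hypothesis"

print(decide(0.012))
print(decide(0.40))
```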

    Zero P-Value: if mean1 = the hypothesized mean (the test value we input in our t-test) and
    mean2 = the mean from our sample of size N, then
    a p-value displayed as .000 (that is, less than 0.0005) indicates that there is practically zero probability that in a random sample of
    size N from a population with mean mean1 we, purely by chance, obtain a mean as far from mean1 as mean2. If this is
    the case, we say that it is unlikely that mean2 has occurred by chance and it is more
    likely that the population mean is not mean1 as hypothesized.

    Hypothesis testing is all about gathering evidence to suggest the null is not
    true; the lack of such evidence warrants a "Do not reject" decision. Remember, "not
    guilty" doesn't mean the person is innocent; it means the jury did not find enough
    evidence to convict the person.

    Binomial Test: when normality fails, we use a weaker test, the binomial test. We have a cutting point and a proposed proportion or percentage. We look at the Sig. (2-tailed).

    A) If Sig < 0.05 we reject the null hypothesis: the proportion of values on the tested side of
    the cutting point differs significantly from the proposed percentage.

    B) If Sig > 0.05 there is insufficient evidence to conclude that this
    proportion differs from the proposed percentage.

    For the Q-Q plot, if the dots lie close to the straight line, we
    conclude the data look close to a normal distribution; otherwise we conclude that
    the data are far from a normal distribution.

    One-Way ANOVA: if p (Sig) > 0.05 we can conclude as follows:

    According to the ANOVA table, there is insufficient evidence to reject the null; in
    this case, we did not find enough evidence to conclude that there are
    significant differences in the variable of interest among these groups.

    The linear regression line is the straight line that best fits the data
    (it minimizes the sum of the squares of the vertical distances from the points to the line).

    You can also save the output by going to the output window and clicking Save, then browse to your folder and type example1.SPO

    http://wwwstage.valpo.edu/other/dabook/ch8/c8-3.htm

    DATAPLOT (Visual): http://www.itl.nist.gov/div898/software/dataplot/homepage.htm

    R (Fav): http://www.r-project.org/

    OPENSTAT (Soc. Sci.): http://www.statpages.org/miller/openstat/

    INSTAT (Climat. Data): http://www.rdg.ac.uk/ssc/software/instat/instat.html

    Linear Regression

    Note: It is often very useful to look at the standardized residual versus standardized predicted plot in order
    to look for outliers and to check for homogeneity of variance. The ideal situation is to see no observations
    beyond the reference lines, which means that there are no outliers. Also, we would like the points on the
    plot to be distributed randomly, which means that all the systematic variance has been explained by the
    model.

    ANOVA

    As with the t-test, ANOVA also tests for significant differences between groups. But while the t-test is
    limited to the comparison of only two groups, one-way ANOVA can be used to test differences in three or
    more groups. The ANOVA procedure produces an F statistic, a value whose probability enables the
    researcher to reject or retain the null hypothesis, i.e., to conclude whether or not the differences in the
    scores on the dependent variable are statistically significant or due to chance.

    Two-Way ANOVA

    Factorial ANOVA improves on one-way ANOVA in that the researcher can simultaneously assess the effects of
    two (or more) independent variables on a single dependent variable within the same analysis. Thus, factorial
    ANOVA yields the same information that two one-way ANOVAs would, but it does so in one analysis. But that's
    not all. Factorial ANOVA also allows the investigator to determine the possible combined effects of the
    independent variables. That is, it also assesses the ways in which these variables interact with one another
    to influence scores on the dependent variable.