ch 1 cebt notes

Upload: amit-bhattacherji

Post on 03-Jun-2018

  • 8/12/2019 Ch 1 Cebt Notes

    1/29

    CEBT Tutorials-Module I

    VKM on Basic SPSS Commands Page 1

    Chapter 1: Basic Commands

    This chapter covers basic but very useful commands that data analysts use frequently. These commands will make your tasks much easier, and once you have learned them your comfort level with data analysis and management will improve substantially. The following nine commands are discussed in this chapter.

    Tools/Commands:

    1. Find
    2. Go to case
    3. Split File
    4. Select Cases
    5. Converting Continuous Data into Categorical Data
    6. Replace
    7. Compute New Variable
    8. RANGE
    9. ANY

    Data Files and Documents used in this chapter are as follows:

    1. cs2LR_100813.sav
    2. grades.sav

    1. Tool Find

    Utility

    Find is useful for locating a value, typically before replacing it. A typical use arises in the Chi-square test when more than 25% of the cells have an expected count of less than 5 and you need to merge columns or rows of the contingency table.

    Query 1

    Find cases whose first name starts with L. [file grades.sav]

    Start: Select the entire column firstname, or keep the cursor in any cell under the column firstname.

    Click Find. Write L against Find: and click Begins with. Click Find Next.

    Practice:

    a. Find cases whose first name ends with Y. [Caution: Click Ends with under Match to; examples are Jenny, Nancy, Mickey, Mary, Kimberly etc.]
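The Begins with and Ends with options amount to prefix and suffix matching on text. The following is a minimal Python sketch of that logic, not SPSS code. The helper name `find_cases` and the first three names in the list are hypothetical; the remaining names are the ones quoted in the practice item. Ignoring upper/lower case is an assumption made for this sketch.

```python
# Hypothetical sample of first names; the last five are quoted in the practice item.
first_names = ["Liam", "Carl", "Lucy", "Jenny", "Nancy", "Mickey", "Mary", "Kimberly"]

def find_cases(names, text, match="begins"):
    """Return (case number, name) pairs whose name begins or ends with `text`.

    Case numbers start at 1, like rows in a data file. Matching ignores
    upper/lower case (an assumption made for this sketch).
    """
    text = text.lower()
    hits = []
    for case_number, name in enumerate(names, start=1):
        value = name.lower()
        matched = value.startswith(text) if match == "begins" else value.endswith(text)
        if matched:
            hits.append((case_number, name))
    return hits

begins_with_l = find_cases(first_names, "L", match="begins")
ends_with_y = find_cases(first_names, "Y", match="ends")
print(begins_with_l)
print(ends_with_y)
```

Returning the case number together with the name mirrors how Find moves the cursor to the matching row.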

    2. Tool Go to case

    Utility

    This is useful when you have a very large database. A typical use is when you analyze outliers. For example, suppose you have plotted a box-plot of percent (grades.sav) and found cases 75, 25, 46 and 55 to be mild outliers. Now you want to see the data for these cases in the data file.

    Query 2

    Let's assume that you want to see case 75 first.

    Start: Keep the cursor in any cell under the column percent.

    Click Go to case.

    Enter 75. Click Go.

    Now you can see the data of case 75.

    Practice:

    a. Go to cases 25, 46 and 55.

    3. Tool Split File

    Utility

    Often you want to analyze data group-wise. For example, in cs2LR.sav you may want to analyze separately the data for those who had a drug reaction and for those who did not. One typical use is to find the covariance matrix for each category of drug reaction (0 and 1), which is needed for checking the assumptions of Discriminant Analysis.

    Query 3

    Split the cs2LR.sav file on the basis of DrugReaction.

    Start: Go to the Split File icon (near the tarazu, i.e. balance-scale, icon).

    Click Split File.

    Click Compare groups and drag DrugReaction into Groups Based on:. Click OK.

    You will not see any change in the data file, but the following will appear in the output.

    Now say you want to plot a box-plot of Age. After splitting, you will get two box-plots, one for each category of DrugReaction, as shown below:

    You can de-split the file by clicking Split File again, selecting Analyze all cases, do not create groups, and clicking OK.

    The following will appear in the output.

    SPLIT FILE OFF.

    Practice

    a. Split the grades.sav file on the basis of gender and plot box-plots for percent. De-split the file after plotting the box-plots.
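Conceptually, splitting a file means that every later analysis is run once per group. The following is a minimal Python sketch of that idea, not SPSS code. The records are invented stand-ins for cs2LR.sav, and the helper names `split_by` and `summarize` are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical records standing in for cs2LR.sav (values are invented).
cases = [
    {"Age": 34, "DrugReaction": 0},
    {"Age": 51, "DrugReaction": 1},
    {"Age": 42, "DrugReaction": 0},
    {"Age": 63, "DrugReaction": 1},
    {"Age": 29, "DrugReaction": 0},
]

def split_by(records, key):
    """Group records by the value of `key`, as a split on that variable would."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    return dict(groups)

def summarize(records, variable):
    """Small per-group summary; any other analysis could be run here instead."""
    values = [r[variable] for r in records]
    return {"n": len(values), "mean": mean(values), "median": median(values)}

groups = split_by(cases, "DrugReaction")
summaries = {level: summarize(rows, "Age") for level, rows in sorted(groups.items())}
for level, summary in summaries.items():
    print(level, summary)
```

Running the same `summarize` call on `cases` directly corresponds to the de-split state, where all cases are analyzed together.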

    4. Tool Select Cases

    Utility

    Often you would like to select cases on the basis of some criterion and either copy them to another file or keep them in the same file while excluding the remaining cases from the analysis. At times you would also like to select a random sample for further analysis.

    In the file grades.sav, ethnicity has 5 categories, Native, Asian, Black, White and Hispanic, coded as 1 through 5 respectively.

    Query 4

    Let us assume that you do not want Native to participate in your analysis.

    Start: Go to Select Cases.

    Click Select Cases. The following dialogue box will appear.

    Click If condition is satisfied and then click If.

    The following dialogue box will appear.

    If you do not want Natives (coded as 1), you want to select every category other than 1, in other words 2, 3, 4 and 5. This can be expressed as ethnicity > 1. Enter this expression in the top box as shown below.

    Click Continue.

    You will be back at the previous dialogue box. Observe that ethnicity > 1 now appears after If.

    Click OK.

    See that case numbers 7, 15, 17 etc. (those with code 1 for ethnicity) have been struck off, as shown below.

    You can remove the selection with Select Cases > Select All Cases > OK.
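The selection in Query 4 can be pictured as a 1/0 flag per case: flagged cases take part in the analysis, while the others stay in the file but are struck off. The following is a minimal Python sketch of that logic, not SPSS code. The ethnicity values are hypothetical and the helper name `select_if` is an assumption of this sketch.

```python
# Ethnicity codes from the text: 1 Native, 2 Asian, 3 Black, 4 White, 5 Hispanic.
# The values below are hypothetical, not taken from grades.sav.
ethnicity = [4, 1, 2, 5, 1, 3, 4]

def select_if(values, condition):
    """Return a 1/0 flag per case: 1 = selected, 0 = filtered out.

    Unselected cases are only flagged, not deleted, which mirrors the
    struck-off case numbers described above.
    """
    return [1 if condition(v) else 0 for v in values]

flags = select_if(ethnicity, lambda code: code > 1)   # the condition ethnicity > 1
selected = [v for v, keep in zip(ethnicity, flags) if keep]
struck_off = [i for i, keep in enumerate(flags, start=1) if not keep]

print(flags)       # one flag per case
print(selected)    # values that take part in the analysis
print(struck_off)  # case numbers that are filtered out
```

Removing the selection corresponds to setting every flag back to 1.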

    Query 5

    Create a new file with Selected cases only. [Cases selected in Query 4]

    If you wish to create a new file containing the selected cases only, click Copy selected cases to a new dataset and name the file No_Natives in the box under Output, as shown below.

    Click OK.

    You will have the following file.

    [You can also use the Delete unselected cases option.]

    Query 6

    Utility

    After building a regression or Decision Tree model, you would like to test it on out-of-sample data, which is known as validation. This means that you must create the validation data set before building the model(s). This command is useful for creating a data set for validation.

    Select approximately 20% of cases for validation purposes.

    Start: Click Random sample of cases and click Sample. Write 20 against Approximately. Click Continue.

    Click OK. The file will look as shown below.

    Now you can create a new file with the selected cases.
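One way to picture an "approximately 20%" sample is that each case is kept independently with probability 0.20, so the sample size varies from run to run. The following is a minimal Python sketch under that assumption, not SPSS code. The file size of 105 cases, the fixed seed and the helper name `approx_sample_flags` are hypothetical choices for illustration.

```python
import random

def approx_sample_flags(n_cases, fraction, seed=None):
    """Flag each case independently with probability `fraction`.

    Because every case is decided separately, the number of selected cases
    is only approximately fraction * n_cases.
    """
    rng = random.Random(seed)
    return [1 if rng.random() < fraction else 0 for _ in range(n_cases)]

# 105 cases is a hypothetical file size; the seed is fixed only for repeatability.
flags = approx_sample_flags(n_cases=105, fraction=0.20, seed=1)
validation = [i for i, f in enumerate(flags, start=1) if f == 1]   # selected cases
training = [i for i, f in enumerate(flags, start=1) if f == 0]     # remaining cases

print(len(validation), len(training))
```

Keeping the flag rather than deleting cases makes it easy to build the model on the unflagged cases and validate it on the flagged ones.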

    Practice

    a. Select only Asians from the grades.sav file.
    b. Create a new file with the cases selected in a.
    c. Select 15% of cases randomly from the grades.sav file and copy the selected cases to a new file.

    5. Tool Conversion of Continuous Data into Categorical Data

    Utility

    Say you want to analyze the association between ethnicity and percent. Had ethnicity been a continuous variable, you could have used the Pearson coefficient of correlation; had percent been a categorical variable, you could have used the Chi-square test. You cannot convert ethnicity into a continuous variable, but you can convert percent into a few reasonable categories.

    Query 7

    Convert percent into a few reasonable categories in the file grades.sav.

    First find the minimum, maximum and range of percent using the Descriptives command.

    [Analyze > Descriptive Statistics > Descriptives > Drag percent under Variables > Click Options > Click Minimum, Maximum and Range > Continue > OK]

    It is now reasonable to make categories 40-50, 51-60, 61-70 and so on up to 91-99 (or 91-100).

    Start: Transform

    Click Recode into Different Variables.

    Drag percent under Numeric Variable -> Output Variable:, write percentCATGRY against Name:, write percent in category against Label:, and click Change.

    Click Old and New Values. The following dialogue box will appear. Click Range. Write 40 through 50 under the Old Values box. Write 1 (code 1 for the class 40 to 50) against Value under the New Value box.

    Click Add. See that 40 thru 50 -> 1 has appeared under the Old -> New: box.

    Continue in the same way for 51 through 60, 61 through 70, 71 through 80, 81 through 90 and 91 through 99 under Range, writing 2, 3, 4, 5 and 6 respectively against Value under the New Value box and clicking Add after every entry. The dialogue box should look as shown below:

    Click Continue. You will be back at the original dialogue box. Click OK. See the data file with the new variable, which is now categorical.
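The recoding in Query 7 is a lookup of each value in a list of ranges. The following is a minimal Python sketch of that logic, not SPSS code. The class limits and codes are the ones entered in the steps above; the sample scores and the helper name `recode` are hypothetical. Returning `None` for values outside every range is this sketch's stand-in for a missing value.

```python
# Class limits and codes exactly as entered in the recode steps above.
BINS = [
    (40, 50, 1),
    (51, 60, 2),
    (61, 70, 3),
    (71, 80, 4),
    (81, 90, 5),
    (91, 99, 6),
]

def recode(value, bins=BINS):
    """Return the category code for `value`, or None if no range contains it.

    Both ends of each range are included. Values outside every range
    (for example 50.5 or 100) get None, i.e. the new variable is left
    without a code for that case.
    """
    for low, high, code in bins:
        if low <= value <= high:
            return code
    return None

percent = [43, 50, 51, 78.5, 99, 100, 50.5]   # hypothetical scores
percentCATGRY = [recode(p) for p in percent]
print(percentCATGRY)
```

The two `None` results show why it is worth checking the minimum, maximum and any fractional values first: with these limits a score of 100, or one between 50 and 51, falls into no category.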

    Practice:

    a. Categorize total under reasonable categories.

    6. Tool Replace

    Utility

    This is useful when you want to replace an existing number or name that appears in several cells of the data set. Doing this manually is time-consuming and error-prone. A typical use arises when you apply the Chi-square test.

    Let us apply the Chi-square test to the hypothesis H0: There is no significant association between ethnicity and percentCATGRY.

    Use the following commands:

    Analyze > Descriptive Statistics > Crosstabs > drag ethnicity under Row(s): and percent in category under Column(s): > Click Statistics > Click Chi-square > Continue > Click Cells > Click Observed, Expected under Counts > Continue > OK. Look at the output.

    Now there is a serious problem in the above output: 73.3% of cells have expected counts of less than 5. The prescribed upper limit is 25% [page 113, SPSS Step by Step, Darren George and Paul Mallery, 8th edition].
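The percentage quoted above comes from the expected counts, where each cell's expected count is its row total times its column total divided by the grand total. The following is a minimal Python sketch of that check, not SPSS code. The 2 x 3 table is hypothetical and is not the grades.sav output; the helper names are assumptions of this sketch.

```python
def expected_counts(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def percent_cells_below(table, threshold=5):
    """Percentage of cells whose expected count is below `threshold`."""
    expected = expected_counts(table)
    cells = [e for row in expected for e in row]
    low = sum(1 for e in cells if e < threshold)
    return 100 * low / len(cells)

# Hypothetical 2 x 3 table of observed counts (not the grades.sav output).
observed = [
    [2, 3, 5],
    [8, 12, 20],
]
print(expected_counts(observed))
print(percent_cells_below(observed))
```

In this invented table two of the six cells fall below 5, so it would also exceed the 25% limit. Merging sparse rows or columns raises the row and column totals and therefore the expected counts.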

    This problem can be solved by clubbing (merging) categories in ethnicity, in percentCATGRY, or in both. The choice of which categories to club is left to you. Before clubbing, it is advisable to look at the contingency table (shown below).

    The author recommends clubbing categories 1 and 2 into one category in the variable percentCATGRY and, in the variable ethnicity, clubbing Native & Asian into one category and Black, White & Hispanic into another.

    Query 8

    Club categories 1 and 2 into one category in the variable percentCATGRY and, in the variable ethnicity, club Native & Asian into one category and Black, White & Hispanic into another.

    Let us take the variable ethnicity first.

    Use the Find command as explained before, then use Replace > Replace with.

    Find 2 and Replace with 1. [20 replacements will be executed]

    Find 3 and Replace with 2. [24 replacements will be executed]

    Find 4 and Replace with 2. [45 replacements will be executed]

    Find 5 and Replace with 2. [11 replacements will be executed]

    Now take the variable percentCATGRY. The sequence is as follows.

    Find 2 and Replace with 1. [5 replacements will be executed]

    Find 3 and Replace with 2. [12 replacements will be executed]

    Find 4 and Replace with 3. [32 replacements will be executed]

    Find 5 and Replace with 4. [35 replacements will be executed]

    Find 6 and Replace with 5. [19 replacements will be executed]
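The order of the replacements matters, because a value written by one step can be picked up by a later step. The following is a minimal Python sketch, not SPSS code, comparing the sequence used above with a different order and with a single-pass recode. The ethnicity codes are hypothetical and the helper names are assumptions of this sketch.

```python
def replace_in_sequence(values, steps):
    """Apply find/replace steps one after another, like repeated Replace All."""
    result = list(values)
    for find, replace_with in steps:
        result = [replace_with if v == find else v for v in result]
    return result

def recode_with_map(values, mapping):
    """Recode every value in a single pass; unmapped values stay unchanged."""
    return [mapping.get(v, v) for v in values]

ethnicity = [1, 2, 3, 4, 5, 2, 4]          # hypothetical codes
steps = [(2, 1), (3, 2), (4, 2), (5, 2)]    # the order used above

in_order = replace_in_sequence(ethnicity, steps)
wrong_order = replace_in_sequence(ethnicity, [(3, 2), (2, 1), (4, 2), (5, 2)])
one_pass = recode_with_map(ethnicity, {2: 1, 3: 2, 4: 2, 5: 2})

print(in_order)
print(wrong_order)
print(one_pass)
```

Replacing 3 with 2 before replacing 2 with 1 would push the former 3s into category 1 as well, which is why the sequence starts by freeing code 2. A single-pass mapping avoids the issue altogether.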

    After regrouping the categories for ethnicity and percentCATGRY, re-run the Chi-square test. The output will be as follows.

    The norm of 25% is still not met; however, under special circumstances 30% can also be accepted [consult your mentor/guide on this issue]. Further, the improvement from 73.3% to 30% is appreciable. Moreover, this example is given mainly to illustrate a typical use of the Replace command. Participants are encouraged to experiment further so that the prescribed limit of 25% is met.

    Practice:

    a. You categorized total in the previous section (Tool 5). Apply the Chi-square test with categorized total and ethnicity, and use the Replace command (if needed).

    7. Tool Compute New Variable

    Utility

    This is a very useful command that is used frequently in data analysis. A common use is transforming an existing variable in order to achieve normality. Computing a new variable is also used when calculating the Mahalanobis Distance (discussed in Chapter 2).

    Query 9

    Find the square root of Age in the file cs2LR.sav.

    Commands

    Click Transform > Compute Variable.

    Write SqrtAge as the name of the new variable under Target Variable:. Select Arithmetic from Function group: and select Sqrt from Functions and Special Variables:. Click the up arrow. See SQRT(?) under Numeric Expression:.

    The (?) asks for an existing variable. Click Age and click the arrow. Once this is done, the OK button, which was dimmed before, becomes active. Click OK.

    See the new variable SqrtAge in the data file.
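Computing a new variable applies one expression to every case. The following is a minimal Python sketch of the square-root transformation, not SPSS code. The Age values and the helper name `compute_sqrt` are hypothetical. Leaving the result empty (`None`) for negative or missing input is an assumption made for this sketch.

```python
import math

ages = [25, 36, 49, 64]   # hypothetical Age values

def compute_sqrt(values):
    """Square root of each value; None marks cases where it is undefined."""
    return [math.sqrt(v) if v is not None and v >= 0 else None for v in values]

SqrtAge = compute_sqrt(ages)
print(SqrtAge)
```

The original variable is left untouched, so the transformed and untransformed versions can be compared, for example when checking normality.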

    8. Tool RANGE

    Utility

    Often you may want to know which cases fall within a particular range. For example, suppose you have collected sales data for 200 companies in a variable named SALES. This is a continuous variable, ranging from 25 million INR to 250 million INR. You want to identify the cases (companies) with SALES between 25 and 175 million INR. Doing this manually is possible, since there are only 200 cases, but it is time-consuming, at times irritating, and error-prone. This command will ease your work.

    Note:

    The Select Cases command can also select cases and copy them to a separate file, delete the unselected cases, or allow only the selected cases to be processed. But if you want to code the cases between 25 million and 175 million as one category (1) and the rest as another category (0), Select Cases will not help. This coding is needed specifically when you want to use these two categories in subsequent analysis.

    Query 10

    Code 1 for cases with Age between 25 and 35 in the cs2LR.sav file. The rest should be coded 0.

    Commands

    Transform > Compute Variable

    Write AGE2535 under Target Variable and RANGE(Age,25,35) under Numeric Expression:.

    Click OK.
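The coding produced by RANGE(Age,25,35) can be pictured as a test of whether each value lies between the two limits. The following is a minimal Python sketch of that logic, not SPSS code. Treating both limits as included and passing missing values through unchanged are assumptions of this sketch. The Age values and the helper name `range_flag` are hypothetical.

```python
def range_flag(value, low, high):
    """1 if low <= value <= high (both ends included), else 0; None stays None."""
    if value is None:
        return None
    return 1 if low <= value <= high else 0

ages = [24, 25, 30, 35, 36, None]   # hypothetical Age values
AGE2535 = [range_flag(a, 25, 35) for a in ages]
print(AGE2535)
```

Unlike a selection, this keeps every case and adds a 1/0 variable that can be used as a grouping factor in later analysis.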

    Practice

    1. Code BP falling in the range 120 to 180 as 1 and code the rest as 0.

    9. Tool ANY

    Utility

    Just as RANGE is useful for coding a continuous variable as 1 and 0, the ANY command can code a categorical variable as 1 and 0.

    In the file grades.sav, ethnicity has 5 categories, Native, Asian, Black, White and Hispanic, coded as 1 through 5 respectively.

    Now say you want to code Native (1) and Asian (2) as 1 and the rest as 0. Use the ANY command as shown below.

    Note:

    The above task can also be accomplished with the Find and Replace commands, but that is comparatively time-consuming.

    Commands

    Transform > Compute Variable

    Write ethnicity_1_2 under Target Variable and ANY(ethnicity,1,2) under Numeric Expression:.

    Click OK. The data file will look as shown below.
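The coding produced by ANY(ethnicity,1,2) can be pictured as a membership test against the listed values. The following is a minimal Python sketch of that logic, not SPSS code. The ethnicity codes and the helper name `any_flag` are hypothetical. Passing missing values through unchanged is an assumption of this sketch.

```python
def any_flag(value, *targets):
    """1 if value equals any of the targets, else 0; None stays None."""
    if value is None:
        return None
    return 1 if value in targets else 0

ethnicity = [1, 2, 3, 4, 5, 2]   # hypothetical codes (1 Native ... 5 Hispanic)
ethnicity_1_2 = [any_flag(e, 1, 2) for e in ethnicity]
print(ethnicity_1_2)
```

Because the result goes into a new variable, the original five-category ethnicity codes remain available, which is not the case after Find and Replace.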