mhhilllllllle iiiiiiii mhhhhmmhhmlr pi report 801-2 december in$ tailored testing aor 0esa0c -tor...

91
A097 081 MISSOURI UNIV-COLUMBIA TAILORED TESTING RESEARCH LAB FIG 9/2 COMPARISON OF THE ANCILLES AND LOGIST PARAMETFR ESTIMATION PR--ETC(U) DEC 8 0 R L MCKINLEY, M D RECKASE N00014-77-C-0097 'CLASSIFIED RR-80-2 NL mhhhhmmhhml mhhIllllllllE IIIIIIII

Upload: others

Post on 04-Feb-2021

2 views

Category:

Documents


0 download

TRANSCRIPT

  • A097 081 MISSOURI UNIV-COLUMBIA TAILORED TESTING RESEARCH LAB FIG 9/2COMPARISON OF THE ANCILLES AND LOGIST PARAMETFR ESTIMATION PR--ETC(U)DEC 8 0 R L MCKINLEY, M D RECKASE N00014-77-C-0097

    'CLASSIFIED RR-80-2 NL

    mhhhhmmhhml

    mhhIllllllllEIIIIIIII

  • 111- .8 111 .

    III111112-2

    t11I'125 1.4 111.

    MICROCOPY RESOLUTION TES I CHARFlNn INAl I ' AI I ,AN IAPI 1

  • LEELz- A,

    o Parameterfor the Tbree-Paraneter

    o Logistc Model Udng Goodmumof FItas a Criterion

    Robt L. cineDTICand ~ ELECTE

    Mark D. RecIkse MARt 31 I

    R PI Report 801-2December In$

    Tailored Testing 0esa0c aor -torEducational Psychology Depatada

    University of MissouriColumbia, MO 65211

    Prepared undar cowhad No. t48114774CMS7,Ntqwa th e Pwinud Mmd Trabi 3" u~

    ftedmaulhWO11116 of MOM&3w

    Approved for pubic rdSO. 1uhq uWhWReremuelmIn whbie w U 35t UV 0 af

    po(Oih.UMlISN M GWNO

  • SECURITY CLASSIFICATION OF THIS PAGE ("onm Data Entered) __________________

    REPOT DCUMNTATON AGEREAD INSTRUCTIONSREPOT DCUMNTATON AGEBEFORE COMPLETING FORM

    1REPORT NUMS 2.G CESON NO: 2. RECIPIENTSl CATALOG NUMBER

    80. TIL (ad . TYPE OF REPORT & PERIOD COVERED

    A Comparison of the ANCILLES and LOGIST Para- Tcnclrprmeter Estimation Procedures for the Three- PEORIGR.RPRTNMRParam;Te~yastic Model Using Goodness of Fit

    Robert L.[McKinley andMark D.jIReckase

    rl .. NOEC'ASSIPICTONW AD

    9. DISTRIUTIORNI ATEN NAME CAD ADRS hot.111401110 5-1 N9L1A9)

    whermn or inupatspitted fscorlay purpos ofj. theUntedStteGoivermet.fMsor TA:4-40

    Co um.a DSR UTO STM ENTur 6520..b1,. uW.V.:N lck2.IIdft~fUiee lp R 15039

    11 SUPPRLMENTAR OESNM N DRS EOTDT

    Is. ne aE OD C~tnd. Traiemn. ig R eearh Proram ANWllp y lcknm

    Latent ~~~ ~D Tri odl oons o iAffcLeS Paamte Estimation "UI"OWP&U

    ON ST gai 227z

    timrate oainedcfromethe; iLstrbto anuLGnTlsimtion Rprocdureso uinggodes of iat as apcriteo Satisiy pused ocmr the itielde

    staisic DSRBTOth ATN(fsalssac perred Inclu0.itdifedt comprsnt fteditiui

    ItI SUPA N TAR NOT3EIINOS O ~lOSLT

    Ill. ~ ~ ~ S/ KEYWORDS 6601u nr~oosd Ie esr dienfybblcnu e)LECUenT CTrSaICAtO OPel THIes Pof fitm

    ANCILES aramter stimtio4OIS

  • w..-u41TV CLASSIFICATI)ON OP THIS PAGlElM Daft. he.,

    of the parameter estimates obtained from the procedures, and a set of meta-analyses performed on the chi-square statistics. obtained for the items. Thedata for the study was composed of 50 items and 2,000 cases obtained usinga stratified random sample of 357 items and 4,000 cases of the Iowa Tests ofEducational Development. Results indicated that differences did exist in

    the quality of parameter estimates obtained from the ANCILLES and LOGIST

    procedures. More items for ANCILLES showed significant lack of fit than forLOGIST, based on the chi-square tests. However, a test to determine whether

    the ANCILLES chi-squares were larger than the LOGIST chi-squares significantlymore than half the time was not significant. Also, the dependent t-computed

    for the MSD statistics was not significant. Further analyses indicated thatthe parameter estimates obtained from the two procedures were highly cor-related. Comparisons of the item parameter estimates with the cni-squarevalues indicated that ANCILLES yielded poor fit for items with extreme diffi-

    culty values. Because of this, and the fact that AZCILESdid not provideestimates foe all the Iteips, it was cQacluded that thert wane real differencesin the quelity.of the parameter estimates obtained for the two procedures.

    *~SCUT se ,CLAMgPICATOM OF ?"011 PhOUEIPO 000 9;-m

  • CONTENTS

    Introduction....................................................... 1

    The Model and Procedures............................................ 2

    Goodness of Fit Statistics.......................................... 3

    Method............................................................ 6Test Data.................................................... 6

    Analyses..................................................... 6

    Results ........................................................... 7Chi-Square Analyses........................................... 7

    MSD Statistics............................................... 58

    Parameter Estimate Distribution Analyses ....................... 62

    Item Parameter Estimates ................................ 62

    Ability Estimates....................................... 65

    Discussion........................................................ 70Chi-Square Analyses .......................................... 71

    MSO Statistics............................................... 72

    Item Parameter Estimates...................................... 72

    Ability Estimates ............................................ 74

    Summlary and Conclusions ............................................ 74

    References ............................... &........................ 75

    AeOOssion For

    PrI-S GRA&1D11C TAB

    Distribution/

    Avi1 and/at

  • A Comparison of the ANCILLES and LOGISTParameter Estimation Procedures for the Three-Parameter

    Logistic Model Using Goodness of Fit as a Criterion

    Due to the growing use of latent trait models and the wide range ofapplications of these models (see the Journal of Educational Measurement,Summer, 1977), it has become important to investigate the properties ofthe numerous procedures that are available for estimating the parameters ofthe models. There are a number of different models in current use (e.g.,one-, two-, and three-parameter logistic; graded response; nominal response),and for many of these models item parameters can be estimated in several ways.While there has been some research done to investigate the differences be-tween the models (Reckase, 1977; Yen, in press; Divgi, 1980; Urry, 1970, 1977a),little has been done to compare estimation procedures for given model.

    One comonly used latent trait model is the three-parameter logistic(3PL) model. There are at least three estimation procedures available forthe 3PL model, each based on a different computer program. For example, theANCILLES (Urry, 1978), OGIVIA (Urry, 1977b), and LOGIST (Wood, Wingersky,and Lord, 1976) programs are all designed to estimate parameters for the 3PLmodel. Very little has been done to study the differences in these threeprocedures. Although they are based on the same model, the methods that theseprograms employ to estimate the parameters for the model are quite different(the differences between ANCILLES and OGIVIA are not as great as the differ-ences between LOGIST and the others). The few studies that have dealt withthe differences in these procedures have primarily been concerned with theability of the procedures to faithfully reproduce true item and ability para-meters. For instance, in a simulation study conducted by Ree (1979), threegroups of 2,000 subjects were simulated, and the simulated responses werecalibrated using the ANCILLES, OGIVIA, and LOGIST procedures. The estimatedparameters were compared to the true parameters, the estimated true scores,and an information comparison was made. It was concluded that the selectionof an item calibration program should be dependent on the distribution ofability in the calibration sample, the intended use of the parameter esti-mates, and computer resources available. Specifically, the differences thatwere found included the finding that LOGIST performed best for rectangularability aistributions and OGIVIA performed best for normally distributed abil-ity groups. Also, LOGIST was more expensive to run, but the OGIVIA and ANCILLESprograms did not always give estimates for every item.

    The Ree study indicated that there were differences in the quality ofthe parameter estimates given by the procedures considered, and the conclusionsprovided guidelines for selecting procedures for the model. This type ofstudy is useful, and should be extended to include other models, but thereare other comparisons that should be made. One important comparison thatwas not made was a comparison of the procedures using the fit of the modelto the data as a criterion, an important factor when considering the qualityof parameter estimation using the procedures. The purpose of this study,then, is to extend the comparison of the 3PL parameter estimation procedures

  • -2-

    to include a comparison of the fit of the 3PL model to real data when usingthe different procedures. Before reporting the present study, however, adiscussion of the model and procedures, as well as the fit statistics used,will be given.

    The Model and Procedures

    The model that was employed in this study was the three-parameter log-istic model presented by Birnbaum (1968). The model requires three para-meters for each item and one ability parameter for each examinee. The modelis given by exp(Da i(e - bi))

    Pi(ej) = c i + (1 - ci) (1 (1)1 + exp(Dai(e. - bi))

    where 0. is the ability parameter for Examinee j, ai is the item discrim-

    ination parameter, bi is the item difficulty parameter, ci is the itemguessing parameter, Pi(ej) is the probability of a correct response to Item i,

    and D is a scaling constant equal to 1.7.

    There are three commonly used programs for the estimation of the para-meters of the 3PL model, ---ANCILLES, LOGIST, and OGIVIA---but becauseANCILLES is a newer version of OGIVIA, OGIVIA was not included in this study.The ANCILLES estimation procedure is a two-staged procedure. In the firststage raw scores, corrected to exclude scores on the item being calibrated,are used as a measure of manifest ability. Using the correct raw scoresthe program computes item characteristic curves (ICC's) for various sets ofguessing, discrimination, and difficulty values. The proportions of exam-inees falling within set intervals of the manifest ability who passed theitem are computed, and those values are compared to the generated ICC's.Chi-square fit statistics are computed for each ICC, and the set of valueswith the minimum chi-square is selected. This procedure is repeated for allthe items to be calibrated. Then a second stage is begun, in which unregressedBayesian modal estimates (UBME's) are used as manifest ability in place ofraw scores. This substitution is made because the UBME's more closely ap-proximate the latent ability distribution. Using the UBME's ancillary esti-mates of the item parameters are made.

    The LOGIST procedure, on the other hand, uses neither Bayesian modalability estimation nor minimum chi-square item parameter estimation procedures.Rather, LOGIST uses maximum likelihood estimation for estimating both abilityand item parameters. Initial values for the item parameter estimates areset, and ability estimates are computed for all the examinees using maximumlikelihood estimation. Then the ability estimates are held fixed, and newestimates are made for the a- and b-values, again using maximum likelihoodestimation. These two steps, called a stage, are repeated a number of timeswith the c-values held fixed.

  • -3-

    After the first few stages the c-values are allowed to vary, but changein the c-values is still restricted. The procedure cycles through as manystages as is necessary to converge. Convergence is reached when the dif-ference between the estimates for successive stages is less than errors ofcalculation.

    While this is not a complete discussion of how these two procedures op-erate, it is clear from this treatment of the ANCILLES and LOGIST proceduresthat they do differ in the way in which parameters are estimated. For amore detailed discussion of these procedures see: Wood, Wingersky, and Lord,1976; and Urry, 1978.

    Goodness of Fit Statistics

    Whenever a model is used to approximate real data it is important todetermine the accuracy of the approximation. The failure of a model toaccurately represent the data may result in inaccuracies in measurementsbased on that model. Goodness of fit of the model to empirical data, then,is clearly an important property to consider when selecting a model. Itis jist as important when considering which procedure to use for estimatingthe parameters of a model. Item parameter estimates for a model are notunique. Clearly, different procedures may result in different estimates forthe same data. If different sets of estimates fit the data equally well,then either procedure may be appropriate. However, if the two sets of es-timates do not fit the data equally well, the procedure yielding the bestfit is the more desirable procedure.

    In the past a number of statistical goodness of fit tests for gaugingthe fit of a model to data have been proposed. Generally, most of thesetests involve computing statistics that fall in a chi-square or an approxi-mate chi-square distribution. For instance, a fit statistic for the IPLmodel proposed by Wright and Panchapakesan (1969) involves dividing examin-ees into groups according to number-right scores, and for each score groupcomputing the observed and expected proportions of examinees passing theitem, with the expected proportion being computed from the model. Fromthese proportions a fit statistic is computed with the following formula:

    j Njij(0 - Ei) (2)X 2 = Z 2j=1 E ij(1 - Ei )

    N 00NjOwhere Oij is the observed proportion passing Item i in Score Group j, E

    is the proportion predicted by the model, and the summation is over allscore groups for which the number of examinees in the group is not zero.The summation over number-right score groups can be used since the number-right score is a sufficient statistic for estimating 0 for the 1PL model.That is, each score group contains examinees with the same #. This stat-istic is essentially the summation of squared z-scores. Wright and

  • -4-

    Panchapakesan (1969) state that this statistic has J-1 degrees of freedom,where N. / 0. A variation on this statistic used by Rentz and Bashaw (1975),

    involves computing the X2 above and then dividing it by the number of scoregroups for which the number of examinees in the group is not zero, obtain-ing as a result a 'mean square' fit statistic.

    A procedure not limited to the 1PL model was proposed by Yen (in press).This statistic differs from the Wright and Panchapakesan statistic in thatexaminees are not grouped by number-right scores. Rather, examinees areordered according to their ability estimates. The range of ability esti-mates is then divided into categories (Yen suggests 10), and the observedand expected proportions are computed for those categories. This fit stat-istic is given by

    10 Nj(Oij - E. )2 (3)×2 = EIj=1 E ii(1 E Eii)

    where 0ij and Eij are as defined previously. Since the categories for this

    statistic are not based on number-right scores this statistic is not limi-ted to the 1PL model. Yen (in press) suggest that this statistic has 10-mdegrees of freedom, where m is the number of item parameters estimated.

    A similar statistic, s, was suggested by Wright and Mead (1977). Thisstatistic is given by

    J Nj(0 ij - Ei)

    = Eij( ij) P

    where 0. and E.. are as defined above and a2p, is the variance within cate-

    gory j of the predicted proportions passing the item (Yen, in press).Wright and Mead suggest the addition of the o2pj term because examinees

    within a category do not have the same 0, and the addition of the term pro-vides a more accurate estimate of the variance of Oij than does the denom-

    inator in Equation 3. For this statistic examinees are grouped in the sameway as for the Yen statistic. However, Wright and Mead suggest six orfewer categories, rather than the 10 suggested by Yen. The constant 1/Jprovides a mean fit statistic for the J cateogires.

    One statistic for measuring goodness of fit of a model to data that isnot based on the chi-square distribution is the mean square deviation (MSD)statistic proposed by Reckase (1977). The MSD statistic is given by

  • -5-

    N (u - Pij )

    MSO.i j=1 1j1 N

    where uij is the response to Item i by Examinee j, Pij is the probability

    of a correct response as given by the model, and N is the number of exam-inees. The purpose of this statistic is to avoid the differences causedby different interval sizes encountered with the X2 statistics describedabove. Reckase suggests that, even though the sampling distribution of thestatistic in unknown, hypotheses can still be tested because only compara-tive information is of interest. Thus, differences in MSD statistics obtainedfor different procedures for a single set of items can be tested usinganalysis of variance procedures (or, in the case of two procedures, a simpledependent t-test). Because the statistic does not group examinees, its useis not limTted to a single model.

    For the present study the MSD statistic and the chi-square statisticsuggested by Yen were selected. Since the present study is concerned withprocedures for estimating the parameters of the 3PL model, those fit stat-istics based on the number-right score groups are clearly inappropriate. Ina comparison of the Yen statistic and the statistic proposed by Wright andMead, Yen (in press) found virtually no difference in the two statistics.Yen concluded that using 10 categories was sufficient to produce small enoughvalues of 02pj would be sufficiently small so as to make it unnecessary to

    adjust the denominator in the chi-square statistic. Because of the concernover the differences the category sizes make in the chi-square statistic,the MSD statistic was included in the analyses.

    Analyses for the current study, then include the comparison of the chi-squares obtained for the two procedures using the statistic proposed by Yen,and a comparison of the MSD statistics obtained for the two procedures. Inaddition, direct comparisons of the obtained parameter estimates will bemade. These comparisons will include descriptive statistics and correlationsof the distributions of ability and item parameter estimates obtained fromthe ANCILLES and LOGIST programs, as well as plots of the ciserved propor-tions of examinees passing an item with the proportions predicted by themodel using the estimates from the procedures.

  • -6-

    Method

    Test Data

    The data-set used for this study was constructed from a 4,000 case sampleof the Iowa Test of Educational Development (ITED). The test items were astratified random sample of 50 items from the various subtests of the ITED.Response data for 1,999 examinees were sampled from among four grade levels.

    Analyses

    To begin the analyses, ability and item parameter estimates for the 3PLmodel were obtained by running both the ANCILLES and LOGIST calibrationprograms on the response data. For each set of estimates obtained chi-squareswere computed for each of the items using the following procedure. First therange of ability estimates was divided into 49 categories of .1 width (theend categories were larger so as to keep all cell frequencies > 5). Exam-inees were grouped, then, according to which category their ability estimateswere in. For each category both the proportion of examinees in that categorypassing the item and the proportion failing the item were obtained. Also,for each category the expected proportion passing and the expected proportionfailing the item were computed. The expected proportion passing an item, aspredicted by the 3PL model, is

    E+(1 - c) exp(1.7ai(e j - bi))

    1 + exp(1.7ai(e j - bi)) (6)

    where Eij is the proportion of examinees in Category j expected to pass Item i,

    e. is the midpoint of Category j, and the other parameters are as defined for3Equation 1. It should be noted at this point that, due to the small categorysize, the variance of the expected proportions was quite small. For the pur-poses of this study, then, the expected proportions were assumed to be constantwithin a category. That is, the variance of the expected proportions is equalto zero.

    Once the observed and expected proportions were obtained for both setsof parameter estimates, then chi-square statistics for each item, using bothsets of estimates, were computed using Equation 3 (with the modification that48 categories were used instead of 10). Using these chi-squares a number ofanalyses were performed. First, the chi-square values were compared to thecritical value to determine whether they were significant. Then a comparisonwas made to determine which procedure resulted in lack of fit for more items.Then the chi-squares for each procedure were summed and the resulting chi-squares were tested for significant lack of fit for the test as a whole. Fur-ther analysis included performing a binomial test to determine whether thechi-squares obtained for one procedure were larger than the chi-squares obtainedfor the other procedure more times than would be expected by chance. Two finalanalyses using the chi-squdres involved the graphic presentation of the obtained

  • -7-

    values. One analysis involved plotting, for each category, the observed andexpected proportions passing the item. This was essentially a visual comp-arison of empirical and theoretical ICC's for each item. Plots were made forboth procedures. The last analysis performed with the chi-squares for eachprocedure was the plotting of the obtained distribution of chi-squares withthe actual distribution of chi-squares computed from the chi-square probabilitydensity function.

    A set of analyses was also performed using the MSD statistic set out inEquation 5. For each set of estimates MSD statistics were computed for eachitem. The resulting statistics were tested for significant differences usinga dependent t-test.

    A final set of analyses involved the direct comparison of the parameterestimates obtained from the ANCILLES and LOGIST procedures. The analysis in-cluded a comparison of the shape of the distributions of the ability and itemparameter estimates, as well as correlations of the two sets of estimates.

    Results

    Chi-Square Analyses

    The item chi-square statistics obtained for the ANCILLES and LOGISTprocedures are presented in Table 1. Item 1 and Item 9 were deleted byANCILLES during calibration. Comparison of these values to the critical valuerequired for significance at ot = .05 revealed that significant lack of fitoccurred for fifteen items for the ANCILLES procedure, and for six items forthe LOGIST procedure. Although it is true that such a multiple comparisonincreases the probability of finding significant results, the intent is tocompare the two procedures rather than to make an evaluation of the proced-ures across items. Therefore the alpha level was not adjusted to accommodatethe multiple comparison. A test for the significance of the difference betweentwo correlated proportions (Ferguson, 1976) yielded a z = 2.68, indicatingthat a significantly higher proportion of items showed-lack of fit for theANCILLES procedure than for the LOGIST procedure (p < .05).

    Considering the results reported above it is somewhat surprising thatthe ANCILLES chi-square values are not larger than the LOGIST chi-squarevalues for significantly more than half the items. The ANCILLES chi-squarevalue is larger than the LOGIST chi-square value for only 25 items, and theANCILLES mean chi-square was not significantly larger than the mean chi-square value for LOGIST (58.12 for ANCILLES and 52.44 for LOGIST). It wouldappear, then, that the ANCILLES chi-square values were not larger than theLOGIST chi-square values more often than would be expected by chance, butwhen they were larger than the LOGIST values, they tended to be significant.

  • Table 1

    ANCILLES vs. LOGIST Goodness of Fit ComparisonUsing Yen's Chi-Square Statistic

    Item ANCILLES LOGIST

    1 46.652 70.36* 50.883 58.95 48.494 39.95 43.735 116.45* 46.576 133.14* 50.727 34.00 41.978 41.83 61.649 46.04

    10 56.56 32.90I1 51.13 48.3712 46.97 38.5813 81 97* 53.9014 35.38 59.6215 51.01 60.6416 61 .68* 52.0217 75.22* 02.06*18 62.22* 44.9019 50.15 35.1320 36.33 53.9221 50.91 58.8122 57.51 69.78*23 104.66* 80.91*24 45.96 46.9325 48.84 48.2626 52.44 55.9127 93.87* 93.96*28 57.10 56.1429 76.61* 51.0930 43.76 52.4331 50.85 49.8232 33.92 45.2033 65.78* 58.1834 52.86 70.37*35 58.27 60.9036 41.66 44.5537 55.20 47.5038 50.97 51.5439 34.49 44.9840 50.54 57.0541 46.54 47.2842 72.42* 71.08*43 46.98 43.2944 70.49* 39.8145 62.56* 49.2146 67.57* 50.8947 28.85 38.2248 45.55 46.2049 55.85 56.3550 53.39 44.24

    Note. The critical value for rejection of adequate fit is X2(45)-, 61.66at a = .05.

    significant it .05 lhveL.

  • -9-

    Another analysis that was performed on the chi-square values obtainedfor the ANCILLES and LOGIST procedures was the summation of the chi-squaresover items to test whether there was significant lack of fit for the test asa while. Using the normal approximation to the chi-square distribution yieldsa standard deviation of 66. The ANCILLES chi-squares summed to 2789, whichyielded a z = 9.54. The LOGIST chi-squares summed to 2517, which resultedin z = 5.41. Comparing these z-score values to the standard normal distri-button, clearly both summed chi-squares were significant, indicating thatthere was significant lack of fit for the test as a whole for both procedures.

    The final analyses performed on the obtained chi-square values involvedcomparing the chi-square values to a graphic display of the empirical andtheoretical plots of the item characteristic curves. Figure 1 through Figure48 show the obtained and predicted proportions correct for each item plottedagainst the ability estimates. Plots were made for both the ANCILLES andLOGIST parameter estimates. Exa'niining these figures closely does reveal oneconsistent pattern across items. The poorest fit for both procedures occursat the lower end of the ability scale. This is not surprising since it wasalready known that the lower asymptote of the ICC is difficult to estimate.It should be rioted, however, that the values at the lower end of the abilityscale are somewhat distorted due to the collapsing of categories that was re-quired for the chi-square procedure. In order to keep category frequenciesabove five, the collapsing of end categories was necessary, which resultedin some category frequencies that were relatively large due to the width ofthe category.

    Using a visual comparison of the plots for the two procedures, it isdifficult to determine whether the fit of one procedure was any better thanthe fit for the other procedure. It is also difficult to predict from theplots for which items lack of fit was significant. For example, the ANCILLESchi-square value for Item 6 was 133.14, while the LOGIST chi-square valuefor Item 6 was 50.72. The plots for Item 6, shown in Figure 5, do not atfirst indicate the large difference in fit. However, closer investigationdoes yield some insight as to cause of the difference in fit for that item.The intervals for the ANCILLES procedures showing the largest discrepancybetween the observed proportion correct and the expected proportion correctare those intervals containing the greatest number of examinees. For in-stance, the intervals between 0 = 1.0 and e = 2.0 show a fair amount of dis-crepancy between the observed and expected proportions correct. In thoseintervals frequencies vary from 60 to 90 examinees. (see Figure 51). Forthe LOGIST procedure the poorest fit appears to occur near 0 = 2.0 and 0 = -2.0.Frequencies in those intervals range from 10 to 20 examinees, which is farlower than the frequencies in the intervals where the ANCILLES procedureshowed poor fit. This was not a consistent pattern across items, however.

    Figure 15 shows the plots for Item 17. Both procedures showed lackof fit for Item 17, and it appears from the plots that the poorest fit wasin the same ability ranges for both procedures. For Item 23, shown inFigure 21, the ANCILLES procedure shows lack of fit in approximately the sameabieity ranges as in other items discussed, but the LOGIST procedure appearsto fit poorly across the entire ability range. The plots, then, do not ap-pear to indicate any other consistent pattern for the proceduresi

  • -10-

    FIGURE 1PLOTS OF EMPIRICAL AND

    THEORETICAL CURVES BASED ON

    RNCILLES AND LOGIST PROGRAMS

    ITEM 2

    SRNCILLES1KEMi fL -

    x

    a

    x

    x

    X x10 x

    a-tn xxN x

    x

    o X

    -4.00 -2.00 0.00 2.00 4.00 6.00

    INTERVAL MIDPOINTS

    Qo X LOGIST

    1,IfcMcF. -

    xx xx

    x-o

    a c; K X% X

    o: x xx

    IN(LR MIDP INT

    Ni

    o xx

    x x x

    2.004.00 6.06.00.0o.00 2.00 400 .0INTERVAL MIDPOINTS

  • -11-

    FIGURE 2

    PLOTS OF EMPIRICAL AND

    THEORETICAL CURVES BRSED ON

    RNCILLES AND LOGIST PROGRAMS

    ITEM 30- NCAL I RNCILLES

    -.ao

    .-crco03 x

    C--, oo -,o I o o o z o q.occ 0K

    -6. 00 -'1.00 -2.00 o'.00 2.00 '.00 6.00INTERVAL MIDPOINTS

    a

    - EIVRFIJC . X LOGIST7IlEINIIJClM. -

    x

    K,

    I-)

    fr.

    x K

    0 X Xx

    o JX-: Xx

    0- 0 -4., 00 -2.o 0 o'.0O0 2. 0 0 8' . oINTERVAL MIDPOINTS

  • -12-

    F IGURE 3PLOTS OF EMPIRICAL AND

    THEORETICAL CURVES BASED ON

    ANCILLES AND LOGIST PROGRAMS

    ITEM 4

    C) * ARNCILLESxX

    1,EWETU..-xX

    xx

    cflc x

    -a

    a-u, Kmo xCc

    Xx

    0

    0'6.00 -I.o -2.00 0.00 2.00 4.00 6.00INTERVAL MIDPOINTS

    U3,

    I-o

    a:v x X XX

    xx ixo KX

    x

    000 -4.00 -2.00 0.00 2.00 4,.00 e. 00INTERVAL MIDPOINTS

  • -13-

    F I GURE 4

    PLOTS OF EMPIRICAL AND

    THEORETICAL CURVES BASED ON

    ANCILLES AND LOGIST PROGRAMS

    ITEM 5

    00 . I1* m.A ANCILLES x

    l1,gwaclfmL. - -x

    xxK/f Kx

    r.- x"" /dx X x

    0 Kx

    I-. K

    ._Jo

    -

    o Kx

    K

    C0

    6.O0 -4. 00 -2.o0 0.00 2.0 0 ..00 6.00INTERVAL MIDPOINTS

    00• .. ,. - I LOGIST

    . 0

    x f x xx xx

    Lm x

    0c.

    I-

    0

    -- 0o -4. no -2.00 0.00 2.00 11.00 6.00INTERVAL MIDPOINTS

  • -14-

    FIGURE 5

    PLOTS OF EMPIRICAL AND

    THEORETICAL CURVES BASED ON

    ANCILLES AND LOGIST PROGRAMS

    ITEM 60

    ; c ,'ucl . - ANCILLES

    ..

    L ; . x =

    cr,

    xJ X

    Sn

    Kx

    - .0 0 -4.O 01 -2.00 0 00 2.00 11 ,00 6.00

    /r /

    INTERVAL MIDPOINTS

    0

    = 'lla= - ,LOG I1ST

    (It

    lqiSE lCft - -

    Cx

    mc;

    -6.00 -4 .00 -2.00 0.00 2.00 4.00 6.00INTERVAL MIDPOINTS

    LOGISI

  • -15-

    FIGURE B

    PLOTS OF EMPIRICAL AND

    THEORETICAL CURVES BASED ON

    ANCILLES AND LOGIST PROGRAMS

    ITEM 7a

    0• . w.rcm.- *RNCILLES xtlIEIUllI~U, * - =/

    Kx

    C- ;

    -

    a,.fVa:t "K

    NXC

    K

    000 -4.00 -2.00 o. 0 2.00 .o0 6.00

    INTERVAL MIDPOINTS

    g WTCO LOGIST xK

    K0

    I.-

    U)

    x-0

    Ic

    °:

    0 a K

    KK K

    -. oo -. 00 -2.00 0.00 2.00 14oo 600INTERVAL MIDPOINTS

  • -16-

    FIGURE 7PLOTS OF EMPIRICAL AND

    THEORETICAL CURVES BASED ON

    ANCILLES AND LOGIST PROGAMS

    ITEM 8

    Q[*IUC ANCILLES

    x xx

    o3 xx

    xI- x

    ccx

    o xx X

    xx x

    R-6. 00 -4.00 -2.00 0. 00 2. 00 4.00 6. 00INTERVAL MIDPOINTS

    E*lc cjcg.x LOJT x

    %n xo3 x

    C3Cc-, x

    crx xE)x IC

    cin x

    NX~ X K"0w

    C3 x

    -00 -4.00 -200 0'.00 2. 00 4'.00 6'.00INTERVAL MIDPOINIS

  • -17-

    FIGURE 8

    PLOTS OF EMPIRICAL AND

    THEORETICAL CURVES BASED ON

    RNCILLES AND LOGIST PROGRAMS

    ITEM 10

    *o EIJI,.-I = RNCILLESIgwIJQI.. - K

    x

    n

    C3 XXx,:J- /xx

    mx

    0

    IN'VI IP INI

    0K

    E:)

    . ( Lx

    K

    0600 o -4.00 -2.00 0.00 2.00 14.00 6.00INTERVAL MIDPOINTS

    "- X

    0

    -ance.- o'.S Ke1I-s (Too .- , .o-oo 4' o eo

    N UNcr

    (L nx

    0x

    x

    -10 4O -2.00 0.00 2. 00 4100 6.00INTERVAL MIDP13INTS

  • -18-

    FIGURE 9

    PLOTS OF EMPIRICAL AND

    THEORETICAL CURVES BASED ON

    ANCILLES AND LOGISI PROGRAMS

    ITEM 11• .rnck "L RNCILLES

    INEWTICA -

    X

    In X

    XX

    xx

    I- x

    x x

    x. x

    crm x

    X X

    XXX

    ax

    O6.00 -'4.00 -2.00 0.00 2.00 4.00 6.00

    INTERVAL MIDPOINTS

    SEMPIRCL LOGIST• IRJC. - -

    N

    N xx

    0 Kx x

    x xmX

    x XK w

    IL-n xxx

    cl... .. .. ..

    N~ X

    *~ X

    -4.0 -*2.00 0.00 2.00 4.00 6'.00INTERVAL MIDPOINTS

  • -19-

    FIGURE 10PLOTS OF EMPIRICAL AND

    THEORETICAL CURVES BASED ON

    ANCILLES AND LOGIST PROGRAMS

    ITEM 12

    . ,NCILLES x

    Inx xI",-

    x

    -X X..JOU'-

    a: oxmx

    X

    o

    060 -4.00 -2.00 0.00 2.00 4.00 6.00INTERVAL MIDPOINTS

    .. rltc,, . X LOG] ST

    _Jo-'"

    xox

    co

    x X0,

    6.00-O- -14.00 T2.00 0. 00 2.00 m'.00 6.00I NTERVAL MIDPOINTS• -N

  • -20-

    FIGUI3E 11PLOTS OF EMPIRICAL AND

    THEORETICAL CURVES BASEO ON

    ANCILLES AND LOGIST PROGRAMS

    ITEM 13

    9 ~~ cnznsR NCI LLESXX,

    x X x

    I-a

    Xc x

    nLU) X xNl

    0K/

    o x

    t-6.0 -4. 00 -2 .00 0'.00 2'.00 4 1.00 6.00INTERVAL MIDPOINTS

    * C~rIRcIIL~xLOGIST

    X X

    -o X

    a:X X

    XX

    N xxCo x X

    XXx

    0..00 -4.00 -2.00 0.00 2,00 4. 00 6. o

    INTERVAL MIDPOINTS

  • -21-

    FIGURE 12PLOTS OF EMPIRICAL RND

    THEORETICAL CURVES BASED ON

    ANCILLES AND LOGIST PROGRAMS

    ITEM 14

    9 cWI~RICK - R NCILLES x

    x

    xx x

    -v xx

    cr ~ x

    mK x

    INDVLMIPIT

    xx

    xx- KXX~J XX X

    ,Lf 24 20 .0 .0 40

    N XK x

    x x

    o K x0600* -40 -20 .0 20 .0 60

    ccERA MIDPOINTS

  • -22-

    FIGURE 13

    PLOTS OF EMPIRICAL AND

    THEORETICRL CURVES BRSED ON

    RNCILLES AND LOGIST PROGBRMS

    ITEM 15

    ,*-RNCILLESHEON1ICOL. -

    r.

    I-

    CD xxx

    x

    xx xxx

    0

    a - t :

    v x x

    xK

    -4.OO.O0 -.00 o .0O0 2'. 0 . 0O0 6 .00

    I NTERVAL MIDPOINTS

    oK

    E• _ .., JCL - X L OG IST )w __II .O REIC M - - Y X

    U,,0

    Y, K)

    xx

    a: K :K

    E x x

    0 K!

    -6. O0 -4. 0O0 - . 0O0 . 0O0 2'. 0 40. 00 .00INTERVAL MIDPOINTS

  • -23-

    FIGUR3E 14PLOTS OF EMPIRICAL AND

    THEORETICAL CURVES BASED ON

    ANCILLES AlND LOGIST PROGRAMS

    ITEM 16

    EMIICAL *X ANCILLES

    XXx

    X

    -Jo)"V!- x

    m x0r

    K

    xx

    01

    o - , w w N -Xf% mI I-C-6. 00 -4.00 -2.00 0.00 2.00 4.00 6.00

    INTERVAL MIDPOINTS

    lillncpL- tx LOGISTINCOICA. - -

    I- KW

    aio

    m xa:- X X x

    Q- nccxXXN

    o imx-

    06.00 -4.00 -2.00 0.00 2.00 4.00 5'.00INTERVAL MIDPOINIS

  • -24-

    FIGURE 15PLOTS OF EMPIRICAL RND

    TH-EORETICAiL CURVES BRSED ON

    ANCILLES RND LOGIST PROGRAMS

    ITEM 17

    * RNCILLES

    I- K

    F- K

    C)acLI

    NroL K

    c XK

    m

    x

    x "K

    0 K

    0x~~600 -4.0 2.0 000 .0 4.0 .0

    INEVLMIPIT0x

    * c~c,.-jLOGIST

    r: -

    C3KUx K

    0x K

    I-

    x x

    x

    0

    C 6.00 -4.00 --. 00 o'.00 2.00 41.00 '6.00INTERVAL MIDPOINTS

  • -25-

    FIGUFBE 16PLOTS OF EMiPIRICAL AND

    TH-EOETICAL CURVES BASED ON

    FiNCILLES FiND LOGIST PROGRAMS

    ITEM 18

    CAPIRIA X NC ILL ES

    I,4Esuij~. -

    -Jo XX x

    crWK

    0 KXXm x xK

    x

    .00 D -4.00O -2.00O 0.00 2.00 4.00 6.00INTERVAL MIDPOINTS

    * EVAIJCL - LOGIST

    in

    K

    X X X X.. j0: Xx

    cr xmK

    CC

    K x0x XKxxx Xx

    -D.O -4.00 -D.o 0.00 2.00 14.00 6.00INTERVAL MIDPOINTS

  • -26-

    FIGURE 17

    PLOTS OF EMPIRICAL AND

    THEORETICAL CURVES BASED ON

    ANCILLES AND LOGIST PROGRAMS

    ITEM 19a

    • LUWIRJcAL - xARNCILLES w x~IIIcarJ - - Kx

    U)

    .o x

    x

    I--

    0 I0

    cc

    ID

    0o x

    "-'"6•~~~ O0 _.j .. -. *xx

    L 00 -4.00 -2.00 0.00 2.00 '4.00 6.00

    INTERVAL MIDPOINTS

    0

    • LOGIST x xT'IESMIJCI. - - x

    o

    1-

    J0K

    .fx

    cc x0

    )KX

    %60 -40 -200 0.00 200 '4.00 6'.00INTERVAL MIDPOINTS

  • -27-

    FIGURE 18PLOTS OF EMPIRICAL AND

    THEORETICAL CURVES BASED ON

    ANCILLES AND LOCIST PROGRAMS

    ITEM 20

    C* * ANCILLES

    1ig~t. *

    x

    xN

    xx

    CoC2 x

    C

    * Ec!~ . LCGIST _x

    Ih x

    I-o- In

    uc

    CLW

    N

    x

    -&Ol. 0 00.O "00 ~ 00 t' .00 6 .00INTERVRL IOPFOINTS

  • -28-

    FIGURE 19

    PLOTS OF EMPIRICRL RND

    TIHEORETICAL CURVES BASED ON

    ANCILLES AND LOGIST PROGRAMS

    ITEM 21

    • . cgrc,.x RNCILLES

    x x

    x x

    r

    I- 2

    x

    0o 6. oo -K.0 -2.00 0.00 2.00 4.00 6.0

    INTERVAL MIDPOINTS

    IM•. PJAICAL - I LOGIST x

    -x x

    .J ox x

    xD X

    xKx

    0x

    nm XE) x

    X )

    -6. O0 -4.0O0 -2.0O0 0.0O0 2.0O0 4.0O0 6.O0I NTERVRL MIDPOINTS

    i . . . II ... . .

    0 K)CX X X K

    XX

    C!K- 0 -40 20K.0 .0 1.0 60

    INERA MDOIT

  • -29-

    FIGUR3E 20PLOTS OF EMPIRICAL AND

    TIIEORETICAL CURVES BASED ON

    ANCILLES AND LOGIST PROGRAMS

    ITEM 22C3

    ANCILLES

    x x xxx

    NC xo XX

    o x-60C40 -. 0 00 .0 .0 60

    XXX

    XX X x~

    /X KXX X

    CNC

    Kx XX

    x

    x0

    ~~~5.00 .a. 0 -20 0.0 20 .0 60INERALMiPONT

  • -30-

    FIGURE 21

    PLOTS OF EMPIRICAL AND

    THEORETICAL CURVES BASED ON

    ANCILLES AND LOGIST PROGRAMS

    ITEM 230

    C*e.. Alif - , RNCILLESt1i'oncij. - - wX

    x x

    r-

    x

    x I

    Q.

    x

    x

    x

    c

    VXXo X

    , x x '

    0-6 00 -4.00 -. 0o o'.00 2.00 .00 6.tJo

    INTERVAL MIDPOINTS

    0

    i

    SDWIRICAL. - X LOGIST

    -. ,x )Ix

    x x w x

    xX X

    x-- x

    .,in! X X

    coEq ) ",X

    cr-

    x

    L. 0 -. O0 - '2.0 o '.0 2. 0 '. o 8,00INTERVAL MIDPOINTS

  • -31-

    FIGUR3E 22

    PLOTS OF EMPIRICRL RNO

    TH-EORETICRL CURVES BRSEO ON

    PNCILLES RNO LOGIST PROGRRMS

    ITEM 24

    *fjtc ANCILLES

    ox X

    xx

    c- XX

    a: xU)x x

    X* X~ x

    x

    TK g--i-y- -I --- I06.00 -4.00 -2.00 0.00 2.0 D 4.00 6.00

    INTEHRL MIDPOINTS

    * (~?IiC~L *LOG IST

    0 X

    x 'jK XXKx

    x U)

    mx X xX

    C)K X X x

    c-60 -4.0 -/. 0.0 20 .0 60INERA xIPONT

  • -32-

    FI6GURE 23PLOTS OF EMPIRICAL AND

    THEOHiETICRL CURVES BASED ON

    ANCILLES AND LOGIST PROGRAMS

    ITEM 250

    O ICAL X RNCJLLES w x

    x

    In

    °. x0xmK

    Cc W

    -

    X

    C3

    o K

    0

    0-6.00 -4.00 -2.00 0.00 2.00 14.00 6.00

    INIERVRL MIDPOINTS

    * w rnk. -LOGIST M1,Ij('rcgL. --

    XX

    oI- x

    X X X

    XX

    ~.0•r xr "Y K

    (U X )4x

    xx

    06.00 -4.00 -2.00 0.00 2.00 4.00 6.00

    INTERVAL MIDPOINTS

  • -33-

    FIGURE 24

    PLOTS OF EMPIRICRL AND

    THEORETICRL CURVES BASED ON

    RNCILLES AND LOGIST PROGRRMS

    ITEM 26

    EMRIAL X RNCILLES ×

    LoK x

    xt x

    u)

    (t K

    o x

    xx.x

    C3 x- --- r - * m m ---------6.00 -4.00 -2.00 0.00 2.00 4.00 6.00

    INTEBVRL MIDPOINTS

    • .lA~lIlCK - XI LOG15T"

    rLujci- LGSlt*MWICL - -K

    U)X

    I- K-Jo-U)-

    a:m x x

    E) x

    Sx

    0~ x

    x K xx

    6K•~~~ r-- - --jt- ---- "-,-' -. - ' -__ ___ __ '_

    -6 .00 -4.0 -2.00 0.00 2.00 4.00 6.00

    INTERVRL MIDPOINTS

  • -34-

    FIGURE 25PLOTS OF EMPIRICAL RND

    lTHEORETICRL CURVES BASED ON

    ANCILLES AND LOGIST PROGRAMS

    ITEM 27

    * UtM~L-1ANCILLES

    XX

    -I x

    m x

    a:oou x

    0x X

    xK

    ~60 -.00 -2.00 0.00 2. 00 4.00 5.00INTERVAL MIDPOINTS

    *~fflc I LOGISTIIIEW~TICM -

    K Koa x

    I- XCE x

    mK

    0 xxK xX

    X x

    ,-6s.00 -4.00 -2.00 0.00 2. 00 4'.00 6.00oINTERIVAL MIDPOINTS

  • -35-

    FIGURE 26PLOTS OF EMPIRICAL RND

    THEORETICAL CURVES BASED ON

    ANCILLES FiND LOGIST PROGRAMS

    ITEM 28

    EPIRICAL X ANCILLES

    ~i~6f(T~c. .x

    Ln x

    a: I XXXn-tn X

    IxA 4

    0

    ElmloC% x LOIScl- x

    U, *x

    X

    n- r

    7-OE IA - -~

    0600 ~ ~ ~ ~ ~ ~ x -40 20 .0 .0 '.0 60INERR MIPIT

  • -36-

    FIGURE 27

    PLOTS OF EMPIRICAL AND

    THEORETICAL CURVES BASED ON

    RNCILLES AND LOGIST PROGRAMS

    ITEM 29

    * £WIMICAL X ANCILLES

    xCo x

    xx K

    - XX

    a: xED

    0?:

    n~. U)C\J K

    X XX

    060 -4.00 -'02"00 0.00 2. 00 4.00 6.00INTERVAL MIDPOINTS

    * ElIFIRUCAL - X LOGIST

    >- X

    XX xX

    X X x

    -Lfl KmC*1

    0r xE0x

    __, 4

    x 1~60 -'.00 -2.0 .00 2.0 '400 .0INERA

    MIPONT

  • -37-

    FIGURE 28PLOTS OF EMPIRICAL AND

    THEORETICAL CURVES BASED ON

    ANCILLES AND LOGIST PROGRAMS

    ITEM 30

    IIOtfI M -

    -inI.- x

    m5

    Qtn

    0 ~xK

    C600 -4.00 -2.00 0,.00 2.00 4'.00 6'.00

    INTERVAL MIDPOINTS

    crDf~c LOGIST x

    112EW71CAt - - x

    toxI-.X

    .J x

    - U,_x

    a:co x

    ct:x x

    06.0 D -4.00 -2.00 0.00 2.00 4.00 6.00

    INlEfiVRL MIDPOJWTS

  • -38-

    F IGURE 29PLOTS OF EMPIRICAL RND

    THEORETICAL CURVES BASEDO N

    ANCILLES AND LOGIST PROGRAMS

    ITEM 31

    (hrifICA -XRNCILLES(NE00IJCA-

    xLo x )S

    F- X

    Lo

    C X X

    C) x

    C3

    -6.00 -'4. 00 -2.00 0.00 2.00 L.00 6.00INTER3VAL MIOPOINTS

    C)o IAC~AL - XLOG15T

    X

    ?X X Xo3 x x

    c- x x

    ci: x

    XXa:x x

    a-U) x ~xXX

    6.0 4o 10 0 .0 40IN E/KMIP I T

  • -39-

    FIGUR3E 30PLOTS OF EMPIRICAL RN!J

    THEORETICAL CURVES BASED ON

    RNCJLLES RNrJ LOG15T PROGRAMS

    ITEM 32

    9 CEf5RICAL X RNCILLES xlIMEOREIICRL-

    ,x

    FD X

    -Lncv x

    'V x

    xxx

    060 -. 0 -2.00 0.0 2.00 4.00 6.00

    INTERVA/L MIEJPLINTS

    0 LOGIST

    UkDAfETICAL -

    x

    xx

    ar xcr,) x

    cc

    0 x

    X

    xx

    0X x

    0-00 -4.00 -2.00 0.00 2.00 4.00 6.00INTERVAL MIDPOINTS

  • -40-

    FIGURE 31PLOTS OF EM'PIRICRL RND

    TIEORETICRL CUR~VES BRSEO ON

    RNC1LLES AND LOGIST PROGRRMS

    ITEIM 33

    LIW~l~L* NCILLES1,.EO lJCRL

    xx

    x

    X

    x

    cri

    c J

    o x

    0

    - OGIS

    ax.x x

    c' x Xx

    -j.010 -4.00 -. 0 0.00 2'.00 4. 00 6.00INITERVAL ?1IfPOINIS

  • -41-

    FIGURE 32PLOTS OF EMPIRICAL AND

    THEORETICAL CURVES BRSEDO N

    RNCJLLES RNDJ LOGIST PROGiRAMS

    ITEM 3'1

    0C.

    EMMURCAL k RNCILLE5 x

    X

    .- U!m - x X

    cr x

    CC:Q-t x

    0x

    Co6- ---- ~ H- -X ~---6.00 -14.00 -2.00 0.00 2.00 4.00 6.00INIEF3VRL MIDPOJINTS$

    CDEAPISqICAL. - LOG15T xIAEORCIICAL -

    XX

    Ox Xx X

    xX X

    m5 x

    cn X x

    cc xX X

    eX XX

    x X

    -6.01)0 -4.00 -2.0 0.0L) 2.00 4'.00 6.00INTERVAL MID3POINTS

  • -42-

    F IGURE 33PLOTS OF EMPIRICAL AND

    THEORETICAL CURVES 6PSED ON

    ANCILLES RNO LOGIST PROGRMS

    ITEM 3S

    EMPIRCAL XRNCILLESIMESMETL .-

    In

    xx x(xX

    x

    C3

    Cr

    a- to

    XX x xX

    cc

    XXX

    O0.0 -. 0 -2. on 0.00 2. 00 4.00 6.00INTEHiVRL 1110PfIN'TS

    CPIICAL- kLOG 1ST1,4EOAEIICPL -

    xIn

    x X x

    - In

    a:

    xjI

    03 x

    0~.0 -4.00 -2.0J0 r.C 2.00 4.00 6.00INIEMiVRL M10OION15

  • -.43-

    FIGUfBE 341

    PLOTS OF EMPJIHICAL FIND)

    THEORETICAL CURVES BASED ON

    ANCILLES AND LOGIST PROGRAMS

    ITEM 36

    *EMPIMLCR*X RNCILLESXINEW71CAI.-

    LAx X

    on xs

    crcoUl

    cr

    U-n0

    x

    S-.6.00 -4. 00 -2.00 0.00 2.00 4.00 6.00

    INTERVAL MIDPOINTS

    C* EmIIFIICAL~ - LOGIST*HWTM - -- ,

    K;r

    x

    x ;

    C) X

    0x

    cc: x

    o Xx

    IN - - -----

    -6.00 -4.00 -2.00 0.00 2.00 4.00 6.00INTEHVAL MIDPOIN15

  • -44-

    FIGUHE 35PLOTS OF EMPIRICRL RNiO

    THEORiETICRL CURVES BRSED ON

    RNCJLLES RNO LOGIST PROGRMVS

    ITEM 37

    SEPIIsCAL X RNCILLES, x

    in x

    -x

    Cn X

    XX

    xxx

    o XX_____3- ~ -

    0600 -4.0 -~00 000 p00 4.00 6.0

    INERA MDOIT-I!0

    CL LOIS

    Il) x x .. -

    --- V -

    Inms.o) 4 0 -2 l 0 - 7o .0 60

    INTEiVA MI F'IINK

  • -45-

    FIGUBE 36

    PLOTS OF EMPIRICRL AND

    THEORETICRL CURVES BASED ON

    RNCILLES AND LOGIST PROGRRMS

    ITEM 38

    EI. X RNCILLES x x1l@Ol,f ...,11CM - -

    xIn x

    X X

    r.

    0t1

    I- x- x

    .JX X

    - I-

    Co2

    x

    0 !x

    --o. oo -4.00o -e.0 0o .0 oo .0 4o , o0 C. oI NT Ef3\VL MIDPOINTS

    k _.E.. L- [0LO1ST x xIIlE@B'IIC;.. - -

    K

    I,,M.-

    cr

    a- -

    x_ --.. . . . ._ __

    . . . . -

    6.00 -4.00 -2.00 0.0) 2.00 4.00 6.00IN1EFIVRL HIlPOINT5

  • -46-

    FIGURE 37PLOT5 OF EMP'IRICAL A140

    THEORETICAL CURVES BASED ON

    RNCILLES AND LOGIST PROGRHHS

    ITEM 390

    EMPIRCAL xRNCILLE5

    x

    - I)

    K)

    x

    o X x

    C3,

    c'ec

    t-. W)

    cr x

    :)%

    a: x

    K xxK x

    KC:) K

    060 -400 2.0 0Oi' 2.0 '400 .0a)EFVR tifP(I

  • -47-

    FIGUBE 38PLOTS OF EMIPIRICAL ANO

    TH-EOFETICRL CURVES BASED ON

    ANCILLES AND LOGIST PROGRRMS

    ITEM 40

    (UPRICL *RNCILLES x

    ~e(6~ICA.x

    x

    X

    xj~x0 x

    x

    cc0-V

    -4.0 .0 200 4.0 .0

    j XX

    C) XX

    X

    XX Xxx

    crx X

    (V K

    x0-G.00 -4.00 -11.00 0.00 .00 4.00 6.00

    INICfiVHL MICIFOIHIU

  • -48-

    FIGURE 39PLOTS OF EMPIRICAL AND

    THEORETICAL CURVES BASED ON

    ANCILLES FiND LOG15T PROGRAMS

    ITEM 41

    (NPIRICAL X N IL C

    -x X

    .. J0 x

    03 XXX

    1 - 1

    - ----- -- -O S -----

    C3 XX XX

    x

    r)5

    0 x

    0 K x060 -400 2.0 0fl~ 2.0 400 .0

    xN&VLMD(IT

  • -49-

    FIGURE 40O

    PLOTS OF EMPIRICRL RNO

    TH-EORET!CRL CURVES~ RAcFf M'N

    RNCILLES RNiJ LOGIST PFROGRRMS

    ITEM 42

    - Ar~l"Ica1 - X NC ILL E5., XINEOMETICRL -

    x

    ox x

    xx x

    0 xU! x

    cED

    0-

    03x

    *~~ XNI~~~- LOS x

    Xx *IX

    xxXX

    - xx 0

    xxcr

    x~ xx

    or K

    (D xK~

    a_ xxx

    0 x0xx

    C-6. 00 -4.00 -2.00 0.00 2.00 4.,00 6.00INTERVRL MIDPOINTS

  • -50-

    FIGUIRE 41i

    PL~iS 0ii LMt-'I-ICRL RNIJ

    IHEOBETICRL CURVES BASEH ON

    RNCILLE5 RND LOGIST PROGRAMS

    ITEM 4~3

    1"*IA X x NCILLES..

    c:)

    CD'

    x

    ,-6. 00 -'4.00 -e. 00 0.00 2.00 4. 00 61.00INTEF3VRL MIDPOINT5

    C y LOGIST .

    1C- 71F - -

    Cr02

    eDxcc:

    x x

    06B.00 -4.00 - 2.00o 0.00 2.00 4. 00 6.00]NIERWfAL MIPPtOINT5

  • 7 7FIGU13E 112

    PLOTS OF EMPIRICAL AND

    THIEORETICAL CURVES BRSED ON

    RNCILLES AND LOGIST PROGRAMS

    ITEM 44L

    * (vWffiCaL X ANCILLESINcOWsIM -

    x

    C0 xx

    I-x

    Co

    fD

    OQ-f) x

    cvx0x

    0 6.00 -4.060 -2.00 0.00 2.00 14.00 6.00INTERVRL MIDPOINTS

    EWMC - LOGIST

    C)C

    U.)

    0x

    x X

    06.00 -4.*(00 -20 0.0 2'0 4.00 6.00

    INTERVRL MIDPOINTS

  • -52-

    FIGUI3E 4j3

    PLOTS OF EMPIRICAL RND

    THEORETICAL CURVES BASED ON

    ANCILLES AND LOGIST PROGRAMS

    ITEM 45

    RNCILLES,

    1I~IM - -C3K

    x

    0uK

    O- K

    xJ 0

    - In

    x0 x

    V)

    c'J

    xK

    K x

    xcw ------------- 1),-6. 00 -11.00 -2.00 0.00 2.00 4.00 6.00

    INTERVRL MIDPOINTS

  • -53-

    FIGURE 411

    PLOTS OF EIIPIRJCAL RNO

    THIEORETICAL CURVES BA3ED ON

    ANCILLES RNO LOGIST PROGRRMS

    ITEM 46

    * CefIA RNC ILLES, xINEW 1ICR1.-

    In xx

    0x

    x

    inc-

    c-

    tn5

    a:xmx

    0 x

    x

    o) x

    * - r!.o 6 0 40 -. 0 00

    .0 .0 60INcaR MIPiT

    0x

    CIn

    cf

    xa:xInx

    x x

    0W

    -0r. 00 -4.00 -..0ol 0.00 1?.00 4.00 6.00INlEF-iVfL MIUPOINTS

  • -54-

    FIGUHE 45PLOTS OF EMPIRICAL AND

    THEORETICAL CURVES BASED ON

    ANCILLES AND LOGIST PROGRAMS

    ITEM 47

    [WNR X RNCILLE5

    x

    In xx

    co x

    0- U*)av5

    x

    0 x0m o

    00G.00 -4.00 -'2.00 0.00 2.00 4.00 6.00

    INIERVAL MIDPOINTS

    * (PEIRICAL - LOGIST x

    XX

    x0X ) x

    x

    crK x

    0D x Xx

    0 tn x -x

    -6. 00 -4.00 -2.00 0.00 2.00 14.00n 6.001 NT EKVAL M IOPO I tiS

  • -55-

    FIGURE 46PLOTS OF EMPIRICAL RND

    THEORETICAL CURVES BRSEO ON

    FNCILLES AND LOGIST PROGRAMS

    ITEM 48

    [WR* A Xwinni RNCILLES

    F- x

    col

    ci: xO~-Ir)

    -K Kcac

    K

    crK

    EDK

    x x

    060 -1.0 2.0 .00 2. 00 o.0 600INTERVAL MIDPOINTS

    EWRIA X LOGIST x

    C3 xLo

    1-

    Ct

    CE xcLx x K

    K K K/F)

    060o 0 -14. 00 -12. 00 0.50 2. 00 4.00 6.00

    INTEBVRL MIDPOINTS

  • -56-

    FIGURE [47PLOTS OF EMPIRICAL RNO

    THEORETICAL CURVES BASED ON

    ANCILLE5 AND LOGIST PROGRAMS

    ITEM 49

    PNCILLE5IMEW71CM

    on x

    xxXXx

    cc XX

    crKCC x

    CL x

    0C3KED

    XX X

    00x)O

    6.00 -4.00 -2.00 C0.00 2.00 4. 00 6.00

    INTERVAL MIDPOINTS

    - LOGIS AL - -

    K x

    -o K

    I- x-r Xm U,

    ED 5

    X

    ccr

    U, KX XK

    C3 K

    06.00 -4.00 -200 0.00 2.00 '.060 6.00INTERVAL MIDPOINTS

  • -57-

    FIGURE 48PLOTS OF EMP'IRICAL AND

    THEORETICRL CUR~VES BRSEQ ON

    RNCILLES RND LOGIST PROGfRAMS

    ITEMI 50

    IS~ICL -NILS,

    K

    Xox

    - x

    a:co K

    0 j

    Ls.00o -4.00 -2.00 0.00 2.00 4.00 6.00INTERV1L M~10OINTS

    C3[WN* LOGIST K

    * IIPIIICAL

    x K

    x0 x

    I-kn

    Kx x

    Q.00

    * - - ----~r-----T -i-----i------ -00 -4. 00 -2.00 0.00 2.00o 4.00o 6.00

    1WTERVH-L MIJDPOINTS

  • -58-

    Another analysis performed on the obtained chi-square values was tiplot the distributions of chi-squares obtained for the ANCILLES and LOGISTprocedures against the theoretical chi-square distribution for 45 degreesof freedom. These plots are shown in Figure 49 and Figure 50 for the ANCILLESand LOGIST procedures, respectively. From these plots it is clear that thechi-squares obtained for the ANCILLES procedure were shifted to the rightfrom the expected distribution. The LOGIST chi-square distribution was alsoshifted somewhat to the right, but not nearly so much as the ANCILLES chi-squares.

    One final analysis performed on the chi-square values was to performa chi-square tst of independence for the two procedures. That is, usingthe obtained chi-square values, items were classified as fitting or nonfit-ting for each of the two procedures. A chi-square test was then performedto test whether the classification using chi-squares for ANCILLES was indepen-dent of classification using the LOGIST chi-squares. A chi-square value of3.43 was ubtained. The critical value for., = .05 was "(1) = 3.84, so thehypothesis of independence was not rejected. There was apparently no asso-ciation in the items categorized as fitting or nonfitting between the twomiiethods of classification. This result was supported by the results of atest for the significance of a coefficient of agreement. A kappa coefficient(Cohen, 1960) was computed on the chi-square classifications, and the kappawas then converted to a z-score. A kappa equal to .228 was obtained, and az = 1.2 resulted from dividing the kappa coefficient by its standard errorof measurement (k .19). Ihe null hypothesis of no agreement was not re-jected.

    MSD Statistics

    The MSD statistics obtained for the two procedures are displayed inTable 2. he dependent t-test performed on these values showed the meanANCILLES MSD value to he significantly higher than the mean LOGIST MSD value(p .05). However, a comparison of Table 2 with Table 1 indicates that thereis no apparent relationship between the size of the chi-square values andthe MSU statistics obtained for the items for either procedure. A Pearsonproduct moment correlation was computed for the MSD and chi-square valuesand the correlations for both the LOGIST and ANCILLES procedures were foundto be not significantly different from zero (r = .12 for ANCILLES and r = .19for LOGIST).

  • -59-

    FIGURE 49

    C OBSERVED DISTRIBUTION OF

    CHI SOURRE5 FOR ANCILLESWITH EXPECTED CHI0

    .o SOURRE DISTRIBUTION

    zo.c201

    0

    C,

    C j

    Toc 3C.~ 0 5c icc rc C )1.00 5~U. 00

    oFGURE 50

    OBSERVED DISTRIBUTION OF

    CHI SQUARES FOR LOCIST

    WITh EXPECTED CHIC, SQUARE DISTRIBUTION

    0-

    rizo

    C,

    c5

    't o 30 .0 0 5 .OCI 9 0 .0 0 2 . 0 0 0s .o0 0CHI 5QUIRES0

  • -60-

    Table 2ANCILLES vs. LOGIST

    Goodness of Fit ComparisonUsing the MSD Statistic

    Item ANCILLES MSD LOGIST MSD

    1 --- .2332 .202 .1983 .179 .1734 .186 .1865 .158 1586 .160 .1597 .178 .1768 .212 .2139 --- .191

    10 .209 .20811 .225 .22612 .181 .17913 .195 .19514 .215 .21515 .156 .15916 .191 .19217 .209 .21018 .220 .22019 .185 .18420 .194 .19421 .222 .22322 .201 .20323 .192 .19024 .228 .?2q25 .206 .20526 .191 .19?27 .207 .20828 .209 .20929 .220 .22030 .199 .19931 .201 .20332 .202 .20233 .161 .15434 .213 .21336 .197 .19636 .181 .18237 .185 .18238 .155 .15039 .215 .21440 .211 .21241 .217 .21842 .190 .19143 .161 .15944 .113 .10245 .167 .16046 .166 .15747 .207 .20848 .200 .19849 .216 .21750 .204 .207

    t(47) = 2.15 .194 (p< .05) .193

    Note: The critical value of t(47) = 2.014 for * = .05.

  • -61-

    Parameter Estimate Distribution Analyses

    Item Parameter Estimates The item parameter estimates obtained fromANCILLES and LOGIST are shown in Table 3. The correlations of the two setsof estimates are displayed in Table 4. Because the origin and unit of meas-urement used for the ability and item parameter estimates are arbitrary, thescales used for the two sets of estimates are different. Therefore, to facili-tate this comparison the ANCILLES estimates were put on the same scale as theLOGIST estimates using procedures set out by Marco (1977). The scaled ANCILLESa- and b-values are presented in Table 5. Scaling does not alter the c-values.The values obtained for the a- and b-values were similar, with the a-valueshaving a correlation of r = .85, and the b-values having a correlatTon ofr = .97. The c-values were less similar, having a correlation of r = .51.

    The distributions of the item parameter estimates obtained from LOGISTand the scaled ANCILLES estimates are described in Table 6. Although the ob-tained estimates were highly correlated the statistics shown in Table 6 in-dicate that there were differences in the item parameter estimate distributions.The a-value distributions appear quite similar. However, a dependent t-testindicated that the mean ANCILLES a-value (.53) was significantly lower thanthe mean LOGIST a-value (.61), yielding a t = 3 .91 ( < .01). A test forthe significance-of the difference between-correlated variances (Ferguson,1976) yielded a t = 8.68, indicating that the variance of the LOGIST a-valueswere significantTy greater than the variance of the ANCILLES a-values Tp < .01).Whenever variances were found to be unequal in this study, means were testedfor significant differences using the correction in the degrees of freedomset out by Welch (1938). A test for whether the obtained kurtosis values(-.85 for LOGIST, -.72 for ANCILLES) were significantly different from zero(Snedecor and Cochran, 1967) indicated that neither value was significant, aswas the case with a test for skewness (Snedecor and Cochran, 1967).

    A dependent t-test applied to the b-value means (-.06 for ANCILLES, -.34for LOGIST) yielded a t = 1.97, indicating that mean ANCILLES b-value wasgreater than the mean LOGIST b-value (p < .05). A test for the significanceof the difference between correlated variances yielded a t = 6.63, indicatingthat the variance of the LOGIST b-values (p < .01). The greater variance ofthe LOGIST b-values becomes more evident when the range of values is consid-ered. The scaled ANCILLES b-values ranged only from -2.88 to 2.34 ( a rangeof 8.34). The kurtosis value for LOGIST (12.21) was significant (f < .01),while the kurtosis for ANCILLES (.65) was not. However, the LOGIST b-valueswere significantly negatively skewed (p < .01) indicating that, although LOGISTb-values go much lower than did ANCILLES, the bulk of the LOGIST b-values wereactually above the mean of -.34. The ANCILLES b-values were not significantlyskewed.

  • -62-Table 3

    ANCILLES and LOGIST Item Parameter Estimates

    ANCILLLS LOGISTItem No.

    a. b. c. a. b, c.I I I I

    . 15 -1.66 .042 .80 1.26 17 .84 .97 .123 .85 1.22 .11 .95 .94 .064 .98 .04 .14 1.00 .09 .175 .51 -1.92 .06 .29 -2.84 .04

    .48 -2.13 .06 .26 -3.07 .047 .77 .69 .03 .92 .64 .048 .69 .17 .15 .47 -.23 .049 .08 -7.07 .04

    10 .37 -1.37 .03 .25 -1.76 .04II .41 .00 .04 .36 -.08 .0412 .92 .78 .09 .90 .61 .0413 .57 -.75 .06 .40 -1.11 .0414 .54 -.21 .01 .44 -.25 .0415 1.14 -.40 .17 .68 -.90 .041b .76 -.27 .02 .61 -.32 .0417 .58 -.30 .01 .46 -.36 .0418 .40 -.74 .02 .30 -.89 .0419 .86 .56 .07 .86 .47 .0420 .86 .48 .10 .75 .33 .0421 .55 .16 .16 .36 -.37 .0422 .50 -.78 .08 .33 -1.28 .0423 .44 -1.46 .03 .29 -1.92 .0424 .40 -.24 .02 .31 -.29 .0425 .99 .62 .22 1.06 .55 .2026 .89 .25 .09 .74 .11 .0427 .69 .16 .08 .54 .00 .0428 .46 -.87 .02 .33 -1.12 .0429 .51 -.09 .03 .41 -.14 .0430 .74 .16 .05 .66 .11 .0431 .64 -.37 .02 .49 -.46 .0432 .62 .44 .01 .62 .47 .0433 .97 .81 .05 1.13 .65 .0134 .54 -.40 .02 .41 -.49 .0435 .61 .63 .01 .65 .64 .0436 .85 -.30 .02 .68 -.38 .0437 .92 .63 .09 .88 .48 .0438 .87 .90 .02 1.04 .73 .0039 .57 .55 .07 .49 .44 .0440 .59 -.16 .01 .48 -.18 .0441 .53 -.07 .01 .44 -.07 .0442 .61 -.78 .02 .43 -1.03 .0443 1.11 .37 .04 1.12 .30 .0144 .72 1.90 .03 k17 1.27 .0045 .85 1.20 .08 1.08 .94 .0646 .92 .97 .07 1.18 .77 .0447 .71 .14 .11 .53 -.11 .0448 .70 .89 .10 .63 .69 .0449 .53 -.26 .01 .41 -.31 .0450 .72 -.01 .20 .44 -.62 .04

    Note. ANCILLES deleted items I and 9 during calibration.

  • -63-

    Table 4

    ANCILLES and LOGIST Item ParameterEstimate Correlations

    ANCILLESLOGIST

    a b c

    a .85 .79 .25

    b .56 .97 .14

    c .22 .07 .51

    Note: Sample size for both ANCILLES and LOGIST is n = 48.

    There were some differences in the distributions of c-values, with themean ANCILLES c-value significantly higher than the mean -OGIST c-value.However, the actual obtained c-values for the two procedures did-not differgreatly in magnitude. For instance, a difference in mean c-values of .02,although significant (p < .01), does not seem to be a great difference. Theskewness of both distributions (1.14 for ANCILLES, 3.34 for LOGIST) was sig-nificant (p , .01 for both), but the ANCILLES c-value kurtosis (.60) was notsignificant, while the kurtosis for LOGIST (131.-07) was significant (P_ < .01).

    When the item parameter estimates obtained from LOGIST for the two itemsdeleted by ANCILLES are dropped and the comparisons are made only on the 48items in common, the descriptive statistics change somewhat. The LOGISTmean b-value increases to -.17 without those two items, and the b-valuestandard deviation drops to .93. The minimum b-value increases to -3.07,the skewness changes to -1.224, and the kurtosis becomes 1.891. Thus, with-out those two items the b-value distributions from LOGISF and ANCILLES areeven more similar. The a-value distributions, however, become slightly lesssimilar when only the 48 common items are considered. The mean a-value forLOGIST becomes .63. This new value slightly increases the difference in thetwo distributions, as does the new kurtosis value of -.96 and the new skew-ness value of .56. The new standard deviation (.28) is slightly closer tothe PNCILLES value, as is the new minimum a-value of .25. The only changesin the LOGIST c-value distribution are to the skewness and kurtosis values,which become 3.26 and 12.42, respectively.

  • -64-

    Table 5ANCILLES Item Parameters Transtormed to the LOGIST Parameter Scale

    Itemi No. a. b.

    1

    2 .62 1.523 .65 1.474 .75 -. 065 .39 -2.616 .37 -7 .59 .786 .53 .119. . .. .

    10 .28 -1.9011 .36 -. 1112 .71 .9013 .44 -1.0914 .42 -. 3915 .88 -. 6316 .58 -. 4617 .45 -. 5018 .31 -1. 0619 .66 .6220 .66 .5121 .42 .1022 .38 -1. 1323 .34 -2.0124 .31 - .4225 .76 .6926 .68 .2127 .53 .1028 .35 -1.2429 .39 -. 2330 .57 .1031 .49 -.5932 .48 .4633 .75 .9434 .42 -. 6335 .47 .7136 .65 -. 5037 .71 .7138 .67 1.0639 .44 .6040 .45 -. 3241 .41 -. 2042 .47 -1. 1343 .85 .3744 .55 2.364 65 1 .4546 71 1.1547 55 .074 !4 .1.044) .41 - .45

    .55 . 13

    Not(,: ihe traw fo(Iriiiti nn (u10, not, iltor the (-value,.

  • -65-

    Table 6

    ANCILLES and LOGIST Item Parameter Estimate Descriptive Statistics

    ANCILLES LOGIST

    Statistic- ..

    a i bi ci ai bi ci

    No. of Items 48 48 48 50 50 50Mean .51 -.06 .06 .61 -.34 .04Median .53 -.06 .05 .49 -.14 .04St. Dev. .15 1.06 .06 .30 1.34 03Minimum .29 -2.88 .01 .08 -7.07 .00Maximum .88 2.34 .22 1.18 1.27 .20Skewness .36 -.49 1.14 .47 -2.89 3.34Kurtosis -.72 .65 .60 -.85 12.21 13.07

    Note: Statistics for ANCILLES were obtained using transformed item parameterestimates.

    Another analysis performed on the obtained item parameter estimates wasto compare the estimates obtained for those items showing lack of fit to theestimates obtained for those items not showing lack of fit. The estimatesfor which there was significant lack of fit are shown for ANCILLES in Table 7and LOGIST in Table 8. Examination of these tables does not give any clearinailation as to the cause of the lack of fit. The a-values of the items forwhich there was lack of fit for ANCILLES have a mean not significantly dif-ferent from the wean of the items not showing lack of tit. The ANCILLES meanb-value for the items showing lack of fit is not significantly lower than themean b-value for the items not showing lack of fit. For the items for whichthere was lack of fit for LOGIST the mean a-value is significantly lower thanthe mean of the a-values for items not showing lack of fit. The mean b-valuefor the items with lack of fit is not significantly lower than the mean b-value for the poorly titting items are not significantly different from thec-values for the other items for either procedure. A comparison of the itemparameter estimates for the poorly fitting items across the two proceduresindicates that there were no significant differences in the means of any ofthe parameter estimates.

  • -66-

    Table 7

    ANCILLES Item Parameter Estimates for Items for WhichThere Was Significant Lack of Fit

    Iteiii a. b. c.~1 1 1

    2 .62 1.52 .175 .39 -2.61 .066 .37 -2.88 .06

    1?. .44 -1.09 .06i 58 -.46 .02

    ..45 -.50 .01i .31 -1.08 .0223. 34 -2.01 .0327a .53 .10 .0829 .39 -.23 .0333 .7t .94 .0542a .47 -1.13 .0244 .55 2.36 .0345 .65 1.45 .0846 .71 1.15 .07

    Lack of x .50 -.30 .05Fit St. Dev. .14 1.56 .04

    No Lack x .55 .05 .07of Fit St. Dev. .16 .73 .06

    d Also showed lack of fit for LOGIST.

  • -67-

    Table 8

    LOGIST Item Parameter Estimates For Items ForWhich There Was Significant Lack of Fit

    Item a i b i c i

    17a .46 -.36 .0422 .33 -1.28 .04

    2 3a .29 -1.92 .0427a .54 .00 .0434 .41 - .49 .044 2d .43 -1.03 .04

    Lack of x .41 -.85 .04Fit St. Dev. .09 .70 .00

    1 o Lack x .63 -.27 .05of Fit St. Dev. .30 1.40 .04

    a Also snowed lack of fit for ANCILLES.

    A bi ) i _ty E t -If ;a-t es

    The final ,et of analyses performed involved the comparison of the abilityestimates obtained from LOGIST with the scaled ANCILLES ability estimates. Des-,rriptive totaihs tur the tio obtained ability estimate distributions arepre entuJ ir, Ta: .I an be ,een from these statistics the two distributions.were quite i I Tr " ran.qe of ability estimates for LOGIST was limited by:0 o 7 JaYIP, U t a ,p,, atI. , -4.0) to +4.00. In unrestricted operation LOGIST, li l,j,, j ,l r i,, it ot aiti Ity estimates than would ANCILLES (the same

    ,' U .j ,rT ,P,' , i r fin and variance of b-values).

    a l e L)

    !i1)t, [ .timate Descriptive Statistics

    ANiLIL LLS LOGIST

    ,0. i ', . -3 9 1999Mor -. 137 -. 137

    S.045 .142. ev. 1.213 1.214

    Mii ilw -4.991 -4.061Mahu h 3.303 3.432

    706 -1.164tr' WI1 .398 1 .372

    , t, 1tati ti.', for ANCILLES were obtained using transformed ability estimates.

  • -68-

    The frequency distributions of the ANCILLES and LOGIST ability esti-mates were plotted together. These frequency distributions are shown inFigure 51. As can be seen in the figure, the two distributions are almostindistinguishable inside the range of -2.00 to +2.00. The only real discrep-ancy between the two distributions is the height of the LOGIST curve at about-4.00. Because of the arbitrary limits on u, LOGIST tends to 'pile up' atthe limit those examinees whose ability estimates would be outside the limitif the limit were not imposed. This accounts for an unusually large numberof ability estimates at approximately -4.00. The great similarity betweenthe two sets of ability estimates is reflected in the correlation of theability estimates. The Pearson product-moment correlation coefficient ob-tained for the ability estimates was r = .987. Clearly there is a strongassociation between the ability estimates assigned by LOGIST and those as-signed by ANCILLES.

  • -69-

    FIGURE 51

    FREQUENCY DISTRIBUTIONS OF

    OBTAINEO ABILITY ESTIMATES

    FOR ANCILLES AND LOGIS5C

    ,llI

    • _ ANCILLESA

    Z C3

    Dci LO0G 15T=

    C&C3

    c)

    N m

    * NI

    d- N

    mN

    MA

    .,~ ~~ ~~~ -] ".. . .; .0 2.00 3.00 4,00H ET F,

    Ai

  • -70-

    Discussion

    When using a Pearson statistic to test the goodness of fit of datd toa model such as the 3PL model, a number of difficulties are encountered.before discussing the results of this study, these problems will be addressedand the manner in which they were dealt with in this study will be discussed.

    One of the first problems to arise when attempting to compute a chi-square statistic such as was used in this study concerns the formation ofintervals on the ability estimate scale. There appears to be some questicnas to how many intervals to form. for instance, Yen (in press) suggests10 intervals, while Wright and Mead (1977) recorlend six or fewer intervals.The statistic proposed by Wright and Panchapakesan (1969) would require asmany intervals as there are obtained numer-right score,,. Buck (1972), inthe fit statistic he has proposed, does not set out any requirenet.ts as tothe number of categories, but in the example he sets out in his paper (pp. 44-45)he uses 10 intervals. It is clear that the size .)t the interval will af-fect the size of the chi-square obtained for the interval. As the intervalwidth increases, the difference between the observed proportions at tne endsof the interval and the expected proportion at the center of the intervalcan be expected to increase. The objective, then, is to have enough inter-vals (making each interval smaller) to produce sufficiently small within-interval variances in the ability estimates, and thereby reducinq within-cell variances of the expected proportions. Alternatively,'Jp. can be cot-

    jputed and subtracted from the denominator of the chi-square statistic (Wrigntana Mead, 1977).

    In the current study 48 intervals were used. With such a large numbe,of intervals the width of any one interval was sufficiently small as to ob-viate the need to correct for the variation in expected proportions. How-ever, using such narrow intervals did result in very low frequencies withinthe extreme intervals, with several intervals having frequencies equal tozero. In order to correct for the small frequencies in the extreme inter-vals some of the intervals were collapsed together and treated as a singlecategory.

    Another problem encountered in applying a chi-square test is the de-termination of the appropriate degrees of freedom. The degrees of freedomnormally associated with the chi-square goodness of fit test when parametersare estimated from the data is

    df = r - g - 1 (7)

    where df is the degrees of freedom, r is the number of categories, and gis the number of parameters estimated from the data (Daniel, 1978). Thatis, the degrees of freedom are calculated as the number of independent datapoints (observed proportions) tiinus the number of independent parametersestimated from the data to produce the expected proportion (Yen, ir press).However, when applying the chi-square test to a latent trait model several

    changes are required. first, because the sum of the expected frequenciesis not held fixed, it doesn't really make sense to subtract one from thenumber of categories. Thus there are r independent data points, rather than

  • -71-

    r - I (Yen, in press). For the 3PL model there are four independent para-meters (6, a, b, and c) estimated from the data and used in computing theexpected proportions. The item characteristic curve for an item is fairlywell defined by the computed observed proportions, and the item parameterestimates are clearly dependent on the observed proportions. Therefore,one degree of freedom should be subtracted for each item parameter. How-ever, the ability estimates obtained were dependent upon the entire re-sponse vector, and a given item contributes only a small proportion of theinformation necessary to compute the ability estimates. Therefore, for anygiven item the estimation of ability entails little loss in degrees of free-dom (Yen, in press). Therefore, it is probably more appropriate to subtractg - 1 from the degrees of freedom, rather than g, when using a latent traitmodel. The degrees of freedom used for this study, then, are given by

    df = r - (g - 1) (8)

    where df, r and g are as defined above.

    Chi-Square Analyses

    It is clear from the results of the chi-square analyses that the LOGISTprocedure performed better in terms of goodness of fit. Neither procedureactually fit the test as a whole, but fewer items were rejected when usingLOGIST. For the LOGIST procedure only twelve percent of the items showedlack of fit, while for the ANCILLES procedure over thirty percent of theitems were rejected for lack of fit.

    It is difficult to determine why the lack of fit was significant forANCILLES more than for LOGIST, especially considering that in almost halfof tne cases (23 out of 48) the LOGIST chi-square was larger than the ANCILLESchi-square. The plots of the expected and observed proportions correct arenot very revealing either. However, an examination of the chi-square val-ues obtained for each interval, before being summed, does give some insightas to the cause of the poor fit. For Items 17 and 27 the LOGIST chi-squareswere significant due solely to the poor fit in the most positive category,as was the case for Items 16, 17, 18, and 27 for ANCILLES. The last cate-gory on the positive end was a very wide category, due to collapsing. Be-cause of this the computed expected proportion, based on the midpoint of theinterval, was too high. For Item 27 of ANCILLES, as well as Items 6, 23,33, 42, 44, and 45, the poor fit was concentrated in the intervals above

    = I.UO. The same was true for Item 23 for LOGIST. For LOGIST, Items 22,34, and 42 seemed to fit poorly across the ability range, as was the casewith ANCILLES for Items 2, 5, 13, and 46. These findings are sunmnarized inlable 10. The poor fit at the extreme ends of the ability range was a pro-blem with both procedures. The poor fit in the most positive interval wasa procedural problem, and those items should probably not be counted amongthose items for which there was significant lack of fit. Without those itemsthere was significant lack of fit for four items for LOGIST and 11 itemsfor ANCILLE%.

  • -72-

    Table 10

    A Summary of the Ranges of Ability for WhichItems Showed Poor Fit for ANCILLES and LOGIST

    Procedure Last Interval Interval where > +1.0

    ANCILLES 16, 17, 18, 27 6, 23, 27, 33, 42, 44

    LOGIST 17, 27 45, 23

    MSD Sttistic

    An examination of the obtained MSD statistics contributes little towardexplaining the results. The dependent t-test on these values was signifi-cant, which is not consistent with the finding that the ANCILLES chi-squareswere not larger than the LOGIST chi-square significantly more than half thetime. Moreover, it is disturbing that there was apparently no relationshipbetween the size of the MSD statistics obtained for the items and the sizeof the chi-square values for the items. A comparison of the MSD values andthe item parameter estimates did not yield any clear pattern.

    Item Parameter Estimates

    A comparison of the item parameter estimates obtained from LOGIST andthe transformed ANCILLES estimates also failed to yield a clear explanation.For the full set of items the ANCILLES and LOGIST mean b-values were not sig-nificantly different. They were also not significantly different for thoseitems for which there was lack of fit, nor were they significantly differentfor those items for which there was no lack of fit. For neither procedurewas the mean b-value obtained for the items for which there was lack of fitdifferent from the mean b-value for the items for which there was not lackof fit.

    The mean a-values for ANCILLES and LOGIST were significantly different.Interestingly enough, however, the mean a-values were not significantly dif-ferent when consideri g only those items for which there was lack of fit,nor were they significantly different when considering only the items forwhich there was not lack of fit. The ANCILLES mean a-value for the itemsfor which there was lack of fit was not significantly different from themean ANCILLES a-value for the items for which there was not lack of fit.However, for LUGIST the men a-value for the items for which there was lackof fit was significantly lower than the mean LOGIST a-value for the rest ofthe items. Because LOGIST yielded higher a-values than ANCILLES for the full

  • -73-

    set of items but not for those items for which there was lack of fit it ispossible that LOGIST underestimated the a-values for those items for whichthere was lack of fit. It did appear that LOGIST had more trouble with itemswith lower discrimination values.

    For the full set of items the mean ANCILLES c-value was significantlyhigher than the LOGSIT mean c-value. The mean ANCILLES c-value for the itemsfor which there was not lack of fit was also greater than the mean LOGISTc-value for the items for which there was not lack of fit. However, whenconsidering only those items for which there was lack of fit, the mean c-values for the two procedures were not significantly different, indicatingperhaps that for the items for which there was lack of fit either ANCILLESunderestimated the c-values, or LOGIST overestimated the c-value, or both.However, fur neither procedure was the mean c-value for the items for whichthere was lack of fit significantly different from the mean c-value for therest of the items.

    The comparisons of means discussed above do not yield any clear pattern.A comparison of the estimates obtained from ANCILLES and LOGIST with the chi-squares obtained for the procedures does indicate a consistent pattern, how-ever. While it is true that comparing mean values reveals surprisingly fewdifferences in the two sets of item parameter estimates, there is some evi-dence that the lack of fit of the ANCILLES procedure is related to the itemparameter estimates. The correlation of the ANCILLES b-values with the chi-squares obtained for ANCILLES is r = -.49. When using the absolute value ofthe b-values, that correlation is r = .68, indicating that the size of thechi-square value obtained for ANCILLES was strongly related to the absolutemagnitude of the corresponding b-value. While the mean ANCILLES b-value forthe items for which there was lack of fit was not significantly dTfferentfrom the mean for the rest of the items, the variance of the b-values forthe items for which there was lack of fit, s2 = 2.43, was significantly high-er than the variance of the b-value of the rest of the items, s2 = .53 (P < .001).This indicates that the b-values for the items for which there-was lack offit were more extreme than the b-values of the rest of the items. This dif-ference wasn't indicated by the comparison of the means because the extremevalues were divided between the positive and negative ends, thus cancellingthemselves out when the nan was computed. This pattern does not occur withLUGIST, and the correlation of the LOGIST chi-squares with the absolute val-ues of the LOGIST b-values was r = 0.0. It appears, then that at least partof the difference Uetween the fTt of the two procedures is accounted for bythe poorer ability of ANCILLES to handle extreme b-values.

    The correlations of the obtained chi-squares for the two procedureswitti their respective a- and c-values were not significant. However, a-valueestimates also appeared to be-a factor in the fit of the LOGIST procedures.For instance, for Item 23 the fit of the model to the data for LOGIST waspoorest at the extremes of the ability range. The a-value for Item 23 ob-tained from LOGIST was a = .29, a relatively low discrimination. The a-valuesfor the remaining nonfitting LOGIST items were also low.

    Most of the items for which there was poor fit can be accounted for inone of the following ways. For three items for ANCILLES and two items for

  • -74-

    LOGIST the poor fit was due to a procedural problem. For the remaining itemsfor LOGIST the poor fit appears to be due to the poor handling of low dis-crimination values. However, since low discrimination values often indicatemultidimensionality, poor fit would be a desired result in these cases (Reckase,1978). For nine of the remaining 11 items for ANCILLES for which there waslack of fit, the poor fit appeared to be primarily due to the inability ofANCILLES to handle extreme difficulty values. For one of the two remainingitems for ANCILLES, Item 29, the poor fit seemed to be across the abilityrange. Item 29, however, had a low discrimination value, indicating that per-haps ANCILLES also does not handle low discriminators well. For Item 33 thepoor fit of ANCILLES was primarily in the intervals where t > +1.0.

    Ability Estimates

    As was indicated by Table 9, the ability estimate distributions obtainedfrom ANCILLES and LOGIST were almost identical. Considering the similarityand the fact that the two sets of ability estimates had a correlation ofr = .987, it is difficult to imagine how the ability estimates could have beena factor in the difference in fit for the two procedures.

    Summary and Conclusions

    This study was conducted to determine whether there were qualitativedifferences in the parameter estimates obtained from the ANCILLES and LOGISTestimation procedures. The comparison was made using goodness of fit as acriterion. The results of this study indicate that there are qualitativedifferences in the estimates obtained from these two procedures. While theparameter estimate distributions obtained from these two procedures werequite similar-, lack of fit occurred for significantly more items for ANCILLESthan for LOGIST. Further analyses indicated that lack of fit for ANCILL.>appeared to be strongly related to item difficulty, while for LOGIST lack offit was more closely related to item discrimination. It is true that LOGISTis more expensive to use than ANCILLES, but ANCILLES yielded lack of fitsignificantly more often than LOGIST, and did not yield item parameter esti-mates for two items. Because of this LOGIST appears to be the procedure ofchoice.

  • -75-

    REFERENCES

    Birnbaum, A. Some latent trait models and their use in infering an exam-inee's ability. In F. M. Lord and M. R. Novick, Statisticaltheories of Mental test scores. Reading, MA: Addison-Wesley,1968.

    Bock, R. D. Estimating item parameters and latent ability when responsesare scored in two or more nominal categories. Psychometrika, 1972,37, 29-51.

    Daniel, W. W. Applied nonparametric statistics. Boston: Houghton Mifflin, 179.

    Divgi, D. R. A nonparametric test for comparing goodness of fit in latent traittheory. Paper presented at the Annual Meeting of the AmericanEdicational Research Association, Boston, April, 1980.

    Ferguson, G. A. Statistical analysis in psychology and education. New York,McGraw-Hill.

    Marco, G. L. Item characteristic curve solutions to three intractable test-ing problems. Journal of Educational Measurement, 1977, 14, 139-160.

    Reckase, M. D. Ability estimation and item calibration using the one- andthree-parameter logistic models: A comparative study (ResearchReport 77-1). Columbia: University of Missouri, Department ofEducational Psychology, November 1977.

    Reckase, M. D. Unifactor latent trait models applied to multifactor tests:Results and implications. Journal of Educational Statistics, 1979,4(3), 207-230.

    Ree, J. M. Estimating item characteristic curves. Applied PsychologicalMeasurement, 1979, 3, 371-385.

    Rentz, R. R., and Bashaw, W. L. Equating reading tests with the Rasch model:Final report. Athens, GA: Educational Research Laboratory,University of Georgia, September, 1975.

    Snedecor, G. W., and Cochran, W. G. Statistical Methods (6th ed.). Ames, IA:Iowa State University Press, 1967.

    Urry, V. W. A Monte Carlo investigation of logistic mental test models.(Doctoral dissertation, Purdue University, 1970). DissertationAbstracts International, 1971, 31, 6319B. (University MicrofilmsNo. 71-9475).

  • -76-

    Urry, V. W. Tailored testing: A spectacular success for latent trait theory.Springfield, VA, National Technical Iformation Service, 1977. (aY

    Urry, V. W. OGIVIA: Item parameter estimation program with normal ogiveand logistic three-parameter model options. Washington, D. C.:U. S. Civil Service Connission, Personnel Research and DevelopmentCenter, 1977. (b)

    Urry, V. W. ANCILLES: Item parameter estimation program with normal ogiveand logistic three-parameter model options. Washington, D. C.:U. S. Civil Service Cormission, Personnel Research and DevelopmentCenter, 1978.

    Welch, B. L. The significance of the differences between two means when thepopulation variances are unequal. Biometrika, 1938, 29, 350-362.

    Wood, R. L., Wingersky, M. S., and Lord, F. M. LOGIST: A computer programfor estimating examinee ability and item characteristic curve para-meters (ETS Research Memorandum RM-76-6). Princeton, NJ: Educat-ional Testing Service, June, 1976.

    Wright, B. D., and Mead, R. J. BICAL: Calibrating items and scales with theRasch model (Research Memorandum No. 23). Chicago: StatisticalLaboratory, Department of Education, University of Chicago, 1977.

    Wright, B. D., and Panchapakesan, N. A procedure for sample-free item analy-sis. Educational and Psychological Measurement, 1969, 29, 23-48.

    Yen, W. M. Using simulation results to choose a latent trait model. Appl'edPsychological Measurement, in press.

  • DISTRIBUTION LIST

    Navy

    1 Dr. Norman J. KerrDr. Jack R. Borsting Chief of Naval Technical TrainingProvost & Academic Dean Naval Air Station Memphis (75)U.S. Naval Postgraduate School Millington, TN 38054Monterey, CA 93940

    Dr. Robert Breaux 1 Dr. William L. MaloyCode N-711 Principal Civilian Advisor for

    NAVTRAEQUIPCEN Education and Training

    Orlando, FL 32813 Naval Training Command, Code OOAPensacola, FL 32508

    Chief of Naval Education and TrainingLiason Office 1 Dr. Kneale Marshall

    Air Force Human Resource Laboratory Scientific Advisor to DCNO(MPT)Flying Training Division OPOTWILLIAMS AFB. AZ 85224 Washington DC 20370

    Dr. Richard Elster 1 CAPT Richard L. Martin, USNDepartment of Administrative Sciences Prospective Commanding Officer

    Naval Postgraduate School USS Carl Vinson (CVN-70)

    Monterey, CA 93940 Newport News Shipbuilding and Drydock CoNewport News, VA 23607

    DR. PAT FEDERICONAVY PERSONNEL R&D CENTER 1 Dr. James McBrideSAN DIEGO, CA 92152 Navy Personnel R&D Center

    San Diego, CA 92152Mr. Paul FoleyNavy Personnel R&D Center 1 LibrarySan Diego, CA 92152 Naval Health Research Center

    P. 0. Box 85122Dr. John Ford San Diego, CA 92138Navy Personnel R&D CenterSan Diego. CA 92152 1 Naval Medical R&D Command

    Code 44Dr. Henry M. Halff National Naval Medical CenterDepartment of Psychol.ogy.C-009 Bethesda, MD 20014University of California at San DiegoLa Jolla, CA 92093 1 Ted M. I. Yellen

    Technical Information Office, Code 201Dr. Patrick R. Harrison NAVY PERSONNEL R&D CENTERPsychology Course Director SAN DIEGO, CA 92152LEADERSHIP & LA',4 DEPT. (7b)DIV. OF PPOFES...AL rEVELOPMMENTU.S. NAVAL ACAiEEMY 1 Library, Code P2O1LANNAPOLIS, MD 21402 Navy Personnel R&D Center

    San Diego, CA 92152

    CDR Robert S. Kennedy 6 Commanding OfficerHead, Human Performance Sciences Naval Research LaboratoryNaval Aerospace Medical Research Lab Code 2627Box 29407 Washington. DC 20390New Orleans, LA 70189

  • Psychologist 1 Dr. Alfred F. SmodeONR Branch Office Training Analysis & Evaluation GroupBldg 114, Section D (TAEG)666 Summer Street Dept. of the NavyBoston, MA 02210 Orlando, FL 32813

    Psychologist 1 Dr. Richard SorensenONR Branch Office Navy Personnel R&D Center

    536 S. Clark Street San Diego, CA 92152Chicago, IL 60605

    1 Dr. Ronald WeitzmanOffice of Naval Research Code 54 WZCode 437 Department of Administrative Sciences

    800 N. Quincy SStreet U. S. Naval Postgraduate ScolArlington, VA 22217 Monterey, CA 93940

    5 Personnel & Training Research Programs 1 Dr. Robert Wisher

    (Code 458) Code 309

    Office of Naval Research Navy Personnel R&D CenterArlington, VA 22217 San Diego, CA 92152

    Psychologist 1 DR. MARTIN F. WISKOFFONR Branch Office NAVY PERSONNEL R& D CENTER1030 East Green Street SAN DIEGO, CA 92152Pasadena, CA 91101

    Office of the Chief of Naval Operations Army

    Research Develo ent & Studies Branch(OP-i115)

    Washington DC 20350 1 Technical DirectorU. S. Army Research Institute for ,LT Frank C. Petho, MSC, USN (Ph.D) Behavioral and Scial SciencesCode L51 5001 Eisenhower Avenue

    Naval Aerospace '-1edical Research Labora' Alexandria, VA 22333Pensacola, FL 32508

    1 DR. RALPH DUSEK

    Dr. Bernard Rinlard (03B) U.S. ARMY RESEARCH .. S. TUTENavy Personnel P') Center 5001 EISENH,,E, AVEXElSan Diego, CA 92152 ALEXANDRIA, VA 22333

    Dr. Worth Scanand 1 Dr. Myron Fischl

    Chief of Naval Education and Training U.S. Army Research Institute fo, th,

    Codeo Na Social and Behavioral Sc e,c-Code N-s 5001 Eisenhower AvenueNAS, Pensacola, FL 32508 Alexandria, VA 2231

    Dr. Robert G. Smith 1 Dr. Dexter FletcherOffice of Chief of Naval Operations U.S. Army Research InstituteOP-987H 5001 Eisenhower AvenueWashington, DC 20350 Alexandria,VA 2233

  • F

    Dr. Michael Kaplan 1 Dr. Earl A. AlluisiU.S. ARMY RESEARCH INSTITUTE HQ, AFHRL (AFSC)5001 EISENHOWER AVENUE .Brooks AFB, TX 78235ALEXANDRIA, VA 22333

    1 Research and Measurment DivisionDr. Milton S. Katz Research Branch, AFMPC/MPCYPRTraining Technical Area Ra