
Probabilistic models for measurement Copenhagen 2010

The Rasch model as a statistical model:

Erling B. Andersen's contributions to the theory of Rasch models

Karl Bang Christensen

Svend Kreiner

Univ. of Copenhagen

Presentation available from http://staff.pubhealth.ku.dk/~kach/.



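Rasch Model

Ordinal items $X_1, \dots, X_I$ measuring latent variable $\theta$

$$
\Pr(X_i = 1 \mid \theta) = \frac{\exp(\theta - \beta_i)}{1 + \exp(\theta - \beta_i)} = \frac{\xi\delta_i}{1 + \xi\delta_i} \tag{1}
$$

with $\xi = \exp(\theta)$ and $\delta_i = \exp(-\beta_i)$.

Data $X = (X_{vi})$, values $\theta_1, \dots, \theta_N$, scores $X_{v\cdot} = \sum_i X_{vi}$.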
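
As a minimal sketch of equation (1), the following Python snippet evaluates the response probability in both the additive and the multiplicative parametrization. The function names `rasch_prob` and `rasch_prob_mult` and the numerical values are illustrative assumptions, not part of the presentation.

```python
import math

def rasch_prob(theta, beta):
    """Pr(X_i = 1 | theta) in the additive form exp(theta - beta) / (1 + exp(theta - beta))."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))

def rasch_prob_mult(xi, delta):
    """The same probability in the multiplicative form xi * delta / (1 + xi * delta)."""
    return xi * delta / (1.0 + xi * delta)

theta, beta = 0.5, -0.3                      # illustrative person and item parameters
p_add = rasch_prob(theta, beta)
p_mult = rasch_prob_mult(math.exp(theta), math.exp(-beta))
print(p_add, p_mult)                         # the two forms agree up to rounding
```

When person and item parameters coincide (theta equal to beta) the probability is one half, which is the sense in which (1) expresses a comparison of the two.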

[Diagram: the Rasch model (RM) placed between MATHEMATICS, STATISTICS and PHILOSOPHY]

Expression of a comparison (θ − βi). Developed 1955-1960.

Rasch. Probabilistic Models for some Intelligence and Attainment Tests. Copenhagen: Danish National Institute for Educational Research, 1960.



Erling B. Andersen

studied with Rasch 1961-1963.

first Danish graduate with formal degree in mathematical statistics 1963

gold medal from Univ. of Copenhagen for work on Rasch model 1965

first elected chairman of the Danish Soc. for Theor. Statistics 1971 (*)

defended his doctoral thesis on conditional inference 1973.

took over Rasch’s chair at Dep. of Statistics, Univ. of Copenhagen 1974.

editor of Scand. Journal of Statistics 1974-1986.

remembered as an excellent lecturer

his 1980 book is a classic still well worth reading today.

(*) Reasons for forming the society at that time: SJS + Rasch’s 70th birthday in Sept. 1971.

Andersen. Discrete stat. models with social science appl. Amsterdam: North-Holland, 1980.



Consistency - better estimates with more observations

$(X_j)_{j=1,\dots,n}$ normal variables, $E(X_j) = \mu$ and $V(X_j) = \sigma^2$

MLE $\hat\mu = \frac{1}{n}\sum_j x_j$ consistent: $\hat\mu \to \mu$ as $n \to \infty$

MLE fails under random sampling from non-identical distr.

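$(X_{ij})_{i=1,\dots,I,\ j=1,\dots,k}$, with $E(X_{ij}) = \mu_i$ and $V(X_{ij}) = \sigma^2$: $I$ groups of $k$ observations each.

$\hat\mu_i$ is unbiased, but the MLE $\hat\sigma^2$ is inconsistent: $\hat\sigma^2 \to \frac{k-1}{k}\sigma^2$ as $I \to \infty$.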

General formulation by Neyman & Scott

Neyman & Scott. Consistent estimates based on partially consistent observations. Econometrica 1948, 16:1-32.
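
The inconsistency can be illustrated with a small simulation. This is a sketch under stated assumptions: each group has its own mean and `k` observations, the number of groups grows, and the quantity tracked is the MLE of the variance, whose limit in the Neyman-Scott setting is $(k-1)/k \cdot \sigma^2$. The function name `pooled_variance_mle`, the seed and all numerical settings are illustrative choices.

```python
import random

def pooled_variance_mle(n_groups, k, sigma, rng):
    """MLE of sigma^2 when each of n_groups groups has its own mean and k observations."""
    total = 0.0
    for _ in range(n_groups):
        mu_i = rng.uniform(-5.0, 5.0)              # group-specific (incidental) mean
        xs = [rng.gauss(mu_i, sigma) for _ in range(k)]
        mean_i = sum(xs) / k                       # MLE of mu_i
        total += sum((x - mean_i) ** 2 for x in xs)
    return total / (n_groups * k)                  # the MLE divides by the full sample size

rng = random.Random(2010)                          # fixed seed, purely for reproducibility
k, sigma = 2, 1.0
estimate = pooled_variance_mle(20000, k, sigma, rng)
limit = (k - 1) / k * sigma ** 2
print(estimate, limit)                             # the estimate settles near the limit, not near sigma^2
```

With two observations per group the estimate stays near half the true variance however many groups are added, because every new group brings a new mean parameter with it.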



Conditional inference

Conditioning yields a sequence of independent random variables.

$X_1 \mid T_1,\ X_2 \mid T_2,\ \dots$

Using the conditional distribution given the sum scores yields CML. The estimates are consistent and asymptotically normally distributed.

Exact test known since 1925. Results on UMPU tests 1937.

Bartlett. Properties of suff. and stat. tests. Proc. Royal Soc. A 1937, 160:268-82.

Fisher. Statistical Methods for Research Workers. Edinburgh: Oliver & Boyd, 1925

Neyman. Outline of a Theory of Statistical Estimation Based on the Classical Theory of Probability. Phil. Trans. Royal Soc. A 1937, 236:333-80.



Writing $\delta = (\delta_1, \dots, \delta_I)$, $X = (X_1, \dots, X_I)$, $x = (x_1, \dots, x_I)$ it turns out that we can do estimation using

$$
L_C(\delta) = \prod_{v=1}^{n} L_C(\delta; x_v) \tag{2}
$$

where

$$
L_C(\delta; x) = P(X = x \mid X_\cdot = t) = \frac{\prod_i \delta_i^{x_i}}{\gamma_t(\delta)} \tag{3}
$$

does not depend on $\theta$; the $\gamma_t$ are symmetric polynomials. Asymptotic theory. Likelihood ratio tests performed in the standard way.
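
The claim that (3) is free of the person parameter can be checked numerically. The sketch below assumes the $\gamma_t$ are the elementary symmetric polynomials of the item parameters and compares formula (3) with a brute-force conditional probability computed from model (1) at two different person parameters. The function names and parameter values are illustrative assumptions.

```python
import itertools
import math

def gamma(t, delta):
    """Elementary symmetric polynomial of order t in the item parameters delta."""
    return sum(math.prod(c) for c in itertools.combinations(delta, t))

def cond_prob(x, delta):
    """Equation (3): P(X = x | X. = t) = prod_i delta_i^{x_i} / gamma_t(delta)."""
    t = sum(x)
    num = math.prod(d ** x_i for d, x_i in zip(delta, x))
    return num / gamma(t, delta)

def cond_prob_brute(x, delta, xi):
    """The same probability from model (1), for a given person parameter xi = exp(theta)."""
    def joint(y):
        return math.prod((xi * d) ** y_i / (1 + xi * d) for d, y_i in zip(delta, y))
    t = sum(x)
    same_score = [y for y in itertools.product((0, 1), repeat=len(delta)) if sum(y) == t]
    return joint(x) / sum(joint(y) for y in same_score)

delta = [0.5, 1.0, 2.0, 4.0]                 # illustrative item parameters
x = (1, 0, 1, 0)                             # a response pattern with score t = 2
print(cond_prob(x, delta), cond_prob_brute(x, delta, 0.3), cond_prob_brute(x, delta, 3.0))
```

All three printed values coincide, whatever value of `xi` is used in the brute-force version.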

Andersen. Asymptotic Properties of Conditional Maximum-Likelihood Estimators. Journal of the Royal Statistical Society B 1970, 32:283-301.

Andersen. The Asymptotic Distribution of Conditional Likelihood Ratio Tests. Journal of the American Statistical Association 1971, 66:630-3.



Solution of likelihood equations required recursive formulas for the $\gamma$ polynomials and for ratios

$$
\frac{\gamma_{t-x}(\delta^{(p)})}{\gamma_r(\delta)}
$$

where $\delta^{(p)} = (\delta_i)_{i \neq p}$. Procedure also yields the asymptotic covariance matrix. Implemented on the IBM 7094 at the Northern Europe University Computing Center.
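
One common way to organise such a recursion is to add items one at a time, using $\gamma_t(\delta) = \gamma_t(\delta^{(p)}) + \delta_p\,\gamma_{t-1}(\delta^{(p)})$. The sketch below is a generic illustration of that idea, not the algorithm of the 1972 paper; it also evaluates one special case of the ratios, $\delta_p\,\gamma_{t-1}(\delta^{(p)})/\gamma_t(\delta)$, which is the conditional probability of a positive response to item $p$ given score $t$. Function names and parameter values are assumptions.

```python
def gammas(delta):
    """All elementary symmetric polynomials gamma_0..gamma_I, adding one item at a time."""
    g = [1.0]                                # gamma_0 = 1 for the empty item set
    for d in delta:
        padded = g + [0.0]
        g = [padded[0]] + [padded[t] + d * padded[t - 1] for t in range(1, len(padded))]
    return g

def item_prob_given_score(p, t, delta):
    """P(X_p = 1 | X. = t) = delta_p * gamma_{t-1}(delta^(p)) / gamma_t(delta)."""
    if t == 0:
        return 0.0                           # score 0 means no positive responses
    rest = delta[:p] + delta[p + 1:]         # delta^(p): item p removed
    return delta[p] * gammas(rest)[t - 1] / gammas(delta)[t]

delta = [0.5, 1.0, 2.0, 4.0]                 # illustrative item parameters
g = gammas(delta)
probs = [item_prob_given_score(p, 2, delta) for p in range(len(delta))]
print(g, probs, sum(probs))                  # the probabilities add up to the score t = 2
```

The recursion needs on the order of $I^2$ multiplications, whereas direct enumeration of all item subsets grows exponentially in the number of items.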

Andersen. The Numerical Solution of a Set of Conditional Estimation Equations. Journal of the Royal Statistical Society B 1972, 34:42-54.



Doctoral thesis 1973

Conditional inference as the statistical counterpart of the philosophical viewpoint of objective measurement.

"we must know how to relate a given model to the real world, before alternative statistical models can be realistically discussed"

Technical rather than philosophical issues.

Rasch, as official opponent, took the chair and addressed for a very, very long time the interpretation of the title: should it be "for measuring" or "for measurements"?

Andersen. Cond. Inf. and Models for Measuring. Copenhagen: Mentalhygiejnisk Forl. 1973



Requirements imposed on the data by the Rasch model

(i) θ unidimensional

(ii) $\Pr(X_1 = x_1, \dots, X_I = x_I \mid \theta) = \prod_{i=1}^{I} \Pr(X_i = x_i \mid \theta)$

(iii) $\Pr(X_i = x_i \mid \theta, Y) = \Pr(X_i = x_i \mid \theta)$, for any covariate $Y$.

(iv) equal discrimination

(iv’) $\sum_i X_i$ sufficient for $\theta$.
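
A short worked step, added as an illustration, shows why (iv’) follows from model (1) combined with (ii). It uses the multiplicative form of (1), with $\xi = \exp(\theta)$ and $\delta_i = \exp(-\beta_i)$, and writes $t = \sum_i x_i$.

```latex
\Pr(X = x \mid \theta)
  = \prod_{i=1}^{I} \frac{(\xi\delta_i)^{x_i}}{1 + \xi\delta_i}
  = \underbrace{\frac{\xi^{t}}{\prod_{i=1}^{I} (1 + \xi\delta_i)}}_{\text{depends on } x \text{ only through } t}
    \;\cdot\; \prod_{i=1}^{I} \delta_i^{x_i},
  \qquad t = \sum_{i=1}^{I} x_i .
```

The person parameter enters only through the first factor, which involves the data only via the score $t$, so by the factorization criterion the sum score is sufficient for $\theta$.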



Rasch model implies sufficiency. The reverse is also true.

Theorem. Consider the model

$$
f(x_1, \dots, x_I \mid \theta, \beta_1, \dots, \beta_I) = \prod_i f(x_i \mid \theta, \beta_i) = \prod_i P(X_i = x_i \mid \theta, \beta_i) \tag{$*$}
$$

If $T = T(X_1, \dots, X_I)$ is a minimal sufficient statistic for any value of $(\beta_1, \dots, \beta_I)$ then (*) is equivalent to the Rasch model.

Lemma. If $T = T(X_1, \dots, X_I)$ is a minimal sufficient statistic for any value of $(\beta_1, \dots, \beta_I)$ then $T$ is invariant under permutations of $X_1, \dots, X_I$.

Andersen. Conditional inference for multiple-choice questionnaires. British Journal of Mathematical and Statistical Psychology 1973, 26:31-44.



Polytomous Rasch models

Algebraic generalization of Rasch, for $x_i = 0, 1, \dots, m$

$$
\Pr(X_i = x_i) = c_i^{-1} \exp\left( \sum_{h=1}^{m} \theta_h x_i^{(h)} + \sum_{h=1}^{m} \beta_{ih} x_i^{(h)} \right) \tag{4}
$$

statistically treated by Andersen. Ordering properties of $T$.

Theorem. $T = T(X_1, \dots, X_I)$ minimal suff. indep. of item parameters + local independence $\Rightarrow$ Rasch model.

Later descriptions of the polytomous Rasch model are cited more often.

Andersen. Sufficient statistics and latent trait models. Psychometrika 1977, 42:69-81

Andrich. A rating formul. for ordered resp. categories. Psychometrika 1978, 43:561-73.

Masters. A Rasch model for partial credit scoring. Psychometrika 1982, 47:149-74.



The Andersen test

Likelihood (2) works for a single score value $t_0$ (an invariance feature of the model). Andersen shows that the mathematics work.

$$
L_C(\delta) = \frac{\prod_v \prod_i \delta_i^{x_{vi}}}{\prod_t [\gamma_t(\delta)]^{n_t}}
$$

where $n_t$ is the number of persons in score group $t$, and

$$
L_C^{(t_0)}(\delta) = \frac{\prod_{v:\, x_{v\cdot} = t_0} \prod_i \delta_i^{x_{vi}}}{[\gamma_{t_0}(\delta)]^{n_{t_0}}}
$$

for each $t_0$.

Andersen. A goodness of fit test for the Rasch model. Psychometrika 1973, 38:123-40.



The Andersen test, II

Asympt. $\chi^2$ distributed test statistic

$$
Z = -2 \log Q = 2 \sum_{t_0} \log L_C^{(t_0)}(\hat\delta^{(t_0)}) - 2 \log L_C(\hat\delta)
$$

power considerations. Testing model against the Birnbaum model.

Graphical test.

Many, many other tests of the model followed ...
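
The statistic $Z$ can be sketched in a few lines of Python. Everything here is an illustration under stated assumptions: the item totals per score group are the Stouffer-Toby counts quoted later in this presentation (score groups 0 and 4 carry no information about the items and are left out); the conditional likelihood is maximised by a simple cyclic fixed-point update chosen for brevity, not by the procedure Andersen implemented; the function names are hypothetical; and the degrees of freedom follow the usual (score groups - 1)(items - 1) rule.

```python
import math

def gammas(delta):
    """Elementary symmetric polynomials gamma_0..gamma_I of the item parameters."""
    g = [1.0]
    for d in delta:
        padded = g + [0.0]
        g = [padded[0]] + [padded[t] + d * padded[t - 1] for t in range(1, len(padded))]
    return g

def cond_loglik(delta, groups):
    """log L_C = sum_i s_i log(delta_i) - sum_t n_t log(gamma_t(delta))."""
    g = gammas(delta)
    value = 0.0
    for t, (item_totals, n_t) in groups.items():
        value += sum(s * math.log(d) for s, d in zip(item_totals, delta))
        value -= n_t * math.log(g[t])
    return value

def cml_fit(groups, start, sweeps=500):
    """Cyclic updates delta_i <- s_i / sum_t n_t * gamma_{t-1}(delta^(i)) / gamma_t(delta).

    Assumes every score t in groups satisfies 1 <= t <= I - 1 and all item totals are positive.
    """
    delta = list(start)
    n_items = len(delta)
    s = [sum(totals[i] for totals, _ in groups.values()) for i in range(n_items)]
    for _ in range(sweeps):
        for i in range(n_items):
            g_all = gammas(delta)
            g_rest = gammas(delta[:i] + delta[i + 1:])
            denom = sum(n_t * g_rest[t - 1] / g_all[t] for t, (_, n_t) in groups.items())
            delta[i] = s[i] / denom
        scale = math.exp(sum(math.log(d) for d in delta) / n_items)
        delta = [d / scale for d in delta]   # fix the scale: geometric mean 1
    return delta

# score t -> (item totals in that score group, number of persons n_t)
groups = {1: ([1, 6, 6, 23], 36), 2: ([7, 33, 33, 53], 63), 3: ([17, 49, 46, 53], 55)}

pooled = cml_fit(groups, [1.0] * 4)
z = -2.0 * cond_loglik(pooled, groups)
for t, data in groups.items():
    one_group = {t: data}
    z += 2.0 * cond_loglik(cml_fit(one_group, pooled), one_group)
df = (len(groups) - 1) * (4 - 1)             # usual reference: chi-square with this many d.f.
print(pooled, z, df)
```

Starting each score-group fit from the pooled estimate keeps $Z$ non-negative, since the update never decreases the conditional likelihood. A large value of $Z$ relative to the $\chi^2$ reference would indicate that the item parameters differ between score groups.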

Andersen. A goodness of fit test for the Rasch model. Psychometrika 1973, 38:123-40.

Birnbaum. Some latent trait models. (In: Lord & Novick. Statistical theories of mental test scores. Reading: Addison-Wesley, 1968).



Glas used multinomial dist. solving problem with $\chi^2$ tests in large contingency tables. Andersen test also overcomes this problem.

Reason for lack of model fit studied in detail by considering residuals

$$
r = \frac{x_{i|t} - \pi_{i|t}}{\mathrm{s.e.}(x_{i|t} - \pi_{i|t})} \tag{5}
$$

or

$$
r = \frac{\hat\delta_i^{(t)} - \hat\delta_i}{\mathrm{s.e.}(\hat\delta_i^{(t)} - \hat\delta_i)} \tag{6}
$$

Glas. The derivation of some tests for the Rasch model from the multinomial distribution. Psychometrika 1988, 53:525-46.

Andersen. Residualanalysis in the polytomous Rasch model. Psychometrika 1995, 60:375-93.



Application of standard results yields complicated formula for the variance $V(x_{i|t} - \pi_{i|t})$, but surprisingly

$$
V(\hat\varepsilon_i^{(t)} - \hat\varepsilon_i) = V(\hat\varepsilon_i^{(t)}) - V(\hat\varepsilon_i) \tag{7}
$$

these variances being a by-product of the estimation procedure.
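
A simple case that can be checked by hand has the same structure as (7). This is an illustration only, not Andersen's derivation: take $n$ independent observations with variance $\sigma^2$, let $\bar x$ be the mean of all of them and $\bar x_1$ the mean of a subsample of size $n_1$.

```latex
\operatorname{Cov}(\bar x_1, \bar x) = \frac{n_1}{n}\, V(\bar x_1) = \frac{\sigma^2}{n} = V(\bar x),
\qquad\text{so}\qquad
V(\bar x_1 - \bar x)
  = V(\bar x_1) + V(\bar x) - 2\operatorname{Cov}(\bar x_1, \bar x)
  = \frac{\sigma^2}{n_1} - \frac{\sigma^2}{n}
  = V(\bar x_1) - V(\bar x).
```

The variance of the difference is a difference of variances because the covariance between the subsample estimate and the full-sample estimate equals the variance of the full-sample estimate.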

Andersen. Residualanalysis in the polytomous Rasch model. Psychometrika 1995, 60:375-93.


Rao. Linear statistical inference and its applications (2nd ed.). Wiley and Sons, NY, 1973.



In fact (7) is a general result about variances of MLEs.

Residual diagrams to evaluate if MLEs from independent samples can be assumed to be equal apart from random errors.

Example. Residual diagrams for the equality of the variances in

a one-way ANOVA.

Andersen. Residual Diagrams Based on a Remarkably Simple Result concerning the Variances of Maximum Likelihood Estimators. Journal of Educational and Behavioral Statistics 2002, 27:19-30.



Combining Rasch model with latent structure analysis

Rewrite likelihood based on (1) using (3):

$$
\prod_v \prod_i \Pr(X_{vi} = x_{vi} \mid \theta_v) = L_C(\delta)\, \underbrace{\prod_v g(t_v \mid \theta_v)}_{(*)} \tag{8}
$$

Rasch paradigm is to use first part for item analysis

"It seems, however, that the logical next step was never taken by Rasch, and only occasionally by his many followers"

Base inference about $\theta$ on (*). How to estimate and test assumptions about the latent distribution, e.g. conjugate distributions.
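
The second factor in (8) is the distribution of the score given the person parameter. Combining (1) and (3) gives $g(t \mid \theta) = \xi^t \gamma_t(\delta) / \prod_i (1 + \xi\delta_i)$; this expression is derived here and not stated on the slide. The sketch below checks numerically that it is a probability distribution and that the factorization holds for one response pattern. Function names and parameter values are illustrative assumptions.

```python
import math

def gammas(delta):
    """Elementary symmetric polynomials gamma_0..gamma_I of the item parameters."""
    g = [1.0]
    for d in delta:
        padded = g + [0.0]
        g = [padded[0]] + [padded[t] + d * padded[t - 1] for t in range(1, len(padded))]
    return g

def score_dist(xi, delta):
    """g(t | theta) = xi^t * gamma_t(delta) / prod_i (1 + xi * delta_i), for t = 0..I."""
    g = gammas(delta)
    norm = math.prod(1.0 + xi * d for d in delta)
    return [xi ** t * g[t] / norm for t in range(len(g))]

def joint_prob(x, xi, delta):
    """Probability of the full response pattern x under model (1) with local independence."""
    return math.prod((xi * d) ** x_i / (1.0 + xi * d) for d, x_i in zip(delta, x))

delta = [0.5, 1.0, 2.0, 4.0]                 # illustrative item parameters
xi = 1.5                                     # illustrative person parameter exp(theta)
x = (1, 0, 1, 0)
t = sum(x)
cond = math.prod(d ** x_i for d, x_i in zip(delta, x)) / gammas(delta)[t]   # equation (3)
dist = score_dist(xi, delta)
print(sum(dist), joint_prob(x, xi, delta), cond * dist[t])
```

Because the person parameter appears only in `score_dist`, assumptions about the latent distribution can be examined through the observed score distribution alone.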

Andersen & Madsen. Estimating the parameters of the latent population distribution. Psychometrika 1977, 42:357-74.



Combining Rasch model with latent structure analysis, II

After accepting Rasch model proceed to check whether population distribution can be described by a simple population density

Ex. 2 (Andersen & Madsen, p.370): Famed Stouffer-Toby data

Score group   Item 1   Item 2   Item 3   Item 4   n_t
t=0                0        0        0        0    42
t=1                1        6        6       23    36
t=2                7       33       33       53    63
t=3               17       49       46       53    55
t=4               20       20       20       20    20

Rasch model fits - normal density is clearly rejected.

Andersen & Madsen. Estimating the parameters of the latent population distribution. Psychometrika 1977, 42:357-74.

Stouffer & Toby. Role conflict and personality. Am. Journal of Sociology 1951, 56:395-406.



Combining Rasch model with latent structure analysis, III

Testing assumptions about the structure of the underlying latent distribution using the observed score distribution also has a multivariate extension using the contingency table

score at time 1 × score at time 2

Latent regression models imposing a linear structure on θ

Andersen. Est. latent corr. between repeated testings. Psychometrika 1985, 50:3-16.

Andersen. Latent Regression Analysis. Research Report 106, Department of Statistics, University of Copenhagen, 1994.

Andersen. Latent Regression Analysis based on the Rating Scale Model. Psychology Science 2004, 46:209-26.



The Rasch model

[Diagram: the formula $\frac{\xi\delta_i}{1+\xi\delta_i}$ under the label MATHEMATICS]

Georg Rasch

[Diagram: same figure, with arrows ↘ ↘]

Erling Andersen

[Diagram: same figure, with arrows ↙ ↗]


Erling B. Andersen

Developed a lot of theory for the Rasch model, but also did a

lot of applied work using the Rasch model. Mainly at the Danish

Research Institute for Mental Health.

Placed the Rasch model within mainstream statistics.

Used the Rasch model as a platform to derive general results.



Impact of Erling B. Andersen's work

first proof that Rasch models are characterized by sufficiency

proof that conditional ML estimates are consistent

theory of conditional ML estimation

first to combine Rasch models and latent structure analysis.

1995 Rasch book:

references in 16 out of 21 chapters, 39 in total (disregarding cases where authors refer to themselves, Erling B. Andersen is the person with the largest number of citations).

Fischer & Molenaar. Rasch models. Foundations, recent developments, and applications. New York: Springer-Verlag, 1995.



Philosophy vs. mathematical statistics

Idea of conditional inference came from Rasch. The theory was

developed by Erling B. Andersen

Erling saw the model as a statistical model and never subscribed to the idea that it should be regarded as a special 'measurement' model.

Never referred to the concept of specific objectivity in any of his papers.

Practice of Rasch analysis today owes more to Erling B. Andersen

than it does to Rasch.
