Neural Test Theory: A nonparametric test theory using the mechanism of a self-organizing map
SHOJIMA Kojiro
The National Center for University Entrance Examinations
Neural Test Theory (NTT)
• Shojima (2008) IMPS2007 CV, in press.
  – Test theory using the mechanism of a self-organizing map (SOM; Kohonen, 1995)
• Scaling
  – Latent scale is ordinal.
  – Latent rank
  – Number of latent ranks is about [3, 20]
  – Item Reference Profile
  – Test Reference Profile
  – Rank Membership Profile
• Equating
  – Concurrent calibration
Why an Ordinal Scale?
Two main reasons:
  – Methodological
  – Sociological
Methodological Reason
• Psychological variables are continuous
  – Reasoning, reading comprehension, ability…
  – Anxiety, depression, inferiority complex…
• Tools do not have high enough resolution to measure them on a continuous scale
  – Tests
  – Psychological questionnaires
  – Social investigation
Weight and Weighing Machine
• Phenomenon (continuous)
• Measure (high reliability)
[Figure: markers 1–4 placed along the Weight axis]
Ability and Test
• Phenomenon (continuous?)
• Measure (low reliability)
[Figure: markers 1–4 placed along the Ability axis]
Resolution
• Power to detect difference(s)
• Weighing machines
  – can detect the difference between two persons of almost the same weight.
  – can almost correctly array people according to their weights on the kilogram scale.
• Tests
  – cannot discriminate between two persons of nearly equal ability.
  – cannot correctly array people according to their abilities.
• The most that tests can do is to grade examinees into several ranks.
Sociological Reason
• Negative aspects of continuous scale
  – Students are motivated to get the highest possible scores.
  – They should not be pushed back and forth by unstable continuous scores.
• Positive aspects of ordinal scale
  – Ordinal evaluation is more robust than continuous scores.
  – Sustained endeavor is necessary to go up to the next rank.
NTT
Latent Rank Theory     SOM                  GTM
Binary                 Shojima (in press)   RN07-12
Polytomous (ordinal)   RN07-03              In preparation
Polytomous (nominal)   RN07-21              In preparation
Continuous             In preparation       In preparation
• ML (RN07-04)
• Fitness (RN07-05)
• Missing (RN07-06)
• Equating (RN07-9)
• Bayes (RN07-15)
Statistical Learning of the NTT
・ For (t = 1; t ≤ T; t = t + 1)
    ・ U(t) ← Randomly sort row vectors of U
    ・ For (h = 1; h ≤ N; h = h + 1)
        ・ Obtain z_h(t) from u_h(t)
        ・ Select winner rank for u_h(t) (Point 1)
        ・ Obtain V(t,h) by updating V(t,h−1) (Point 2)
    ・ V(t+1,0) ← V(t,N)
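The nested loop above can be sketched in Python as follows. This is only a skeleton under stated assumptions: `select_winner` and `update` are hypothetical callables standing in for Point 1 and Point 2, missing responses are assumed to be coded as NaN, and z_h(t) is assumed to be a missing-response indicator (the slide does not define it).

```python
import numpy as np

def ntt_learn(U, V0, T, select_winner, update, seed=0):
    """Skeleton of the learning loops (illustrative sketch only).

    U  : (N, n) response matrix, NaN assumed to mark a missing response.
    V0 : initial reference matrix V(1,0).
    select_winner(u, z, V) -> winner rank      (hypothetical, Point 1)
    update(V, u, z, w, t)  -> new ref. matrix  (hypothetical, Point 2)
    """
    rng = np.random.default_rng(seed)
    V = np.array(V0, dtype=float)
    N = U.shape[0]
    for t in range(1, T + 1):
        U_t = U[rng.permutation(N)]            # U(t): rows of U in random order
        for h in range(N):
            u = U_t[h]
            z = (~np.isnan(u)).astype(float)   # z_h(t): 1 where a response exists
            w = select_winner(u, z, V)         # Point 1
            V = update(V, u, z, w, t)          # Point 2: V(t,h) from V(t,h-1)
        # V(t,N) carries over as V(t+1,0) simply because V is kept
    return V
```

Passing the two steps in as functions keeps the loop structure visible without committing to one estimation method for the winner rank.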
Mechanism of Neural Test Theory
[Figure: a binary response vector is presented to the reference matrix (axes: latent rank scale, number of items); Point 1 and Point 2 are marked on the diagram]
Point 1: Winner Rank Selection
The least squares method is also available.
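Likelihood
$$\ln p(\mathbf{u}_h^{(t)} \mid \mathbf{v}_q^{(t,h-1)}) = \sum_{j=1}^{n} z_{hj}^{(t)} \left\{ u_{hj}^{(t)} \ln v_{qj}^{(t,h-1)} + (1 - u_{hj}^{(t)}) \ln (1 - v_{qj}^{(t,h-1)}) \right\}$$
ML
$$R_w^{(ML)}: \; w = \arg\max_{q \in Q} \ln p(\mathbf{u}_h^{(t)} \mid \mathbf{v}_q^{(t,h-1)})$$
Bayes
$$R_w^{(MAP)}: \; w = \arg\max_{q \in Q} \left\{ \ln p(\mathbf{u}_h^{(t)} \mid \mathbf{v}_q^{(t,h-1)}) + \ln p(f_q) \right\}$$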
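A minimal Python sketch of the winner rank selection, assuming the reference matrix is stored as an items-by-ranks array and that the log prior is supplied as a vector. The function names and the clipping constant `eps` are illustrative choices, not part of the original slides, and the returned rank is a 0-based index.

```python
import numpy as np

def log_likelihood(u, z, V, eps=1e-10):
    """Log-likelihood of response vector u under every rank.

    u : (n,) binary responses; z : (n,) 1 = observed, 0 = missing.
    V : (n, Q) reference matrix, column q holding the profile of rank q.
    """
    Vc = np.clip(V, eps, 1 - eps)     # keep the logarithms finite
    u = np.nan_to_num(u)              # missing entries get weight z = 0 anyway
    term = u[:, None] * np.log(Vc) + (1 - u[:, None]) * np.log(1 - Vc)
    return (z[:, None] * term).sum(axis=0)

def select_winner(u, z, V, log_prior=None):
    """ML winner if log_prior is None, otherwise MAP winner (0-based index)."""
    ll = log_likelihood(u, z, V)
    if log_prior is not None:
        ll = ll + log_prior
    return int(np.argmax(ll))
```

With a flat prior the MAP winner coincides with the ML winner, so the two estimators differ only when the prior favors some ranks.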
Point 2: Reference Matrix Update
• The nodes of the ranks nearer to the winner are updated to become closer to the input data
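• h: tension
• α: size of tension
• σ: region size of learning propagation
$$\mathbf{V}^{(t,h)} = \mathbf{V}^{(t,h-1)} + (\mathbf{1}_n \mathbf{h}^{(t)\prime}) \odot (\mathbf{z}_h^{(t)} \mathbf{1}_Q^{\prime}) \odot (\mathbf{u}_h^{(t)} \mathbf{1}_Q^{\prime} - \mathbf{V}^{(t,h-1)})$$
$$\mathbf{h}^{(t)} = \{ h_{qw}^{(t)} \} \quad (Q \times 1)$$
$$h_{qw}^{(t)} = \frac{\alpha_t Q}{N} \exp\left\{ -\frac{(q - w)^2}{2 Q^2 \sigma_t^2} \right\}$$
$$\alpha_t = \frac{(T - t)\,\alpha_1 + (t - 1)\,\alpha_T}{T - 1}$$
$$\sigma_t = \frac{(T - t)\,\sigma_1 + (t - 1)\,\sigma_T}{T - 1}$$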
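The update rule can be sketched in Python as below, assuming an items-by-ranks reference matrix and a linear schedule for α and σ between their values at t = 1 and t = T. The default start and end values for `alpha` and `sigma` are hypothetical placeholders, not values taken from the slides, and T is assumed to be greater than 1.

```python
import numpy as np

def linear_schedule(first, last, t, T):
    """Linear interpolation from `first` at t = 1 to `last` at t = T."""
    return ((T - t) * first + (t - 1) * last) / (T - 1)

def neighborhood(w, t, T, Q, N, alpha=(1.0, 0.01), sigma=(1.0, 0.12)):
    """Tension h_qw(t) for q = 1..Q around winner rank w (1-based).

    alpha and sigma are (value at t=1, value at t=T); the defaults are
    hypothetical and only serve to make the sketch runnable.
    """
    a_t = linear_schedule(alpha[0], alpha[1], t, T)
    s_t = linear_schedule(sigma[0], sigma[1], t, T)
    q = np.arange(1, Q + 1)
    return (a_t * Q / N) * np.exp(-((q - w) ** 2) / (2 * Q**2 * s_t**2))

def update(V, u, z, h):
    """One update of the (n, Q) reference matrix with elementwise products."""
    u = np.nan_to_num(u)                      # missing entries are masked by z
    return V + h[None, :] * z[:, None] * (u[:, None] - V)
```

Because the tension peaks at the winner and decays with rank distance, the winner's profile moves most toward the response vector and neighboring ranks move less, while items with z = 0 are left unchanged.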
Analysis Example
• Geography test
N (examinees)   5000
n (items)       35
Median          17
Max             35
Min             2
Range           33
Mean            16.911
Sd              4.976
Skew            0.313
Kurt            -0.074
Alpha           0.704
[Figure: histogram of test scores; horizontal axis SCORE (0–35), vertical axis FREQUENCY (0–500)]
Item Reference Profile (IRP)
[Figure: IRP of Item 25 and IRP of Item 14]
IRPs of Items 1–15 (ML, Q=10)
A monotonically increasing constraint can be imposed on the IRPs during the learning process.
IRPs of Items 16–35 (ML, Q=10)
IRP index (1) Item Difficulty
• Beta
  – Rank at which the IRP steps over 0.5
• B
  – The IRP value at that rank
Kumagai (2007)
IRP index (2) Item Discriminancy
• Alpha
  – Smaller rank of the neighboring pair with the biggest change
• A
  – The size of that change
IRP index (3) Item Monotonicity
• Gamma
  – Proportion of neighboring pairs with negative changes
• C
  – Sum of those negative changes
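The three pairs of IRP indices can be computed from a single profile as in the following sketch. The function name is illustrative, and the fallback of setting β to the last rank when the profile never exceeds 0.5 is an assumption, not something the slides state.

```python
import numpy as np

def irp_indices(irp):
    """Return (beta, B, alpha, A, gamma, C) for one IRP; ranks are 1-based."""
    irp = np.asarray(irp, dtype=float)
    Q = irp.size
    over = np.flatnonzero(irp > 0.5)
    # Assumption: if the IRP never exceeds 0.5, beta falls back to rank Q.
    beta = int(over[0]) + 1 if over.size else Q
    B = irp[beta - 1]
    diffs = np.diff(irp)                  # change between neighboring ranks
    alpha = int(np.argmax(diffs)) + 1     # smaller rank of the steepest pair
    A = diffs[alpha - 1]
    neg = diffs[diffs < 0]
    gamma = neg.size / (Q - 1)            # share of decreasing pairs
    C = neg.sum()                         # total size of the decreases
    return beta, B, alpha, A, gamma, C

# Hypothetical six-rank profile, for illustration only.
beta, B, alpha, A, gamma, C = irp_indices([0.30, 0.28, 0.35, 0.45, 0.60, 0.80])
```

For this made-up profile the IRP first exceeds 0.5 at rank 5, the steepest rise is between ranks 5 and 6, and one of the five neighboring pairs decreases.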
Item Reference Profile estimates (R1–R10) and IRP indices (A, α, B, β, C, γ)
ITEM  R1     R2     R3     ・・・  R8     R9     R10    A      α   B      β   C       γ
1     0.262  0.257  0.255  ・・・  0.416  0.460  0.497  0.044  8   0.497  10  -0.007  0.222
2     0.271  0.255  0.240  ・・・  0.319  0.320  0.317  0.025  5   0.317  10  -0.033  0.333
3     0.597  0.624  0.669  ・・・  0.856  0.867  0.880  0.057  4   0.597  1   0.000   0.000
4     0.210  0.204  0.202  ・・・  0.460  0.539  0.592  0.084  7   0.539  9   -0.009  0.222
5     0.227  0.219  0.214  ・・・  0.319  0.390  0.445  0.071  8   0.445  10  -0.013  0.222
6     0.747  0.784  0.836  ・・・  0.914  0.921  0.928  0.052  2   0.747  1   0.000   0.111
7     0.352  0.326  0.296  ・・・  0.439  0.440  0.436  0.051  5   0.436  10  -0.066  0.444
8     0.229  0.234  0.238  ・・・  0.490  0.593  0.667  0.104  8   0.593  9   0.000   0.000
9     0.444  0.491  0.562  ・・・  0.778  0.802  0.816  0.071  2   0.562  3   0.000   0.000
10    0.287  0.254  0.210  ・・・  0.548  0.648  0.719  0.112  6   0.548  8   -0.094  0.333
・・・
32    0.189  0.170  0.157  ・・・  0.302  0.332  0.360  0.042  5   0.360  10  -0.032  0.222
33    0.168  0.188  0.221  ・・・  0.333  0.376  0.414  0.044  8   0.414  10  0.000   0.000
34    0.407  0.413  0.424  ・・・  0.566  0.585  0.593  0.036  6   0.535  7   0.000   0.000
35    0.481  0.522  0.569  ・・・  0.719  0.765  0.794  0.051  7   0.522  2   0.000   0.000
Can-Do Table (example)
[Table columns: IRP estimates; IRP indices; ability category and item content]
Test Reference Profile (TRP)
• Weighted sum of the IRPs
• Expected score of each latent rank
• Weakly ordinal alignment condition
  – Satisfied when the TRP is monotonic, but not every IRP is monotonic.
• Strongly ordinal alignment condition
  – Satisfied when all the IRPs are monotonic; the TRP is then monotonic as well.
• The scale is not ordinal unless at least the weak condition is satisfied.
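A short Python sketch of the TRP and the two alignment conditions, assuming an items-by-ranks reference matrix and unit item weights by default. The function names and the returned labels are illustrative only.

```python
import numpy as np

def trp(V, weights=None):
    """Test reference profile: weighted sum of the IRPs, one value per rank.

    V : (n, Q) reference matrix; weights : (n,) item weights (default 1).
    """
    V = np.asarray(V, dtype=float)
    w = np.ones(V.shape[0]) if weights is None else np.asarray(weights, dtype=float)
    return w @ V

def alignment(V, weights=None):
    """Classify the scale as 'strong', 'weak' or 'none' (illustrative labels)."""
    V = np.asarray(V, dtype=float)
    irps_monotone = bool(np.all(np.diff(V, axis=1) >= 0))
    trp_monotone = bool(np.all(np.diff(trp(V, weights)) >= 0))
    if irps_monotone and trp_monotone:
        return "strong"
    return "weak" if trp_monotone else "none"
```

With unit weights the TRP at each rank is the expected number-correct score of that rank, which is why a monotonic TRP is the minimum requirement for treating the ranks as ordered.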
Model-Fit Indices
[Panels: fit indices for ML, Q=10 and for ML, Q=5]
• Fit indices are helpful in determining the number of latent ranks.
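Latent Rank Estimation
• Identical to the winner rank selection
Likelihood
$$\ln p(\mathbf{u}_i \mid \mathbf{v}_q) = \sum_{j=1}^{n} z_{ij} \left\{ u_{ij} \ln v_{qj} + (1 - u_{ij}) \ln (1 - v_{qj}) \right\}$$
ML
$$R_i^{(ML)}: \; w = \arg\max_{q \in Q} \ln p(\mathbf{u}_i \mid \mathbf{v}_q)$$
Bayes
$$R_i^{(MAP)}: \; w = \arg\max_{q \in Q} \left\{ \ln p(\mathbf{u}_i \mid \mathbf{v}_q) + \ln p(f_q) \right\}$$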
Latent Rank Distribution (LRD)
• LRD is not always flat
• Examinees are classified according to the similarity of their response patterns.
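As a rough illustration, latent ranks for all examinees and the resulting LRD could be computed as below. The sketch assumes a fitted items-by-ranks reference matrix, NaN-coded missing responses, and 1-based rank labels; the function names are hypothetical.

```python
import numpy as np

def estimate_ranks(U, V, log_prior=None, eps=1e-10):
    """ML (or MAP, if log_prior is given) latent rank per examinee, 1-based.

    U : (N, n) responses with NaN assumed for missing; V : (n, Q).
    """
    U = np.asarray(U, dtype=float)
    Z = (~np.isnan(U)).astype(float)          # 1 where a response exists
    U0 = np.nan_to_num(U)
    Vc = np.clip(V, eps, 1 - eps)
    # (N, Q) matrix of log-likelihoods, one row per examinee
    ll = (Z * U0) @ np.log(Vc) + (Z * (1 - U0)) @ np.log(1 - Vc)
    if log_prior is not None:
        ll = ll + log_prior
    return ll.argmax(axis=1) + 1

def latent_rank_distribution(ranks, Q):
    """Number of examinees assigned to each of the ranks 1..Q."""
    return np.bincount(ranks, minlength=Q + 1)[1:]
```

Nothing in this procedure forces equal counts per rank, which is consistent with the observation that the LRD need not be flat.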
Stratified Latent Rank Distribution
[Figure: LRD stratified by sex (Male, Female, Total) and LRD stratified by establishment (National, Public, Private, Total); horizontal axis 1–5, vertical axis 0.0–1.0]
Relationship between Latent Ranks and Scores
• R-S scatter plot
  – Spearman’s R = 0.929
• R-Q scatter plot
  – Spearman’s R = 0.925
[Figure: scatter plots of SCORE (0–35) and of QUANTILE (1–10) against LATENT RANK (1–10)]
Validity of the NTT scale
Rank Membership Profile (RMP)
• Posterior distribution of latent rank to which each examinee belongs
RMP
$$p_{iq} = \frac{p(\mathbf{u}_i \mid \mathbf{v}_q)\, p(f_q)}{\sum_{q'=1}^{Q} p(\mathbf{u}_i \mid \mathbf{v}_{q'})\, p(f_{q'})}$$
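The posterior above can be evaluated stably in log space, as in this sketch. It assumes an items-by-ranks reference matrix, NaN-coded missing responses, and a uniform prior when none is supplied; the function name is illustrative.

```python
import numpy as np

def rank_membership_profile(u, V, prior=None, eps=1e-10):
    """Posterior probability of each latent rank for one examinee.

    u : (n,) responses with NaN assumed for missing; V : (n, Q).
    prior : (Q,) prior over ranks; uniform if omitted (an assumption).
    """
    u = np.asarray(u, dtype=float)
    z = (~np.isnan(u)).astype(float)
    u0 = np.nan_to_num(u)
    Vc = np.clip(V, eps, 1 - eps)
    Q = Vc.shape[1]
    prior = np.full(Q, 1.0 / Q) if prior is None else np.asarray(prior, dtype=float)
    log_post = (z * u0) @ np.log(Vc) + (z * (1 - u0)) @ np.log(1 - Vc) + np.log(prior)
    log_post -= log_post.max()        # subtract the maximum before exponentiating
    post = np.exp(log_post)
    return post / post.sum()
```

Working with log-likelihoods avoids underflow when the test has many items, since the raw likelihood is a product of many probabilities below one.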
RMPs of Examinees 1–15 (Q=10)
[Figure: RMP panels for Examinees 1–15; horizontal axis LATENT RANK (1–10), vertical axis PROBABILITY (0–1)]
Extended Models
• Graded Neural Test Model (RN07-03)
  – NTT model for ordinal polytomous data
• Nominal Neural Test Model (RN07-21)
  – NTT model for nominal polytomous data
• Batch-type NTT Model (RN08-03)
• Continuous Neural Test Model
• Multidimensional Neural Test Model
Graded Neural Test Model
Boundary Category Reference Profiles of Items 1–9
Dashed lines are observation ratio profiles (ORP)
Nominal Neural Test Model
Item Category Reference Profiles of Items 1–16
*: correct choice; x: merged category of choices with selection ratios of less than 10%
Discussion
• Test standardization theory
  – Self-Organizing Map
  – Latent scale is ordinal
  – IRPs are flexible and nonlinear
• Test editing
• CBT and CAT
• Test equating
  – Concurrent calibration
• Application
  – Japan’s National Achievement Test for 6th and 9th graders
• Website
  – http://www.rd.dnc.ac.jp/~shojima/ntt/index.htm
• Software
  – Neutet
    • Developed by Professor Hashimoto (NCUEE)
    • Available in Japanese and English versions
  – EasyNTT
    • Developed by Professor Kumagai (Niigata Univ.)
    • Japanese version only