Post on 27-Mar-2015
A Unified Approach for Assessing Agreement
Lawrence Lin, Baxter Healthcare
A. S. Hedayat, University of Illinois at Chicago
Wenting Wu, Mayo Clinic
Outline
Introduction
Existing approaches
A unified approach
Simulation studies
Examples
Introduction
Different situations for agreement
Two raters, each with a single reading
More than two raters, each with a single reading
More than two raters, each with multiple readings
• Agreement within a rater
• Agreement among raters based on means
• Agreement among raters based on individual readings
Existing Approaches (1)
Agreement between two raters, each with single reading
Categorical data:
• Kappa and weighted kappa
Continuous data:
• Concordance Correlation Coefficient (CCC)
• Intraclass Correlation Coefficient (ICC)
Existing Approaches (2)
Agreement among more than two raters, each with single reading
Lin (1989): no inference
Barnhart, Haber and Song (2001, 2002): GEE
King and Chinchilli (2001, 2001): U-statistics
Carrasco and Jover (2003): variance components
Existing Approaches (3)
Agreement among more than two raters, each with multiple readings
Barnhart (2005)
• Intra-rater / inter-rater (based on means) / total (based on individual observations) agreement
• GEE method to model the first and second moments
Unified Approach
Agreement among k (k ≥ 2) raters, where each rater measures each of the n subjects multiple (m) times
Separate intra-rater agreement and inter-rater agreement
Measure relative agreement, precision, accuracy, and absolute agreement: Total Deviation Index (TDI) and Coverage Probability (CP)
Unified Approach - summary
Use the GEE method to estimate all agreement indices and their inferences
All agreement indices are expressed as functions of variance components
Data: continuous/binary/ordinal
Most currently popular methods become special cases of this approach
Unified Approach - model
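Set up

$$y_{ijl} = \mu + \alpha_i + \beta_j + \gamma_{ij} + e_{ijl}, \qquad i = 1, 2, \ldots, n; \quad j = 1, 2, \ldots, k; \quad l = 1, 2, \ldots, m$$

• Subject effect: $\alpha_i \sim (0, \sigma_\alpha^2)$
• Rater effect: $\beta_j$, with $\sum_{j=1}^{k} \beta_j = 0$
• Subject by rater effect: $\gamma_{ij} \sim (0, \sigma_\gamma^2)$
• Error effect: $e_{ijl} \sim (0, \sigma_e^2)$

$$\sigma_\beta^2 = \frac{1}{k(k-1)} \sum_{j=1}^{k-1} \sum_{j'=j+1}^{k} (\beta_j - \beta_{j'})^2$$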
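A minimal sketch of how balanced data could be drawn from this model, useful for checking estimators by simulation. The model itself only specifies means and variances; drawing every random effect from a normal distribution is an assumption made here, and the function names `simulate_readings` and `rater_effect_variance` are hypothetical.

```python
import random


def simulate_readings(n, k, m, mu, beta, var_alpha, var_gamma, var_e, seed=0):
    """Return y[i][j][l] = mu + alpha_i + beta_j + gamma_ij + e_ijl.

    Assumption (not stated on the slides): all random effects are normal.
    """
    if len(beta) != k or abs(sum(beta)) > 1e-9:
        raise ValueError("need k rater effects beta_j that sum to zero")
    rng = random.Random(seed)
    y = []
    for _ in range(n):
        alpha = rng.gauss(0.0, var_alpha ** 0.5)       # subject effect
        subject = []
        for j in range(k):
            gamma = rng.gauss(0.0, var_gamma ** 0.5)   # subject-by-rater effect
            subject.append(
                [mu + alpha + beta[j] + gamma + rng.gauss(0.0, var_e ** 0.5)
                 for _ in range(m)]                    # m replicated readings
            )
        y.append(subject)
    return y


def rater_effect_variance(beta):
    """sigma_beta^2 = sum over pairs j < j' of (beta_j - beta_j')^2 / (k (k - 1))."""
    k = len(beta)
    pairs = ((j, jp) for j in range(k - 1) for jp in range(j + 1, k))
    return sum((beta[j] - beta[jp]) ** 2 for j, jp in pairs) / (k * (k - 1))
```

For example, `simulate_readings(20, 2, 3, 10.0, [-1.0, 1.0], 4.0, 1.0, 0.5)` gives 20 subjects, 2 raters and 3 readings per rater, matching the k=2, m=3 simulation case.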
Unified Approach - targets
Intra-rater agreement: overall, are the k raters consistent with themselves?
Inter-rater agreement:
• Inter-rater agreement (agreement based on means): overall, do the k raters agree with each other based on the average of the m readings?
• Total agreement (agreement based on individual readings): overall, do the k raters agree with each other based on the individual readings?
Unified Approach – agreement(intra)
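$\rho_{c,Intra}$: over all k raters, how well does each rater reproduce his or her own readings?

$$\rho_{c,Intra} = 1 - \frac{E\left[\sum_{l=1}^{m} (y_{ijl} - \bar{y}_{ij.})^2 / (m-1)\right]}{E\left[\sum_{l=1}^{m} (y_{ijl} - \bar{y}_{ij.})^2 / (m-1) \mid y_{ij1}, y_{ij2}, \ldots, y_{ijm} \text{ independent}\right]} = \frac{\sigma_\alpha^2 + \sigma_\gamma^2}{\sigma_\alpha^2 + \sigma_\gamma^2 + \sigma_e^2}$$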
Unified Approach – precision(intra) and MSD
$\rho_{Intra}$: for any rater j, the proportion of the variance that is attributable to the subjects (same as $\rho_{c,Intra}$)
Examine the absolute agreement independent of the total data range:

$$\varepsilon_{Intra}^2 = MSD_{Intra} = E(y_{ijl} - y_{ijl'})^2 = 2\sigma_e^2$$
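Unified Approach – TDI(intra)
$\delta_{Intra(\pi)}$: for each rater j, $(\pi \times 100)\%$ of observations are within $\delta_{Intra(\pi)}$ units of their replicated readings from the same rater.

$$\delta_{Intra(\pi)} = \Phi^{-1}\left(1 - \frac{1-\pi}{2}\right) |\varepsilon_{Intra}| = \Phi^{-1}\left(1 - \frac{1-\pi}{2}\right) \sqrt{2\sigma_e^2}$$

$\Phi^{-1}(\cdot)$ is the inverse cumulative normal distribution, $|\cdot|$ is the absolute value, and $\varepsilon_{Intra}^2 = MSD_{Intra} = 2\sigma_e^2$.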
Unified Approach – CP(intra)
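$\pi_{Intra(\delta)}$: for each rater j, $\pi_{Intra(\delta)} \times 100\%$ of observations are within $\delta$ units of their replicated readings from the same rater

$$\pi_{Intra(\delta)} = 1 - 2\left[1 - \Phi\left(\delta / \sqrt{2\sigma_e^2}\right)\right]$$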
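The TDI and CP are inverse views of the same normal-theory calculation: one fixes the coverage and returns the boundary, the other fixes the boundary and returns the coverage. A minimal sketch using Python's standard library, assuming the differences between replicated readings are normal with mean zero and variance equal to the MSD; the function names `tdi` and `cp` are hypothetical.

```python
from statistics import NormalDist


def tdi(msd, pi):
    """Boundary delta such that a proportion pi of absolute differences fall within it."""
    q = NormalDist().inv_cdf(1.0 - (1.0 - pi) / 2.0)
    return q * msd ** 0.5


def cp(msd, delta):
    """Proportion of absolute differences that fall within delta."""
    return 1.0 - 2.0 * (1.0 - NormalDist().cdf(delta / msd ** 0.5))


# Intra-rater case: MSD_intra = 2 * sigma_e^2 (sigma_e^2 = 300.0 is an illustrative value)
sigma_e2 = 300.0
delta_90 = tdi(2.0 * sigma_e2, 0.9)
coverage = cp(2.0 * sigma_e2, delta_90)  # recovers 0.9 up to rounding
```

The same two functions apply to the inter and total cases by passing the corresponding MSD.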
Unified Approach – agreement(inter)
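$\rho_{c,Inter}$: over all k raters, how well do raters reproduce each other's readings based on the average of the multiple readings?

$$\rho_{c,Inter} = 1 - \frac{E\left[\sum_{j=1}^{k} (\bar{y}_{ij.} - \bar{y}_{i..})^2 / (k-1)\right]}{E\left[\sum_{j=1}^{k} (\bar{y}_{ij.} - \bar{y}_{i..})^2 / (k-1) \mid \bar{y}_{i1.}, \bar{y}_{i2.}, \ldots, \bar{y}_{ik.} \text{ independent}\right]} = \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma_\beta^2 + \sigma_\gamma^2 + \sigma_e^2/m}$$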
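Unified Approach – precision(inter)
$\rho_{Inter}$: for any two raters, the proportion of the variance that is attributable to the subjects based on the average of the m readings

$$\rho_{Inter} = \frac{\mathrm{cov}(\bar{y}_{ij.}, \bar{y}_{ij'.})}{\sqrt{\mathrm{var}(\bar{y}_{ij.})\,\mathrm{var}(\bar{y}_{ij'.})}} = \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma_\gamma^2 + \sigma_e^2/m}$$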
Unified Approach – accuracy(inter)
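$\chi_{a,Inter}$: how close are the means of different raters?

$$\chi_{a,Inter} = \frac{\sigma_\alpha^2 + \sigma_\gamma^2 + \sigma_e^2/m}{\sigma_\alpha^2 + \sigma_\beta^2 + \sigma_\gamma^2 + \sigma_e^2/m}$$

$$\rho_{c,Inter} = \rho_{Inter}\,\chi_{a,Inter}$$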
Unified Approach – TDI(inter)
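$\delta_{Inter(\pi)}$: over all k raters, $(\pi \times 100)\%$ of the averaged readings are within $\delta_{Inter(\pi)}$ units of the replicated averaged readings from the other rater.

$$\delta_{Inter(\pi)} = \Phi^{-1}\left(1 - \frac{1-\pi}{2}\right) |\varepsilon_{Inter}| = \Phi^{-1}\left(1 - \frac{1-\pi}{2}\right) \sqrt{2\sigma_\beta^2 + 2\sigma_\gamma^2 + 2\sigma_e^2/m}$$

where $\varepsilon_{Inter}^2 = MSD_{Inter}$.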
Unified Approach – CP(inter)
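$\pi_{Inter(\delta)}$: for each rater j, $\pi_{Inter(\delta)} \times 100\%$ of averaged readings are within $\delta$ units of the replicated averaged readings from the other rater

$$\pi_{Inter(\delta)} = 1 - 2\left[1 - \Phi\left(\delta / \sqrt{2\sigma_\beta^2 + 2\sigma_\gamma^2 + 2\sigma_e^2/m}\right)\right]$$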
Unified Approach – agreement(total)
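$\rho_{c,Total}$: over all k raters, how well do raters reproduce each other's readings based on the individual readings?

$$\rho_{c,Total} = 1 - \frac{E\left[\sum_{j=1}^{k} (y_{ijl} - \bar{y}_{i.l})^2 / (k-1)\right]}{E\left[\sum_{j=1}^{k} (y_{ijl} - \bar{y}_{i.l})^2 / (k-1) \mid y_{i1l}, y_{i2l}, \ldots, y_{ikl} \text{ independent}\right]} = \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma_\beta^2 + \sigma_\gamma^2 + \sigma_e^2}$$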
Unified Approach – precision(total)
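$\rho_{Total}$: for any two raters, the proportion of the variance that is attributable to the subjects based on the individual readings

$$\rho_{Total} = \frac{\mathrm{cov}(y_{ijl}, y_{ij'l'})}{\sqrt{\mathrm{var}(y_{ijl})\,\mathrm{var}(y_{ij'l'})}} = \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma_\gamma^2 + \sigma_e^2}$$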
Unified Approach – accuracy(total)
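$\chi_{a,Total}$: how close are the means of different raters (accuracy)?

$$\chi_{a,Total} = \frac{\sigma_\alpha^2 + \sigma_\gamma^2 + \sigma_e^2}{\sigma_\alpha^2 + \sigma_\beta^2 + \sigma_\gamma^2 + \sigma_e^2}$$

$$\rho_{c,Total} = \rho_{Total}\,\chi_{a,Total}$$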
Unified Approach – TDI(total)
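$\delta_{Total(\pi)}$: over all k raters, $(\pi \times 100)\%$ of the readings are within $\delta_{Total(\pi)}$ units of the replicated readings from the other rater.

$$\delta_{Total(\pi)} = \Phi^{-1}\left(1 - \frac{1-\pi}{2}\right) |\varepsilon_{Total}| = \Phi^{-1}\left(1 - \frac{1-\pi}{2}\right) \sqrt{2\sigma_\beta^2 + 2\sigma_\gamma^2 + 2\sigma_e^2}$$

where $\varepsilon_{Total}^2 = MSD_{Total}$.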
Unified Approach – CP(total)
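$\pi_{Total(\delta)}$: for each rater j, $\pi_{Total(\delta)} \times 100\%$ of readings are within $\delta$ units of the replicated readings from the other rater

$$\pi_{Total(\delta)} = 1 - 2\left[1 - \Phi\left(\delta / \sqrt{2\sigma_\beta^2 + 2\sigma_\gamma^2 + 2\sigma_e^2}\right)\right]$$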
Unified Approach
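$\Phi^{-1}(\cdot)$ is the inverse cumulative normal distribution, $\chi^2(\cdot, 1)$ is a central chi-square distribution with df = 1, and $Q = \Phi^{-1}\left(1 - \frac{1-\pi}{2}\right)$.

Statistics | INTRA | INTER | TOTAL | M=1
Agreement | $\frac{\sigma_\alpha^2+\sigma_\gamma^2}{\sigma_\alpha^2+\sigma_\gamma^2+\sigma_e^2}$ | $\frac{\sigma_\alpha^2}{\sigma_\alpha^2+\sigma_\beta^2+\sigma_\gamma^2+\sigma_e^2/m}$ | $\frac{\sigma_\alpha^2}{\sigma_\alpha^2+\sigma_\beta^2+\sigma_\gamma^2+\sigma_e^2}$ | $\frac{\sigma_\alpha^2}{\sigma_\alpha^2+\sigma_\beta^2+\sigma_e^2}$
Precision | $\frac{\sigma_\alpha^2+\sigma_\gamma^2}{\sigma_\alpha^2+\sigma_\gamma^2+\sigma_e^2}$ | $\frac{\sigma_\alpha^2}{\sigma_\alpha^2+\sigma_\gamma^2+\sigma_e^2/m}$ | $\frac{\sigma_\alpha^2}{\sigma_\alpha^2+\sigma_\gamma^2+\sigma_e^2}$ | $\frac{\sigma_\alpha^2}{\sigma_\alpha^2+\sigma_e^2}$
Accuracy | NA | $\frac{\sigma_\alpha^2+\sigma_\gamma^2+\sigma_e^2/m}{\sigma_\alpha^2+\sigma_\beta^2+\sigma_\gamma^2+\sigma_e^2/m}$ | $\frac{\sigma_\alpha^2+\sigma_\gamma^2+\sigma_e^2}{\sigma_\alpha^2+\sigma_\beta^2+\sigma_\gamma^2+\sigma_e^2}$ | $\frac{\sigma_\alpha^2+\sigma_e^2}{\sigma_\alpha^2+\sigma_\beta^2+\sigma_e^2}$
MSD | $2\sigma_e^2$ | $2\sigma_\beta^2+2\sigma_\gamma^2+2\sigma_e^2/m$ | $2\sigma_\beta^2+2\sigma_\gamma^2+2\sigma_e^2$ | $2\sigma_\beta^2+2\sigma_e^2$
TDI$_\pi$ | $Q\sqrt{MSD_{Intra}}$ | $Q\sqrt{MSD_{Inter}}$ | $Q\sqrt{MSD_{Total}}$ | $Q\sqrt{MSD}$
CP$_\delta$ | $\chi^2(\delta^2/MSD_{Intra}, 1)$ | $\chi^2(\delta^2/MSD_{Inter}, 1)$ | $\chi^2(\delta^2/MSD_{Total}, 1)$ | $\chi^2(\delta^2/MSD, 1)$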
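Because every index in the summary table is a function of the four variance components and m, the whole table can be evaluated in a few lines. A minimal sketch; the function name `agreement_indices` and the dictionary keys are hypothetical, and the inputs are taken as known values rather than GEE estimates.

```python
def agreement_indices(var_alpha, var_beta, var_gamma, var_e, m):
    """Evaluate the INTRA / INTER / TOTAL columns of the summary table."""
    e_mean = var_e / m                                   # error variance of an m-reading average
    inter_den = var_alpha + var_beta + var_gamma + e_mean
    total_den = var_alpha + var_beta + var_gamma + var_e
    intra = (var_alpha + var_gamma) / (var_alpha + var_gamma + var_e)
    return {
        "ccc_intra": intra,
        "precision_intra": intra,                        # same as ccc_intra
        "ccc_inter": var_alpha / inter_den,
        "precision_inter": var_alpha / (var_alpha + var_gamma + e_mean),
        "accuracy_inter": (var_alpha + var_gamma + e_mean) / inter_den,
        "ccc_total": var_alpha / total_den,
        "precision_total": var_alpha / (var_alpha + var_gamma + var_e),
        "accuracy_total": (var_alpha + var_gamma + var_e) / total_den,
        "msd_intra": 2.0 * var_e,
        "msd_inter": 2.0 * (var_beta + var_gamma + e_mean),
        "msd_total": 2.0 * (var_beta + var_gamma + var_e),
    }
```

A quick consistency check on any output is that each CCC equals the product of its precision and accuracy, for both the inter and total columns.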
Estimation and Inference
Estimate all means, variance components, and their variances and covariances by the GEE method: $\mu_1, \mu_2, \ldots, \mu_k, \sigma_\alpha^2, \sigma_\beta^2, \sigma_\gamma^2, \sigma_e^2$
Estimate all indices using the above estimates
Estimate variances of all indices using the above estimates and the delta method
Estimation and Inference (2)
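$\sigma_{jlj'l'}$: the covariance of two replications, $y_{ijl}$ and $y_{ij'l'}$, with $y_{ijl}$ coming from rater $j$ and $y_{ij'l'}$ coming from rater $j'$

$$\sigma_\alpha^2 = \frac{2 \sum_{j=1}^{k-1} \sum_{j'=j+1}^{k} \sum_{l=1}^{m} \sum_{l'=1}^{m} \sigma_{jlj'l'}}{k(k-1)m^2}$$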
Estimation and Inference (3)
$\sigma_{ij}^2$: the variance from each combination of (i, j), i.e., each cell. Thus $\sigma_e^2$ is the average of all cells' variances.

$$\sigma_e^2 = \frac{\sum_{i=1}^{n} \sum_{j=1}^{k} \sigma_{ij}^2}{nk}$$
Estimation and Inference (4)
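$\sigma_{jl}^2$: the variance of replication $l$ of rater $j$
$\sigma_{jll'}$: the covariance of two replications, $l$ and $l'$, both of them coming from rater $j$

$$A = \frac{\sum_{j=1}^{k} \sum_{l=1}^{m} \sigma_{jl}^2}{km^2}, \qquad B = \frac{2 \sum_{j=1}^{k} \sum_{l=1}^{m-1} \sum_{l'=l+1}^{m} \sigma_{jll'}}{km^2}$$

$$A + B = \sigma_\alpha^2 + \sigma_\gamma^2 + \sigma_e^2/m$$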
Estimation and Inference (5)
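Use the GEE method to estimate all indices through estimating the means and all variance components: $\mu_1, \mu_2, \ldots, \mu_k, \sigma_\alpha^2, \sigma_\beta^2, \sigma_\gamma^2, \sigma_e^2$

$$\sum_{i=1}^{n} F_i' H_i^{-1} \left(YY_i - E(YY_i)\right) = 0$$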
Estimation and Inference (6)
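$$YY_i = (A_1, \ldots, A_j, \ldots, A_k, B, C, D, E)'$$

$$E(YY_i) = \left(\mu_1, \ldots, \mu_j, \ldots, \mu_k,\ \sigma_\beta^2 + \sigma_\gamma^2 + \sigma_e^2/m,\ \sigma_\alpha^2,\ \sigma_e^2,\ \sigma_\alpha^2 + \sigma_\gamma^2 + \sigma_e^2/m\right)'$$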
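Estimation and Inference (7)

$$A_j = (y_{ij1} + y_{ij2} + \cdots + y_{ijm}) / m$$

$$B = \sum_{j=1}^{k-1} \sum_{j'=j+1}^{k} (\bar{y}_{ij.} - \bar{y}_{ij'.})^2 / [k(k-1)]$$

$$C = 2 \sum_{j=1}^{k-1} \sum_{j'=j+1}^{k} \sum_{l=1}^{m} \sum_{l'=1}^{m} [(y_{ijl} - \bar{y}_{.jl})(y_{ij'l'} - \bar{y}_{.j'l'})] / [k(k-1)m^2]$$

$$D = \sum_{j=1}^{k} \sum_{l=1}^{m} [(y_{ijl} - \bar{y}_{ij.})^2 / (m-1)] / k$$

$$E = \sum_{j=1}^{k} \sum_{l=1}^{m} (y_{ijl} - \bar{y}_{.jl})^2 / (km^2) + 2 \sum_{j=1}^{k} \sum_{l=1}^{m-1} \sum_{l'=l+1}^{m} [(y_{ijl} - \bar{y}_{.jl})(y_{ijl'} - \bar{y}_{.jl'})] / (km^2)$$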
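A minimal sketch of computing the per-subject statistics $A_j$, B, C, D and E from balanced data. It assumes that $\bar{y}_{.jl}$ is the sample mean over subjects for rater j and replicate l, and that the divisors are $k(k-1)$, $k(k-1)m^2$, $k$ and $km^2$; both are readings of the definitions on this slide rather than confirmed details. The function name `subject_statistics` is hypothetical.

```python
def subject_statistics(y):
    """y[i][j][l]: subject i, rater j, replicate l (balanced, k >= 2, m >= 2)."""
    n, k, m = len(y), len(y[0]), len(y[0][0])
    # ybar_.jl: mean over subjects for each (rater, replicate) cell (assumed meaning)
    cell_mean = [[sum(y[i][j][l] for i in range(n)) / n for l in range(m)]
                 for j in range(k)]
    pairs = [(j, jp) for j in range(k - 1) for jp in range(j + 1, k)]
    out = []
    for i in range(n):
        A = [sum(y[i][j]) / m for j in range(k)]            # rater means for subject i
        B = sum((A[j] - A[jp]) ** 2 for j, jp in pairs) / (k * (k - 1))
        # r[j] = sum over l of the centered readings (y_ijl - ybar_.jl)
        r = [sum(y[i][j][l] - cell_mean[j][l] for l in range(m)) for j in range(k)]
        # The double sum over (l, l') of products factorises into r[j] * r[jp]
        C = 2.0 * sum(r[j] * r[jp] for j, jp in pairs) / (k * (k - 1) * m * m)
        D = sum(sum((v - A[j]) ** 2 for v in y[i][j]) / (m - 1) for j in range(k)) / k
        # Squares plus twice the cross products over l < l' equal r[j] ** 2
        E = sum(r[j] ** 2 for j in range(k)) / (k * m * m)
        out.append({"A": A, "B": B, "C": C, "D": D, "E": E})
    return out
```

Averaging B, C, D and E over subjects gives moment-type estimates of the combinations listed in $E(YY_i)$; the GEE step additionally supplies their variances and covariances.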
Estimation and Inference (8)
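$H_i$ is the working variance-covariance structure of $YY_i$; "working" means assuming that $YY_i$ follows a normal distribution
$F_i$ is the derivative matrix of the expectation of $YY_i$ with respect to all the parameters:

$$F_i = \partial E(YY_i) / \partial(\mu_1, \ldots, \mu_k, \sigma_\alpha^2, \sigma_\beta^2, \sigma_\gamma^2, \sigma_e^2)$$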
Estimation and Inference (9)
GEE method provides:
• Estimates of all means
• Estimates of all variance components
• Estimates of variances for all variance components
• Estimates of covariances between any two variance components
Estimation and Inference (10)
Delta method is used to estimate the variances for all indices
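$$
\begin{aligned}
\mathrm{var}(\hat{\rho}_{c,Intra}) ={}& \frac{(1-\rho_{c,Intra})^2 [\mathrm{var}(\hat{\sigma}_\alpha^2) + \mathrm{var}(\hat{\sigma}_\gamma^2) + 2\,\mathrm{cov}(\hat{\sigma}_\alpha^2, \hat{\sigma}_\gamma^2)] + \rho_{c,Intra}^2\, \mathrm{var}(\hat{\sigma}_e^2)}{(\sigma_\alpha^2 + \sigma_\gamma^2 + \sigma_e^2)^2} \\
&- \frac{2(1-\rho_{c,Intra})\,\rho_{c,Intra}\, [\mathrm{cov}(\hat{\sigma}_\alpha^2, \hat{\sigma}_e^2) + \mathrm{cov}(\hat{\sigma}_\gamma^2, \hat{\sigma}_e^2)]}{(\sigma_\alpha^2 + \sigma_\gamma^2 + \sigma_e^2)^2}
\end{aligned}
$$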
Estimation and Inference (12)
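$$
\begin{aligned}
\mathrm{var}(\hat{\rho}_{c,Inter}) ={}& \frac{(1-\rho_{c,Inter})^2\, \mathrm{var}(\hat{\sigma}_\alpha^2) + \rho_{c,Inter}^2 [\mathrm{var}(\hat{\sigma}_\beta^2) + \mathrm{var}(\hat{\sigma}_\gamma^2) + \mathrm{var}(\hat{\sigma}_e^2)/m^2]}{(\sigma_\alpha^2 + \sigma_\beta^2 + \sigma_\gamma^2 + \sigma_e^2/m)^2} \\
&+ \frac{\rho_{c,Inter}^2 [2\,\mathrm{cov}(\hat{\sigma}_\beta^2, \hat{\sigma}_\gamma^2) + 2\,\mathrm{cov}(\hat{\sigma}_\beta^2, \hat{\sigma}_e^2)/m + 2\,\mathrm{cov}(\hat{\sigma}_\gamma^2, \hat{\sigma}_e^2)/m]}{(\sigma_\alpha^2 + \sigma_\beta^2 + \sigma_\gamma^2 + \sigma_e^2/m)^2} \\
&- \frac{2(1-\rho_{c,Inter})\,\rho_{c,Inter}\, [\mathrm{cov}(\hat{\sigma}_\alpha^2, \hat{\sigma}_\beta^2) + \mathrm{cov}(\hat{\sigma}_\alpha^2, \hat{\sigma}_\gamma^2) + \mathrm{cov}(\hat{\sigma}_\alpha^2, \hat{\sigma}_e^2)/m]}{(\sigma_\alpha^2 + \sigma_\beta^2 + \sigma_\gamma^2 + \sigma_e^2/m)^2}
\end{aligned}
$$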
Estimation and Inference (13)
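$$
\begin{aligned}
\mathrm{var}(\hat{\rho}_{c,Total}) ={}& \frac{(1-\rho_{c,Total})^2\, \mathrm{var}(\hat{\sigma}_\alpha^2) + \rho_{c,Total}^2 [\mathrm{var}(\hat{\sigma}_\beta^2) + \mathrm{var}(\hat{\sigma}_\gamma^2) + \mathrm{var}(\hat{\sigma}_e^2)]}{(\sigma_\alpha^2 + \sigma_\beta^2 + \sigma_\gamma^2 + \sigma_e^2)^2} \\
&+ \frac{\rho_{c,Total}^2 [2\,\mathrm{cov}(\hat{\sigma}_\beta^2, \hat{\sigma}_\gamma^2) + 2\,\mathrm{cov}(\hat{\sigma}_\beta^2, \hat{\sigma}_e^2) + 2\,\mathrm{cov}(\hat{\sigma}_\gamma^2, \hat{\sigma}_e^2)]}{(\sigma_\alpha^2 + \sigma_\beta^2 + \sigma_\gamma^2 + \sigma_e^2)^2} \\
&- \frac{2(1-\rho_{c,Total})\,\rho_{c,Total}\, [\mathrm{cov}(\hat{\sigma}_\alpha^2, \hat{\sigma}_\beta^2) + \mathrm{cov}(\hat{\sigma}_\alpha^2, \hat{\sigma}_\gamma^2) + \mathrm{cov}(\hat{\sigma}_\alpha^2, \hat{\sigma}_e^2)]}{(\sigma_\alpha^2 + \sigma_\beta^2 + \sigma_\gamma^2 + \sigma_e^2)^2}
\end{aligned}
$$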
Estimation and Inference (14)
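$$
\begin{aligned}
\mathrm{var}(\hat{\rho}_{Inter}) ={}& \frac{(1-\rho_{Inter})^2\, \mathrm{var}(\hat{\sigma}_\alpha^2) + \rho_{Inter}^2 [\mathrm{var}(\hat{\sigma}_\gamma^2) + \mathrm{var}(\hat{\sigma}_e^2)/m^2 + 2\,\mathrm{cov}(\hat{\sigma}_\gamma^2, \hat{\sigma}_e^2)/m]}{(\sigma_\alpha^2 + \sigma_\gamma^2 + \sigma_e^2/m)^2} \\
&- \frac{2(1-\rho_{Inter})\,\rho_{Inter}\, [\mathrm{cov}(\hat{\sigma}_\alpha^2, \hat{\sigma}_\gamma^2) + \mathrm{cov}(\hat{\sigma}_\alpha^2, \hat{\sigma}_e^2)/m]}{(\sigma_\alpha^2 + \sigma_\gamma^2 + \sigma_e^2/m)^2}
\end{aligned}
$$

$$
\begin{aligned}
\mathrm{var}(\hat{\rho}_{Total}) ={}& \frac{(1-\rho_{Total})^2\, \mathrm{var}(\hat{\sigma}_\alpha^2) + \rho_{Total}^2 [\mathrm{var}(\hat{\sigma}_\gamma^2) + \mathrm{var}(\hat{\sigma}_e^2) + 2\,\mathrm{cov}(\hat{\sigma}_\gamma^2, \hat{\sigma}_e^2)]}{(\sigma_\alpha^2 + \sigma_\gamma^2 + \sigma_e^2)^2} \\
&- \frac{2(1-\rho_{Total})\,\rho_{Total}\, [\mathrm{cov}(\hat{\sigma}_\alpha^2, \hat{\sigma}_\gamma^2) + \mathrm{cov}(\hat{\sigma}_\alpha^2, \hat{\sigma}_e^2)]}{(\sigma_\alpha^2 + \sigma_\gamma^2 + \sigma_e^2)^2}
\end{aligned}
$$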
Estimation and Inference (15)
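$$
\begin{aligned}
\mathrm{var}(\hat{\chi}_{a,Inter}) ={}& \frac{(1-\chi_{a,Inter})^2 [\mathrm{var}(\hat{\sigma}_\alpha^2) + \mathrm{var}(\hat{\sigma}_\gamma^2) + \mathrm{var}(\hat{\sigma}_e^2)/m^2 + 2\,\mathrm{cov}(\hat{\sigma}_\alpha^2, \hat{\sigma}_e^2)/m]}{(\sigma_\alpha^2 + \sigma_\beta^2 + \sigma_\gamma^2 + \sigma_e^2/m)^2} \\
&+ \frac{(1-\chi_{a,Inter})^2 [2\,\mathrm{cov}(\hat{\sigma}_\alpha^2, \hat{\sigma}_\gamma^2) + 2\,\mathrm{cov}(\hat{\sigma}_\gamma^2, \hat{\sigma}_e^2)/m] + \chi_{a,Inter}^2\, \mathrm{var}(\hat{\sigma}_\beta^2)}{(\sigma_\alpha^2 + \sigma_\beta^2 + \sigma_\gamma^2 + \sigma_e^2/m)^2} \\
&- \frac{2(1-\chi_{a,Inter})\,\chi_{a,Inter}\, [\mathrm{cov}(\hat{\sigma}_\alpha^2, \hat{\sigma}_\beta^2) + \mathrm{cov}(\hat{\sigma}_e^2, \hat{\sigma}_\beta^2)/m + \mathrm{cov}(\hat{\sigma}_\gamma^2, \hat{\sigma}_\beta^2)]}{(\sigma_\alpha^2 + \sigma_\beta^2 + \sigma_\gamma^2 + \sigma_e^2/m)^2}
\end{aligned}
$$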
Estimation and Inference (16)
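$$
\begin{aligned}
\mathrm{var}(\hat{\chi}_{a,Total}) ={}& \frac{(1-\chi_{a,Total})^2 [\mathrm{var}(\hat{\sigma}_\alpha^2) + \mathrm{var}(\hat{\sigma}_\gamma^2) + \mathrm{var}(\hat{\sigma}_e^2) + 2\,\mathrm{cov}(\hat{\sigma}_\alpha^2, \hat{\sigma}_e^2)]}{(\sigma_\alpha^2 + \sigma_\beta^2 + \sigma_\gamma^2 + \sigma_e^2)^2} \\
&+ \frac{(1-\chi_{a,Total})^2 [2\,\mathrm{cov}(\hat{\sigma}_\alpha^2, \hat{\sigma}_\gamma^2) + 2\,\mathrm{cov}(\hat{\sigma}_\gamma^2, \hat{\sigma}_e^2)] + \chi_{a,Total}^2\, \mathrm{var}(\hat{\sigma}_\beta^2)}{(\sigma_\alpha^2 + \sigma_\beta^2 + \sigma_\gamma^2 + \sigma_e^2)^2} \\
&- \frac{2(1-\chi_{a,Total})\,\chi_{a,Total}\, [\mathrm{cov}(\hat{\sigma}_\alpha^2, \hat{\sigma}_\beta^2) + \mathrm{cov}(\hat{\sigma}_e^2, \hat{\sigma}_\beta^2) + \mathrm{cov}(\hat{\sigma}_\gamma^2, \hat{\sigma}_\beta^2)]}{(\sigma_\alpha^2 + \sigma_\beta^2 + \sigma_\gamma^2 + \sigma_e^2)^2}
\end{aligned}
$$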
Estimation and Inference (17)
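With $\varepsilon^2$ denoting the MSD, and each covariance entering with the factor 2 that the variance of a sum requires:

$$\mathrm{var}(\hat{\varepsilon}_{Intra}^2) = 4\,\mathrm{var}(\hat{\sigma}_e^2)$$

$$\mathrm{var}(\hat{\varepsilon}_{Inter}^2) = 4\,[\mathrm{var}(\hat{\sigma}_\beta^2) + \mathrm{var}(\hat{\sigma}_\gamma^2) + \mathrm{var}(\hat{\sigma}_e^2)/m^2 + 2\,\mathrm{cov}(\hat{\sigma}_\beta^2, \hat{\sigma}_\gamma^2) + 2\,\mathrm{cov}(\hat{\sigma}_\beta^2, \hat{\sigma}_e^2)/m + 2\,\mathrm{cov}(\hat{\sigma}_\gamma^2, \hat{\sigma}_e^2)/m]$$

$$\mathrm{var}(\hat{\varepsilon}_{Total}^2) = 4\,[\mathrm{var}(\hat{\sigma}_\beta^2) + \mathrm{var}(\hat{\sigma}_\gamma^2) + \mathrm{var}(\hat{\sigma}_e^2) + 2\,\mathrm{cov}(\hat{\sigma}_\beta^2, \hat{\sigma}_\gamma^2) + 2\,\mathrm{cov}(\hat{\sigma}_\beta^2, \hat{\sigma}_e^2) + 2\,\mathrm{cov}(\hat{\sigma}_\gamma^2, \hat{\sigma}_e^2)]$$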
Estimation and Inference (18)
Transformations for variances

Z-transformation: CCC indices and precision indices

$$z = \frac{1}{2} \ln\left(\frac{1+\rho_c}{1-\rho_c}\right)$$

Logit-transformation: accuracy and CP indices

$$\ln\left(\frac{\chi_a}{1-\chi_a}\right)$$

Log-transformation: TDI indices

$$\ln(\sigma_e^2)$$
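A minimal sketch of how the Z-transformation could be used to obtain a lower confidence limit for a CCC or precision index. It assumes a one-sided limit, and that the variance on the z scale comes from the delta method as $\mathrm{var}(\hat{\rho}) / (1-\rho^2)^2$; the function name `lower_limit_z` is hypothetical.

```python
from math import atanh, tanh
from statistics import NormalDist


def lower_limit_z(rho_hat, var_rho, level=0.95):
    """One-sided lower confidence limit for a CCC or precision index."""
    z = atanh(rho_hat)                             # 0.5 * ln((1 + rho) / (1 - rho))
    se_z = var_rho ** 0.5 / (1.0 - rho_hat ** 2)   # delta method: dz/drho = 1 / (1 - rho^2)
    q = NormalDist().inv_cdf(level)
    return tanh(z - q * se_z)                      # back-transform to the index scale
```

Working on the z scale keeps the back-transformed limit inside (-1, 1), which a symmetric interval on the original scale does not guarantee when the estimate is close to 1.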
Simulation Study
Three types of data: binary/ordinal/normal
Three cases for each type of data:
• k=2, m=1 / k=4, m=1 / k=2, m=3
For each case: 1000 random samples with sample size n=20
For binary and ordinal data: inferences obtained through transformation vs. no transformation
For normal data: transformation
Simulation Study (2)
Conclusions:
• The algorithm works well for all three types of data, both in estimates and in inferences
• For binary and ordinal data: no need for transformation
• For normal data, Carrasco's method is superior to ours, but for categorical data, ours is superior. For ordinal data, Carrasco's method and ours are similar.
Example One
Sigma method vs. HemoCue method in measuring the DCHLb level in patients' serum
299 samples: each sample collected twice by each method
Range: 50-2000 mg/dL
Example One – HemoCue method
HemoCue method first readings vs. second readings
Example One – Sigma method
Sigma method first readings vs. second readings
Example One – HemoCue vs. Sigma
HemoCue’s averages vs. Sigma’s averages
Example One – analysis result (1)
Statistics Estimates 95% CI* Allowance
ccc_inter 0.9866 0.9818 0.9775
ccc_total 0.9859 0.9809
precision_intra 0.9986 0.9982 0.9943
precision_inter 0.9866 0.9818
precision_total 0.9860 0.9809
accuracy_inter 0.9999 0.9974
accuracy_total 0.9999 0.9974
Example One – analysis result (2)
*: For all CCC, precision, accuracy and CP indices, the 95% lower limits are reported. For all TDI indices, the 95% upper limits are reported.
Statistics Estimates 95% CI* Allowance
TDIintra(0.9) 41.0903 47.2713 75
TDIinter(0.9) 127.273 149.799 150
TDItotal(0.9) 130.548 152.678
CPintra(75) 0.9973 0.9942 0.9
CPinter(150) 0.9475 0.9170 0.9
CPtotal(150) 0.9412 0.9102
Example Two
Hemagglutinin Inhibition (HAI) assay for antibody to Influenza A (H3N2) in rabbit serum samples from two labs
64 rabbit serum samples: measured twice by each lab
Antibody level: negative/positive/highly positive
Example Two – Lab one
Rows: First Reading; columns: Second Reading
First Reading | Negative | Positive | Highly positive
Negative | 6 | 1 | 0
Positive | 0 | 49 | 0
Highly positive | 0 | 0 | 8
Example Two – Lab two
Rows: First Reading; columns: Second Reading
First Reading | Negative | Positive | Highly positive
Negative | 2 | 0 | 0
Positive | 0 | 22 | 2
Highly positive | 0 | 5 | 33
Example Two: Lab one vs. lab two
Rows: Lab One First Reading; columns: Lab Two First Reading
Lab One First Reading | Negative | Positive | Highly positive
Negative | 2 | 5 | 0
Positive | 0 | 19 | 30
Highly positive | 0 | 0 | 8
Example Two: lab one vs. lab two
Rows: Lab One Second Reading; columns: Lab Two Second Reading
Lab One Second Reading | Negative | Positive | Highly positive
Negative | 2 | 4 | 0
Positive | 0 | 23 | 27
Highly positive | 0 | 0 | 8
Example Two
Statistics Estimates 95% CI* Allowance
ccc_inter 0.37225 0.22039 0.4375
ccc_total 0.35776 0.20970
precision_intra 0.88361 0.79692 0.75
precision_inter 0.56795 0.4359
precision_total 0.53489 0.39999
accuracy_inter 0.65543 0.51586
accuracy_total 0.66885 0.53561
Conclusions (1)
When data are continuous and m goes to ∞:
Agreement indices are the same as those proposed by Barnhart (2005), both in estimates and inferences
Improvements:
• Precision indices, accuracy indices, TDIs and CPs
• Variance components
Conclusions (2)
When m=1:
The agreement index degenerates into the OCCC as proposed by King (2002) and Carrasco (2003) for continuous data
Improvements:
• For categorical data:
– King's method approximates kappa and weighted kappa; our estimates (without transformation) are exactly the same as kappa and weighted kappa, both in estimate and in inference.
– Our estimates are superior to Carrasco's estimates when precision and accuracy are high
• Covariate adjustment becomes available
Conclusions (3)
When data are continuous, k=2 and m=1:
agreement index degenerates to the original CCC by Lin (1989)
When data are binary, k=2 and m=1:
agreement index degenerates into kappa, both in estimate and inference
Conclusions (4)
When data are ordinal, k=2 and m=1:
The agreement index degenerates into weighted kappa with the weight set below, both in estimate and in inference.

$$w_{ij} = 1 - \frac{(i-j)^2}{(k-1)^2}, \qquad i, j = 1, 2, \ldots, k$$
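A minimal sketch of weighted kappa with this weight set, applied to a square contingency table. In the weight formula the upper index counts the ordered categories, so the code calls it `num_categories`; the function names are hypothetical, and the table used in the usage lines is the Lab One vs. Lab Two first-reading table from Example Two.

```python
def squared_distance_weights(num_categories):
    """w_ij = 1 - (i - j)^2 / (num_categories - 1)^2 for ordered categories."""
    c = num_categories
    return [[1.0 - (i - j) ** 2 / (c - 1) ** 2 for j in range(c)] for i in range(c)]


def weighted_kappa(table, weights):
    """Weighted kappa for a square table of counts (rows: rater 1, columns: rater 2)."""
    c = len(table)
    n = sum(sum(row) for row in table)
    row_p = [sum(table[i]) / n for i in range(c)]
    col_p = [sum(table[i][j] for i in range(c)) / n for j in range(c)]
    cells = [(i, j) for i in range(c) for j in range(c)]
    observed = sum(weights[i][j] * table[i][j] / n for i, j in cells)
    expected = sum(weights[i][j] * row_p[i] * col_p[j] for i, j in cells)
    return (observed - expected) / (1.0 - expected)


# Lab One (rows) vs. Lab Two (columns), first readings, from Example Two
first_readings = [[2, 5, 0], [0, 19, 30], [0, 0, 8]]
kappa_w = weighted_kappa(first_readings, squared_distance_weights(3))
```

Replacing the weights with an identity matrix gives unweighted kappa, which is the binary-data special case described in Conclusions (3).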
Conclusions (5)
Unified approach
Relative agreement indices: CCC with precision and accuracy (data range dependent)
Absolute agreement: Total Deviation Index and Coverage Probability (normality assumption)
The link function needs more work
Requires balanced data
References
Bartko, John J. (1966). The intraclass correlation coefficient as a measure of reliability. Psychological Reports 19, 3-11.
Barnhart, H. X. and Williamson, J. M. (2001). Modeling concordance correlation via GEE to evaluate reproducibility. Biometrics 57, 931-940.
Barnhart, H. X., Song, Jingli and Haber, Michael J. (2005). Assessing intra, inter and total agreement with replicated readings. Statistics in Medicine 19, 255-270.
Carrasco, J. L. and Jover, L. (2003). Estimating the generalized concordance correlation coefficient through variance components. Biometrics 59, 849-858.
Fleiss, J., Cohen, J. and Everitt, B. (1969). Large sample standard errors of kappa and weighted kappa. Psychological Bulletin 72, 323-327.
King, Tonya S. and Chinchilli, Vernon M. (2001). A generalized concordance correlation coefficient for continuous and categorical data. Statistics in Medicine 20, 2131-2147.
Lin, L. I. (1989). A concordance correlation coefficient to evaluate reproducibility. Biometrics 45, 255-268.
Lin, L. I., Hedayat, A. S., Sinha, B., and Yang, M. (2002). Statistical methods in assessing agreement: models, issues & tools. Journal of American Statistical Association 97(457), 257-270.
Wu, Wenting (2006). A unified approach for assessing agreement. Ph.D. thesis, UIC.