![Page 1: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/1.jpg)
Development of health measurement scales
If you cannot express in numbers something that you are describing, you probably have little knowledge about it.
![Page 2: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/2.jpg)
![Page 3: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/3.jpg)
Dr. Priyamadhaba Behera, Junior Resident, AIIMS
RELIABILITY AND VALIDITY 15/03/2013
![Page 4: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/4.jpg)
Why do you need to worry about reliability and validity?
What happens with low reliability and validity?
What is the relationship between reliability and validity?
Do you always need validity? Or always reliability? Or both?
What is the minimum reliability that is needed for a scale?
![Page 5: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/5.jpg)
![Page 6: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/6.jpg)
No matter how well the objectives are written, or how clever the items, the quality and usefulness of an examination is predicated on Validity and Reliability
Validity & Reliability
![Page 7: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/7.jpg)
Validity Reliability
Validity & Reliability
We don’t say “an exam is valid and reliable”
We do say “the exam score is reliable and valid for a specified purpose”
KEY ELEMENT!
![Page 8: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/8.jpg)
![Page 9: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/9.jpg)
Reliability Vs. Validity
![Page 10: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/10.jpg)
Validity
• Two steps to determine the usefulness of a scale
– Reliability: necessary but not sufficient
– Validity: the next step
• Validity: is the test measuring what it is meant to measure?
• Two important issues
– The nature of what is being measured
– The relationship of that variable to its purported cause
• Sr. creatinine is a measure of kidney function because we know it is regulated by the kidneys
• But will students who do volunteer work become better doctors?
• Since our understanding of human behaviour is far from perfect, such predictions have to be validated against actual performance
![Page 11: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/11.jpg)
Types of validity
• Three Cs (conventionally)
– Content
– Criterion
• Concurrent
• Predictive
– Construct
• Convergent, discriminant, trait, etc.
• Others (face validity)
All types of validity address the same issue: the degree of confidence we can place in the inferences we draw from the scales
![Page 12: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/12.jpg)
Differing perspectives
• Previously validity was seen as demonstrating the properties of the scale
• Current thinking: what inferences can be made about the people who have given rise to the scores on these scales?
– Thus validation is a process of hypothesis testing (someone who scores on test A, will do worse in test B, and will differ from people who do better in test C and D)
– Researchers are limited only by their imagination in devising experiments to test such hypotheses
![Page 13: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/13.jpg)
Validity & Reliability
![Page 14: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/14.jpg)
• Face validity
– On the face of it, the tool appears to be measuring what it is supposed to measure
– Subjective judgment by one or more experts, rarely by any empirical means
• Content validity
– Measures whether or not the tool includes all relevant domains
– Closely related to face validity
– aka ‘validity by assumption’ because an expert says so
• There are certain situations where these may not be desired
![Page 15: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/15.jpg)
Content validity
• Example: a cardiology exam
– Assume it contains all aspects of the circulatory system (physiology, anatomy, pathology, pharmacology, etc.)
– If a person scores high on this test, we can ‘infer’ that he knows much about the subject (i.e., our inferences about the person will be right across various situations)
– In contrast, if the exam did not contain anything about circulation, the inferences we make about a high scorer may be wrong most of the time, and vice versa
![Page 16: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/16.jpg)
• Generally, a measure that includes a more representative sample of the target behaviour will have more content validity and hence lead to more accurate inferences
• Reliability places an upper limit on validity (the maximum validity is the square root of the reliability coefficient): the higher the reliability, the higher the maximum possible validity
– One exception is the relationship between internal consistency (IC) and validity: it is better to sacrifice IC for content validity
– The ultimate aim of scale is inferential which depends more on content validity than internal consistency
![Page 17: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/17.jpg)
Criterion validity
• Correlation of a scale with an accepted ‘gold standard’ (GS)
• Two types
– Concurrent: both the new scale and the standard scale are given at the same time
– Predictive: the gold standard results will be available some time in the future (e.g. an entrance test for college admission to assess whether a person will graduate or not)
• Why develop a new scale when we already have a criterion scale?
– Diagnostic utility/substitutability (the criterion is expensive, invasive, dangerous or time-consuming)
– Predictive utility (no decision can be made on the basis of new scale)
• Criterion contamination
– If the result of the GS is in part determined in some way by the results of the new test, it may lead to an artificially high correlation
![Page 18: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/18.jpg)
Construct validity
• Height and weight are readily observable
• Psychological variables such as anxiety, pain and intelligence are abstract and can’t be directly observed
• E.g. anxiety: we say that a person has anxiety if he has sweaty palms, tachycardia, pacing back and forth, difficulty in concentrating, etc. (i.e., we hypothesize that these symptoms are the result of anxiety)
• Such proposed underlying factors are called hypothetical constructs, or simply constructs (e.g. anxiety, illness behaviour)
• Such constructs arise from larger theories or clinical observations
• Most psychological instruments tap some aspect of a construct
![Page 19: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/19.jpg)
Establishing construct validity
• IBS is a construct rather than a disease – it is a diagnosis of exclusion
• A large vocabulary, wide knowledge and problem solving skills – what is the underlying construct?
• Many clinical syndromes are constructs rather than actual entities (schizophrenia, SLE)
![Page 20: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/20.jpg)
• Initial scales for IBS relied on ruling out other organic diseases and on some physical signs and symptoms
– These scales were inadequate because they led to many missed and wrong diagnoses
– New scales were developed incorporating demographic and personality features
• Now, how do we assess the validity of this new scale?
– Based on theory, high scorers on this scale should have
• Symptoms which will not clear with conventional therapy
• Lower prevalence of organic bowel disease on autopsy
![Page 21: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/21.jpg)
Differences from other types
1. Content and criterion validity can be established in one or two studies, but there is no single experiment that can prove a construct
• Construct validation is an ongoing process: learning more about the construct, making new predictions and then testing them
• Each supportive study strengthens the construct, but one well-designed negative study can call the entire construct into question
2. We are assessing the theory and the measure at the same time
![Page 22: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/22.jpg)
IBS example
• We had predicted that IBS patients will not respond to conventional therapy
• Assume that we gave the test to a sample of patients with GI symptoms and treated them with conventional therapy
• If high-scoring patients responded in the same proportion as low scorers, then there are 3 possibilities
– Our scale is good but the theory is wrong
– Our theory is good but the scale is bad
– Both scale and theory are bad
• We can identify the reason only from further studies
![Page 23: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/23.jpg)
• If an experimental design is used to test the construct, then in addition to the above possibilities our experiment may be flawed
• Ultimately, construct validity doesn’t differ conceptually from other types of validity
– “All validity is at its base some form of construct validity… it is the basic meaning of validity” (Guion)
![Page 24: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/24.jpg)
Establishing construct validity
• Extreme groups
• Convergent and discriminant validity
• Multitrait-multimethod matrix
![Page 25: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/25.jpg)
Extreme groups
• Two groups – as decided by clinicians – One IBS and the other some other GI disease
– Equivocal diagnosis eliminated
• Two problems
– That we are able to separate two extreme groups implies that we already have a tool which meets our needs (however, we can do bootstrapping)
– This is not sufficient: the real use of a scale is in making much finer discriminations. But such studies can be a first step; if the scale fails this, it will probably be useless in practical situations
![Page 26: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/26.jpg)
Multitrait-multimethod matrix
• Two unrelated traits/constructs, each measured by two different methods
• E.g. two traits: anxiety, intelligence; two methods: a rater, an exam
– Purple: reliabilities of the four instruments (should be highest)
– Blue: homotrait-heteromethod correlations (convergent validity)
– Yellow: heterotrait-homomethod correlations (divergent validity)
– Red: heterotrait-heteromethod correlations (should be lowest)
• A very powerful method, but it is very difficult to get such a combination
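| | Anxiety (Rater) | Anxiety (Exam) | Intelligence (Rater) | Intelligence (Exam) |
|---|---|---|---|---|
| Anxiety (Rater) | 0.53 | | | |
| Anxiety (Exam) | 0.42 | 0.79 | | |
| Intelligence (Rater) | 0.18 | 0.17 | 0.58 | |
| Intelligence (Exam) | 0.15 | 0.23 | 0.49 | 0.88 |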
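The four kinds of entries in the matrix can be sorted mechanically from the trait and method labels. The sketch below is only an illustration: the numbers are the ones in the matrix above, while the names `measures`, `lower_triangle` and `classify` are made up for this example.

```python
from collections import defaultdict

# (trait, method) label for each row/column of the matrix, in order.
measures = [
    ("anxiety", "rater"),
    ("anxiety", "exam"),
    ("intelligence", "rater"),
    ("intelligence", "exam"),
]

# Lower triangle of the matrix, row by row (diagonal = reliabilities).
lower_triangle = [
    [0.53],
    [0.42, 0.79],
    [0.18, 0.17, 0.58],
    [0.15, 0.23, 0.49, 0.88],
]


def classify(a, b):
    """Name the kind of correlation between two (trait, method) measures."""
    (trait_a, method_a), (trait_b, method_b) = a, b
    if a == b:
        return "reliability"
    if trait_a == trait_b:
        return "homotrait-heteromethod"    # convergent validity
    if method_a == method_b:
        return "heterotrait-homomethod"    # divergent validity
    return "heterotrait-heteromethod"      # should be lowest


groups = defaultdict(list)
for i, row in enumerate(lower_triangle):
    for j, r in enumerate(row):
        groups[classify(measures[i], measures[j])].append(r)

for kind, values in groups.items():
    print(kind, values)
```

In this matrix the expected ordering holds: every reliability exceeds every convergent correlation, which in turn exceeds every heterotrait correlation.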
![Page 27: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/27.jpg)
• Convergent validity - If there are two measures for the same construct, then they should correlate with each other but should not correlate too much. E.g. Index of anxiety and ANS awareness index
• Divergent validity – the measure should not correlate with a measure of a different construct, eg. Anxiety index and intelligence index
![Page 28: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/28.jpg)
Biases in validity assessment
• Restriction in range
– May be in the new scale (MAO level)
– May be in the criterion (depression score)
– May be in a third variable correlated with both (severity)
• E.g. a high correlation was found between MAO levels and depression scores in a community-based study, but on replicating the study in a hospital the correlation was low
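The effect of range restriction can be seen in a small simulation. This is a sketch on simulated data, not the MAO study itself: the variable names, the assumed true correlation of 0.7 and the cut-off of 1.0 are all assumptions chosen for illustration.

```python
import math
import random


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


random.seed(42)
true_r = 0.7  # assumed population correlation (hypothetical)

# "Community" sample: the full range of the marker is represented.
marker = [random.gauss(0, 1) for _ in range(5000)]
score = [true_r * m + math.sqrt(1 - true_r ** 2) * random.gauss(0, 1)
         for m in marker]
r_full = pearson_r(marker, score)

# "Hospital" sample: only subjects above a cut-off, so the range is restricted.
restricted = [(m, s) for m, s in zip(marker, score) if m > 1.0]
r_restricted = pearson_r([m for m, _ in restricted],
                         [s for _, s in restricted])

print(f"full range r = {r_full:.2f}, restricted range r = {r_restricted:.2f}")
```

The underlying relationship is the same in both samples; only the spread of the marker differs, and that alone lowers the observed correlation.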
![Page 29: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/29.jpg)
![Page 30: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/30.jpg)
Validity & Reliability
Content/Action + Error
The information we seek and our best hope for obtaining it.
Our human frailty and inability to write effective questions.
![Page 31: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/31.jpg)
![Page 32: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/32.jpg)
The maximum validity of a test is the square root of its reliability coefficient. Reliability places an upper limit on validity, so that the higher the reliability, the higher the maximum possible validity
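A quick numerical illustration of this ceiling, as a minimal sketch; the function name `max_validity` and the example reliabilities are chosen here for illustration only.

```python
import math


def max_validity(reliability):
    """Upper bound on a validity coefficient for a given reliability."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must lie between 0 and 1")
    return math.sqrt(reliability)


for rel in (0.49, 0.64, 0.81):
    print(f"reliability {rel:.2f} -> maximum validity {max_validity(rel):.2f}")
```

So a scale with a reliability of 0.49 cannot correlate above about 0.70 with any criterion, however good the criterion is.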
![Page 33: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/33.jpg)
$$\text{Variance} = \frac{\sum (\text{individual value} - \text{mean value})^2}{\text{no. of values}}$$
![Page 34: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/34.jpg)
Reliability
• Whether or not our tool is measuring the attribute in a reproducible fashion
• A way to show the amount of error (random and systematic) in any measurement
• Sources of error: observers, instruments, instability of the attribute
• Day-to-day encounters
– Weighing machine, watch, thermometer
![Page 35: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/35.jpg)
Assessing Reliability
• Internal consistency
– The average correlation among all the items in the tool
• Item-total correlation
• Split-half reliability
• Kuder-Richardson 20 & Cronbach’s alpha
• Multifactor inventories
• Stability
– Reproducibility of a measure on different occasions
• Inter-observer reliability
• Test-retest reliability (intra-observer reliability)
![Page 36: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/36.jpg)
Internal consistency
• All items in a scale tap different aspects of the same attribute and not different traits
• Items should be moderately correlated with each other, and each item with the total
• Two schools of thought
– If the aim is to describe a trait/behaviour/disorder
– If the aim is to discriminate people with the trait from those without
• The trend is towards scales that are more internally consistent
• IC doesn’t apply to multidimensional scales
![Page 37: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/37.jpg)
Item-total correlation
• The oldest method, still used
• Correlation of each item with the total score computed without that item
• For k items we have to calculate k correlations, which is laborious
• An item should be discarded if r < 0.20 (Kline 1986)
• Pearson’s r is best; for dichotomous items, use the point-biserial correlation
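The procedure above can be sketched in a few lines. The response matrix here is entirely hypothetical (five respondents, four items, with the last item deliberately keyed in the wrong direction), and the helper names are made up for this example; only the 0.20 cut-off comes from the slide.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def item_total_correlations(responses):
    """Correlate each item with the total score of the remaining items."""
    n_items = len(responses[0])
    results = []
    for i in range(n_items):
        item = [row[i] for row in responses]
        rest_total = [sum(row) - row[i] for row in responses]
        results.append(pearson_r(item, rest_total))
    return results


# Hypothetical data: rows are respondents, columns are items.
responses = [
    [1, 1, 1, 5],
    [2, 2, 2, 4],
    [3, 3, 3, 3],
    [4, 4, 4, 2],
    [5, 5, 5, 1],
]

correlations = item_total_correlations(responses)
flagged = [i for i, r in enumerate(correlations) if r < 0.20]
print(correlations, "-> items to review:", flagged)
```

With k items the loop runs k times, which is the "k correlations" mentioned above; the fourth item (index 3) falls below 0.20 and would be a candidate for removal or recoding.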
![Page 38: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/38.jpg)
Split-half reliability
• Divide the items into two halves and calculate the correlation between them
• Underestimates the true reliability because we are reducing the length of the scale to half (r is directly related to the no. of items)
– Corrected by the Spearman-Brown formula
• Should not be used with
– Chained items
• Difficulties
– There are many ways to divide a test
– It doesn't point to which item is contributing to poor reliability
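As an illustration of the split-half procedure with the Spearman-Brown correction, the sketch below uses an odd-even split on hypothetical responses. The data and function names are assumptions; the correction used is the standard Spearman-Brown formula for doubling test length, r_full = 2r / (1 + r).

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r_half):
    """Estimate full-length reliability from a half-test correlation."""
    return 2 * r_half / (1 + r_half)


def split_half(responses):
    """Odd-even split: return (raw half correlation, corrected reliability)."""
    first = [sum(row[0::2]) for row in responses]   # items 1, 3, ...
    second = [sum(row[1::2]) for row in responses]  # items 2, 4, ...
    r_half = pearson_r(first, second)
    return r_half, spearman_brown(r_half)


# Hypothetical data: rows are respondents, columns are items.
responses = [
    [1, 2, 1, 2],
    [2, 2, 3, 3],
    [3, 4, 3, 4],
    [4, 4, 5, 5],
    [5, 5, 4, 5],
]

raw, corrected = split_half(responses)
print(f"half-test r = {raw:.3f}, Spearman-Brown corrected = {corrected:.3f}")
```

A different way of splitting the items (first half versus second half, say) would generally give a different value, which is the first difficulty listed above.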
![Page 39: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/39.jpg)
KR-20/Cronbach’s alpha
• KR-20 for dichotomous responses
• Cronbach’s alpha for more than two response options
• They give the average of all possible split-half reliabilities of a scale
• If removing an item increases the coefficient, the item should be discarded
• Problems
– Depends on the no. of items
– A scale with two different sub-scales will probably still yield a high alpha
– A very high alpha denotes redundancy (asking the same question in slightly different ways)
– Thus alpha should be more than 0.70 but not more than 0.90
![Page 40: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/40.jpg)
• Cronbach’s basic equation for alpha

$$\alpha = \frac{n}{n-1}\left(1 - \frac{\sum V_i}{V_{test}}\right)$$

– n = number of questions
– Vi = variance of scores on each question
– Vtest = total variance of overall scores on the entire test
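The equation translates directly into code. The following is a minimal sketch: the function name and the example responses are hypothetical, and population variance (`statistics.pvariance`) is used for both the item and the total variances; any variance estimator gives the same alpha as long as it is used consistently.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents-by-items matrix of scores."""
    n = len(responses[0])                        # number of questions
    item_variances = [pvariance([row[i] for row in responses])
                      for i in range(n)]         # Vi for each question
    total_variance = pvariance([sum(row) for row in responses])  # Vtest
    return (n / (n - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: rows are respondents, columns are items.
responses = [
    [1, 2, 1, 2],
    [2, 2, 3, 3],
    [3, 4, 3, 4],
    [4, 4, 5, 5],
    [5, 5, 4, 5],
]
print(f"alpha = {cronbach_alpha(responses):.2f}")

# Three identical items are perfectly redundant, so alpha reaches 1.
redundant = [[v, v, v] for v in (1, 2, 3, 4, 5)]
print(f"alpha for identical items = {cronbach_alpha(redundant):.2f}")
```

The second call shows why a very high alpha signals redundancy: items that ask exactly the same thing push alpha to its maximum.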
![Page 41: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/41.jpg)
Multifactor inventories
• More sophisticated techniques
• Item-total procedure: each item should correlate with the total of its own scale and with the total of all the scales
• Factor analysis
– Determining the underlying factors
– E.g. if there are five tests
• Vocabulary, fluency, phonetics, reasoning and arithmetic
• We can theorize that the first three would be correlated under a factor called ‘verbal factor’ and the last two under ‘logic factor’
![Page 42: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/42.jpg)
Stability/ Measuring error
• A weighing machine shows weight in the range of say 40-80 kg and thus an error of ±1kg is meaningful
• In reality we calculate the ratio: variability between subjects / total variability (total variability includes both subjects and measurement error)
• So that a ratio of
– 1 indicates no measurement error/perfect reliability
– 0 indicates otherwise
![Page 43: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/43.jpg)
• Reliability = subject variability / (subject variability + measurement error)
• Statistically, ‘variance’ is the measure of variability, so
• Reliability = SD² of subjects / (SD² of subjects + SD² of error)
• Thus reliability is the proportion of the total variance that is due to the ‘true’ differences between the subjects
• Reliability has meaning only when applied to specific populations
![Page 44: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/44.jpg)
Calculation of reliability
• The statistical technique used is ANOVA and since we have repeated measurements in reliability, the method is – repeated measures ANOVA
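To make the calculation concrete, here is a sketch of how the variance components could be obtained from a subjects-by-observers table and combined into the reliability ratio (subject variance over subject variance plus error variance). The ratings are hypothetical and are not the data from the example slide. Treating observers as a fixed factor, so that their variance is left out of the denominator, is an assumption made here for simplicity.

```python
def reliability_from_anova(scores):
    """Reliability = subject variance / (subject variance + error variance).

    `scores` is a list of rows, one row per subject, one column per observer.
    Observer (column) variance is removed from the error term, i.e. observers
    are treated as fixed (an assumption of this sketch).
    """
    n = len(scores)       # subjects
    k = len(scores[0])    # observers
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_observers = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_subjects - ss_observers

    ms_subjects = ss_subjects / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    var_subjects = (ms_subjects - ms_error) / k
    return var_subjects / (var_subjects + ms_error)


# Hypothetical ratings: 5 patients, each rated by 3 observers.
ratings = [
    [6, 7, 8],
    [4, 5, 6],
    [2, 2, 2],
    [3, 4, 5],
    [1, 3, 4],
]
print(f"reliability = {reliability_from_anova(ratings):.2f}")
```

If the observers differ from one another only by a constant offset, the error term vanishes and this ratio equals 1, even though the observers never give the same number.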
![Page 45: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/45.jpg)
Example
![Page 46: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/46.jpg)
![Page 47: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/47.jpg)
• Classical definition of reliability
• The interpretation is that 88% of the variance is due to the true variance among patients (aka the intraclass correlation coefficient)
![Page 48: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/48.jpg)
Fixed/random factor
• What happened to the variance due to observers?
• Are these the same observers going to be used or they are a random sample?
• Another situation where observations may be treated as fixed is subjects answering the ‘same items on a scale’
![Page 49: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/49.jpg)
Other types of reliability
• We have only examined the effect of different observers on the same behaviour
• But there can be error due to ‘day to day’ differences; if we measure the same behaviour a week or two apart, we can calculate an ‘intra-observer reliability coefficient’
• If there are no observers (self-rated tests) we can still calculate ‘test-retest reliability’
![Page 50: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/50.jpg)
• Usually high inter-observer is sufficient, but if it is low then we may have to calculate intra-observer reliability to determine the source of unreliability
• Mostly, measures of internal consistency are reported as ‘reliability’ because they are easily computed in a single sitting
– Hence caution is required, as they may not measure variability due to day-to-day differences
![Page 51: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/51.jpg)
Diff. forms of reliability coefficient
• So far we have seen forms of ICC
• Others – Pearson product-moment correlation
– Cohen’s kappa
– Bland-Altman analysis
![Page 52: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/52.jpg)
Pearson’s correlation
• Based on regression – the extent to which the relation between two variables can be described by straight line
![Page 53: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/53.jpg)
Limitations of Pearson’s R
• A perfect fit of 1.0 may be obtained even if the intercept is non-zero and the slope is not equal to one unlike with ICC
• So, Pearson’s R will be higher than truth, but in practice it is usually equal to ICC as the predominant source of error is random variation
• If there are multiple observations then multiple pairwise Rs are required, unlike the single ICC
• For eg. with 10 observers there will be 45 Pearson’s Rs whereas only one ICC
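Both limitations are easy to demonstrate. In the sketch below the two observers' scores are hypothetical: the second observer's score is always twice the first plus 5, so the two never agree, yet Pearson's r is a perfect 1.0. The pair count for 10 observers uses the standard library's `math.comb`.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores: non-zero intercept (5) and slope different from one (2).
observer_1 = [1, 2, 3, 4, 5]
observer_2 = [2 * score + 5 for score in observer_1]

r = pearson_r(observer_1, observer_2)
mean_gap = sum(b - a for a, b in zip(observer_1, observer_2)) / len(observer_1)
print(f"Pearson r = {r:.2f}, mean difference between observers = {mean_gap:.1f}")

# Number of pairwise correlations needed for 10 observers.
n_pairs = math.comb(10, 2)
print(f"pairwise correlations for 10 observers: {n_pairs}")
```

Pearson's r only asks whether the points fall on some straight line, not whether they fall on the line of equality.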
![Page 54: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/54.jpg)
Kappa coefficient
• Used when responses are dichotomous/categorical
• When the frequency of positive results is very low or very high, chance agreement is high, so kappa can be very low even when observed agreement is high
• Weighted kappa focuses on disagreement; cells are weighted according to their distance from the diagonal of agreement
• Weights can be arbitrary or quadratic (based on the square of the amount of discrepancy)
• Weighted kappa with quadratic weights is equivalent to the ICC
• Also, the unweighted kappa is equal to the ICC based on ANOVA
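As a rough sketch of how kappa is computed, the function below takes a square agreement table (rows = rater 1, columns = rater 2) and returns either the unweighted kappa or the quadratically weighted kappa. The function name and all three tables are made-up illustrations, not data from the slides.

```python
def cohen_kappa(table, quadratic=False):
    """Cohen's kappa from a square table of counts.

    kappa = 1 - (observed weighted disagreement / chance-expected disagreement).
    Unweighted: every off-diagonal cell has weight 1.
    Quadratic:  cell (i, j) has weight (i - j) ** 2.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]

    observed = 0.0
    expected = 0.0
    for i in range(k):
        for j in range(k):
            if quadratic:
                w = (i - j) ** 2
            else:
                w = 0 if i == j else 1
            observed += w * table[i][j]
            expected += w * row_tot[i] * col_tot[j] / n
    return 1 - observed / expected


# Both 2x2 tables have 90% raw agreement, but different prevalence.
balanced = [[45, 5], [5, 45]]   # positives about as common as negatives
skewed = [[90, 5], [5, 0]]      # one category dominates

print("balanced prevalence:", round(cohen_kappa(balanced), 3))
print("skewed prevalence:  ", round(cohen_kappa(skewed), 3))

# Ordered three-category scale: near-misses are penalised less with weights.
ordinal = [[20, 5, 0], [5, 20, 5], [0, 5, 20]]
print("unweighted:", round(cohen_kappa(ordinal), 3))
print("quadratic: ", round(cohen_kappa(ordinal, quadratic=True), 3))
```

With the same 90% raw agreement, the balanced table gives a kappa of 0.8 while the skewed table gives a kappa slightly below zero, because almost all of the agreement in the skewed table is expected by chance.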
![Page 55: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/55.jpg)
![Page 56: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/56.jpg)
![Page 57: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/57.jpg)
Kappa coefficient
![Page 58: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/58.jpg)
Bland and Altman method
• A plot of the difference between two observations against the mean of the two observations
![Page 59: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/59.jpg)
• Agreement is expressed as the ‘limits of agreement’. The presentation of the 95% limits of agreement is for visual judgement of how well two methods of measurement agree. The smaller the range between these two limits the better the agreement is.
• The question of how small is small depends on the clinical context: would a difference between measurement methods as extreme as that described by the 95% limits of agreement meaningfully affect the interpretation of the results?
• Limitation: the onus is placed on the reader to juxtapose the calculated error against some implicit notion of true variability
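A minimal sketch of the Bland and Altman calculation described above, assuming the usual definition of the 95% limits of agreement as the mean difference ± 1.96 standard deviations of the differences. The function name and the paired measurements are hypothetical.

```python
from statistics import mean, stdev


def limits_of_agreement(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements.

    bias   = mean of the paired differences (a - b)
    limits = bias +/- 1.96 * SD of the differences (95% limits of agreement)
    """
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired measurements from two methods.
method_a = [10, 12, 14, 16]
method_b = [11, 11, 15, 15]

# Points for the plot: x = mean of the pair, y = difference of the pair.
points = [((a + b) / 2, a - b) for a, b in zip(method_a, method_b)]

bias, lower, upper = limits_of_agreement(method_a, method_b)
print("plot points:", points)
print(f"bias = {bias:.2f}, limits of agreement = {lower:.2f} to {upper:.2f}")
```

Whether limits of this width are acceptable is still a clinical judgement; the calculation only supplies the numbers to judge.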
![Page 60: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/60.jpg)
Standards for magnitude of reliability coefficient
• How much reliability is good? Kelly (0.94), Stewart (0.85)
• A test used for individual judgment should have higher reliability than one used for research in groups
• Research purposes
– Averaging scores over the sample reduces the error
– Conclusions are usually made after a series of studies
– Acceptable reliability depends on the sample size in research (in a sample of 1000, a lower reliability may be acceptable than in a sample of 10)
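One way to see why group research tolerates lower reliability is the sketch below. It assumes the classical test theory relation SEM = SD × √(1 − R) for the measurement error of a single score, and that errors are independent across subjects so that the error of a group mean shrinks with √n. These assumptions, the function names and the numbers are illustrative and are not taken from the slides.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """Measurement error (SD units) attached to one individual's score."""
    return sd * sqrt(1 - reliability)


def error_of_group_mean(sd, reliability, n):
    """Measurement error attached to the mean score of n subjects."""
    return standard_error_of_measurement(sd, reliability) / sqrt(n)


sd = 10.0           # hypothetical standard deviation of observed scores
reliability = 0.75  # hypothetical reliability coefficient

print("individual:", standard_error_of_measurement(sd, reliability))
for n in (10, 100, 1000):
    print(f"mean of {n}:", round(error_of_group_mean(sd, reliability, n), 3))
```

The same instrument that is too noisy for a decision about one person can still give a precise group mean when the sample is large.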
![Page 61: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/61.jpg)
Reliability and probability of misclassification
• Depends on the property of the instrument and the decision of cut point
• Relation between reliability and likelihood of misclassification
– E.g. in a sample of 100, one person is ranked 25th and another 50th
– If R is 0, there is a 50% chance that the two will reverse order on retesting
– If R is 0.5, a 37% chance; with R = 0.8, a 2.2% chance
• Hence an R of 0.75 is the minimum requirement for a useful instrument
![Page 62: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/62.jpg)
Improving reliability
• Increase the subject variance relative to the error variance (by legitimate means and otherwise)
• Reducing error variance
– Observer/rater training
– Removing consistently extreme observers
– Designing better scales
• Increasing true variance
– In case of a ‘floor’ or ‘ceiling’ effect, introduce items that will bring the performance to the middle of the scale (thus increasing true variance)
– E.g. Fair-good-very good-excellent
![Page 63: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/63.jpg)
• Ways that are not legitimate
– Test the scale in a heterogeneous population (normal and bedridden arthritics)
– A scale developed in a homogeneous population will have a larger reliability when used in a heterogeneous population
• Correct for attenuation
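The effect of population heterogeneity can be sketched with the variance-ratio definition of reliability, R = subject variance / (subject variance + error variance). The function name and the variance values below are invented for illustration; only the subject variance changes between the two populations.

```python
def reliability(subject_variance, error_variance):
    """Reliability as the share of total variance due to true subject differences."""
    return subject_variance / (subject_variance + error_variance)


error_variance = 4.0  # same instrument, same measurement error in both settings

# Hypothetical subject variances.
homogeneous = reliability(subject_variance=4.0, error_variance=error_variance)
heterogeneous = reliability(subject_variance=36.0, error_variance=error_variance)

print("homogeneous population:  ", homogeneous)
print("heterogeneous population:", heterogeneous)
```

The instrument itself is unchanged; the coefficient rises only because the subjects differ more from one another, which is why testing in an artificially mixed population is not a legitimate way to improve reliability.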
![Page 64: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/64.jpg)
• The simplest way to increase R is to increase the number of items (statistical theory)
• True variance increases as the square of the number of items, whereas error variance increases only as the number of items
• If the length of the test is tripled, then $R_{\text{Spearman-Brown}} = \dfrac{3R}{1 + 2R}$
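The tripled-length case above is one instance of the general Spearman-Brown formula, R_new = kR / (1 + (k − 1)R), where k is the factor by which the test is lengthened. A minimal sketch, with a hypothetical function name and example values:

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is lengthened by `length_factor`.

    Follows from true variance growing with k**2 and error variance with k:
    R_new = k*R / (1 + (k - 1)*R).
    """
    k = length_factor
    return k * reliability / (1 + (k - 1) * reliability)


# Tripling a test with reliability 0.5: 3R / (1 + 2R) = 1.5 / 2.
print(spearman_brown(0.5, 3))

# Predicted reliability for several lengthening factors.
for k in (1, 2, 3, 5):
    print(k, round(spearman_brown(0.5, k), 3))
```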
![Page 65: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/65.jpg)
• In reality the equation overestimates the new reliability
• We can also use this equation to determine the length of a test for achieving a pre-decided reliability
• To improve test-retest reliability – shorten the interval between the tests
• An ideal approach is to examine all the sources of variation and try to reduce the larger ones (generalizability theory)
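The point above about using the Spearman-Brown equation to choose a test length can be sketched by solving the equation for the lengthening factor: k = R_target (1 − R) / (R (1 − R_target)). The function names, item counts and reliabilities below are hypothetical, and since the slides note that the equation tends to overestimate the new reliability, the result is best read as a lower bound on the number of items.

```python
from math import ceil


def required_length_factor(current_reliability, target_reliability):
    """Factor by which the test must be lengthened to reach the target."""
    r, t = current_reliability, target_reliability
    return t * (1 - r) / (r * (1 - t))


def required_items(n_items, current_reliability, target_reliability):
    """Number of items needed, rounded up to a whole item."""
    k = required_length_factor(current_reliability, target_reliability)
    return ceil(k * n_items)


# A test with reliability 0.5 must be tripled to reach 0.75.
print(required_length_factor(0.5, 0.75))

# Hypothetical 10-item scale with reliability 0.6, aiming for 0.8.
print(required_items(10, 0.6, 0.8))
```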
![Page 66: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/66.jpg)
Summary for Reliability
• Pearson R is theoretically incorrect but in practice fairly close
• Bland and Altman method is analogous to error variance of ICC but doesn’t relate this to the range of observations
• Kappa and the ICC are equivalent and are the most appropriate measures
![Page 67: Development of health measurement scaleslcwu.edu.pk/ocd/cfiles/Professional Studies/FC/B.Ed-204...Development of health measurement scales If you cannot express in numbers something](https://reader035.vdocuments.site/reader035/viewer/2022071421/611a29377c953905e073aabf/html5/thumbnails/67.jpg)
THANK YOU