Letter from Mr. Patrick Marquis


Value in Health, Volume 4, Number 4, 2001


LETTERS TO THE EDITOR

To the Editor—Health-related quality of life (HRQL), health status, and other specific functional status assessments that are included under the umbrella of PRO (patient-reported outcomes) are increasingly used as efficacy end points in randomized controlled trials. It is now recognized that, although perceptual, PROs can be measured in reliable and valid ways. Indeed, evidence of the scientific soundness of the questionnaire should be provided. In that sense we fully support Paul Kind’s general statement that “we need demonstrable rigor in our methods.” Nevertheless, Dr. Kind’s comments raise two major issues: the perspective taken for scaling, and the level of data reported in a single manuscript.

Scaling

It is common practice for multi-item scales in descriptive (psychometric) questionnaires to be scored using the method of summated ratings. Indeed, simple summing of scores over the individual items is the most rational index. This “linear model” approach works if items are measuring the same construct, the scaling assumption being based on a similar distribution of responses to items and similar item variances. In addition, the internal consistency reliability of the scale is estimated using Cronbach’s alpha coefficient. This provides an indication of the degree of convergence between different items hypothesized to represent the same construct. Classic references include Likert [1], Nunnally [2], and Streiner [3].
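For readers less familiar with this scoring model, the following is a minimal, purely illustrative Python sketch of an unweighted summated-ratings score and the standard Cronbach’s alpha formula. It is not the MSF-4 scoring algorithm (which is not reproduced here), and the response data are hypothetical.

```python
# Illustrative sketch only: summated-ratings scoring and Cronbach's alpha
# for a one-domain, multi-item scale. Not the MSF-4 algorithm.
import numpy as np

def summated_score(items: np.ndarray) -> np.ndarray:
    """Unweighted total score: simple sum of item responses per respondent.

    items: respondents x items matrix of Likert-type responses.
    """
    return items.sum(axis=1)

def cronbach_alpha(items: np.ndarray) -> float:
    """Internal-consistency reliability of the summated scale.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical example: 5 respondents answering 4 Likert-type items (0-4).
responses = np.array([
    [3, 4, 3, 4],
    [1, 2, 1, 2],
    [4, 4, 3, 4],
    [0, 1, 1, 0],
    [2, 2, 3, 2],
])
print(summated_score(responses))            # unweighted totals per respondent
print(round(cronbach_alpha(responses), 2))  # internal-consistency estimate
```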

This was the perspective we took for the MSF-4, bearing in mind that the MSF-4 was a descriptive questionnaire aimed at evaluating the sexual functional status of men with benign prostatic hypertrophy (BPH). We actually followed the psychometric criteria described and recommended by the Medical Outcomes Trust and its Scientific Advisory Committee [4], based on Likert’s [1] theory.

We did not introduce a valuation system (explicit weights) in the scoring algorithm of the MSF-4, given that this questionnaire is not a preference-based instrument and the introduction of differential weights in a one-domain, multi-item scale does not seem to provide a substantial advantage over using the unweighted score, particularly when item-total correlations are similar or when the reliability is acceptable [5,6]. Furthermore, improvement in the quality of the items and/or increases in the number of items are generally recommended ways of improving reliability, rather than the weighting of items. In addition, major issues related to weighting are still under discussion: Which method should be used? Whose values should be taken into account?
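As a small illustration of the point drawn from [5,6] (again a hypothetical sketch, not an analysis of MSF-4 data), when items reflect a single construct with broadly similar item-total correlations, a differentially weighted total tends to correlate very highly with the simple unweighted sum:

```python
# Illustrative sketch only: weighted vs. unweighted totals on simulated
# single-construct items. The weights are arbitrary and hypothetical.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 respondents, 4 items driven by one latent construct,
# discretized onto a 0-4 response scale.
latent = rng.normal(size=200)
noise = rng.normal(scale=0.8, size=(200, 4))
items = np.clip(np.round(latent[:, None] + noise + 2), 0, 4)

unweighted = items.sum(axis=1)

# Arbitrary differential weights (e.g., judged item importance).
weights = np.array([1.0, 1.5, 0.8, 1.2])
weighted = items @ weights

# Correlation between the two scoring rules.
print(round(np.corrcoef(unweighted, weighted)[0, 1], 3))
```

Under these (assumed) conditions the two scoring rules order respondents almost identically, which illustrates why differential weights were not expected to add much over the unweighted score.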

Level of data displayed in a single manuscript

If the first issue raised by Dr. Kind can be considered theoretical, or even philosophical, the second one is very practical. Depending on the perspective taken, the underlying theory, and the context, authors face a difficult choice. What is the minimum level of data that should be reported in a single manuscript, taking into account the type of journal and the number of words and tables recommended by the editor? How much evidence should be provided to demonstrate the appropriateness of the scoring system and the reliability and validity of the PRO instrument? One can easily note that, even though standards of validation are available, great variability exists in the types of data reported in manuscripts describing the development and use of PRO instruments. In particular, details supporting the scoring algorithm or the ordinality of item response categories are not commonly reported.

Following the usual practice, in our manuscript we decided to focus on the clinical validity of the MSF-4 questionnaire rather than report details on the scaling assumptions. A great deal more information is available in the analyses than was reported in the manuscript. Interested readers can contact the author for additional details on the MSF-4 instrument characteristics.

Again, we think the main issue is the absence of consensus regarding the type of data that should be shown in a manuscript to support the validation of a scale. In any case, as stated by Dr. Kind, we should go beyond Cronbach’s alpha.—Patrick Marquis, Mapi Values, Lyon, France.

References

1 Likert R. A technique for the measurement of attitudes. Arch Psychol 1932;140:5–55.


2 Nunnally JC. Psychometric Theory (2nd ed.). New York: McGraw-Hill, 1978.

3 Streiner DL, Norman GR. Health Measurement Scales: A Practical Guide to Their Development and Use. New York: Oxford University Press, 1989.

4 Lohr KN, Aaronson NK, Alonso J, et al. Evaluating quality of life and health status instruments: development of scientific review criteria. Clin Ther 1996;18:979–92.

5 Lei H, Skinner HA. A psychometric study of life events and social readjustment. J Psychosom Res 1980;24:57–65.

6 Edwards AL, Kenney KC. A comparison of the Thurstone and Likert techniques of attitude scale construction. J Appl Psychol 1946;30:72–83.