
Page 1: Willingness to Pay - McGill University development of methods to measure willingness to pay (WTP) has renewed interest in cost-benefit analysis ... benefit minus cost) of a program


Willingness to Pay: A Valid and Reliable Measure of Health State Preference?

BERNIE O’BRIEN, PhD, JOSE LUIS VIRAMONTES, MD, MSc

The development of methods to measure willingness to pay (WTP) has renewed interest in cost-benefit analysis (CBA) for the economic evaluation of health care programs. The authors studied the construct validity and test-retest reliability of WTP as a measure of health state preferences in a survey of 102 persons (mean age 62 years; 54% male) who had chronic lung disease (forced expiratory volume in 1 second <70% of predicted). Interview measurements included self-reported symptoms, the oxygen-cost diagram for dyspnea, Short-Form 36 for general health status, rating scale and standard gamble for value and utility of current health state relative to death and healthy lung functioning, and WTP for a hypothetical intervention offering a 99% chance of healthy lung functioning and a 1% chance of death. WTP was elicited by a simple bidding game. To test for starting-point bias, the respondents were randomly assigned to one of five starting bids. All health status and preference measurements except WTP (controlling for income) showed significant (p < 0.05) differences between disease-severity groups (mild/moderate/severe). WTP was significantly (p = 0.01) associated with household income, but other health status and preference measures were not. The measure most highly correlated with WTP was the standard gamble (r = -0.46). There was no association between starting bid and mean WTP adjusted for income and health status. The test-retest reliability of WTP was acceptable (r = 0.66) but lower than that for the standard gamble (r = 0.82). It is concluded that: 1) large variation in WTP responses may compromise this measure’s discriminant validity; 2) there is some evidence of convergent validity for WTP with preferences measured by standard gamble; 3) there was no evidence of starting-point bias; 4) the test-retest reliability of WTP is comparable to that of other preference measures. Key words: willingness to pay; health state preferences; economics. (Med Decis Making 1994;14:289-297)

There are a variety of methods for the economic evaluation of health care programs. The main distinguishing feature between methods is the way in which health improvements are measured and valued.1 A method that is enjoying renewed interest is cost-benefit analysis (CBA), in which health benefits are valued in monetary terms.2 CBA has theoretical appeal to the economist because of its foundation in welfare economic theory, in contrast to techniques such as cost-effectiveness analysis where the underlying theoretical base is unclear.3 CBA data may also be appealing to decision makers because they quantify costs and benefits in monetary units, thus permitting the net benefit (i.e., benefit minus cost) of a program to be calculated to determine whether it is worth implementing.

Enthusiasm for the theoretical advantages of CBA must be tempered with an appreciation of the practical difficulties inherent in placing money values on health outcomes. A convention in early CBA studies of health care was the human-capital approach, where the values of population health gains or losses were computed in terms of production gains or losses, usually proxied by discounted future earnings streams for individuals in or out of employment as a consequence of their health status. In a landmark paper, Mishan4 argued that the human-capital approach in CBA studies was flawed for two reasons: 1) the method assumes that the primary goal of society is to maximize the gross national product, and this is highly questionable; 2) the method is inconsistent with the theoretical foundations of CBA, which are based upon a compensation test (the Potential Pareto Improvement criterion). As pointed out by Mishan and explained by Gafni,5 the practical extension of the theory underlying CBA is the estimation of individuals’ maximum willingness-to-pay (WTP) to secure implementation of a program (compensating variation) or the minimum amount individuals would need to be compensated (willingness-to-accept) to forgo a program (equivalent variation).
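As a minimal sketch of the decision rule described above, assuming each affected individual's maximum WTP has already been elicited and that costs are expressed in the same currency and time period, the net benefit is the sum of stated WTP values minus the program cost; a positive result means the gainers could, in principle, compensate those who bear the cost. The function names and figures are hypothetical, and the sketch ignores willingness-to-accept values and discounting.

```python
from typing import Sequence


def net_benefit(wtp_values: Sequence[float], program_cost: float) -> float:
    """Aggregate willingness to pay (compensating variation) minus program cost."""
    if any(w < 0 for w in wtp_values):
        raise ValueError("WTP values are assumed to be non-negative")
    return sum(wtp_values) - program_cost


def passes_compensation_test(wtp_values: Sequence[float], program_cost: float) -> bool:
    """True if the gainers' aggregate WTP exceeds the cost, so they could
    in principle compensate those who bear it."""
    return net_benefit(wtp_values, program_cost) > 0


# Hypothetical example: three beneficiaries and a program costing $120.
print(net_benefit([100, 50, 25], 120))               # 55
print(passes_compensation_test([100, 50, 25], 200))  # False
```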

Received June 11, 1993, from the Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Ontario, Canada (BO’B) and the Clinical Epidemiology Unit, General Hospital of Mexico, Autonomous National University of Mexico, Mexico City, Mexico (JLV). Revision accepted for publication December 22, 1993. Supported in part by educational grants from Glaxo Canada Inc. and Schering-Plough Research Institute. This work was completed while Dr. Viramontes was an INCLEN graduate student at McMaster University sponsored by the Rockefeller Foundation.

Address correspondence and reprint requests to Dr. O’Brien: Centre for Evaluation of Medicines, St. Joseph’s Hospital, Martha Wing, Room 329, 50 Charlton Avenue East, Hamilton, Ontario L8N 4A6, Canada.

at MCGILL UNIVERSITY LIBRARIES on July 14, 2009 http://mdm.sagepub.comDownloaded from


Two general approaches to the estimation of WTP values can be distinguished: indirect measurement and direct measurement. The indirect approach examines previous real-world decisions that involve tradeoffs between money and expected health outcomes; for example, the dollar values implied by wage premiums accepted by workers in occupations with known increased health risks.6 In contrast to inferring preferences from actual choices, the direct measurement of WTP uses survey methods to elicit stated dollar values for some non-marketed phenomenon produced or destroyed by the project being evaluated. In economics this second approach has been termed contingent valuation, because the respondent is being asked to consider the contingency of a market’s existing for the thing being valued. A number of CBA studies in environmental and transport economics have employed contingent-valuation techniques to value non-marketed phenomena such as improved air quality.7-9 Contingent valuation studies are also becoming more widespread in health care and have been undertaken in areas such as arthritis management,10,11 ultrasonography,12 care of the elderly,13 management of hypertension,14,15 blood transfusion,16 and the use of ionic versus non-ionic contrast media.17

In implementing contingent valuation in health care, there is debate about who should be asked (e.g., current users of a program, ex-users, potential users); what they should be asked (e.g., willingness to pay for, or accept, certain or expected outcomes); and how they should be asked (e.g., single open-ended questions or multiple close-ended questions in a "bidding" game). As indicated by Froberg and Kane18-21 in their comprehensive review of health preference measures, very little is known about the psychometric properties of stated willingness-to-pay measures in health care.

Available data are largely limited to two studies by Thompson et al. of arthritis patients,10,11 where test-retest reliability was found to be low (r = 0.25) and there was indirect evidence of a lack of agreement between WTP responses and health-state utility as measured by standard gamble. In their review, Froberg and Kane did not find any health care study in which WTP had been directly compared with other measures of health-state preference.

The purpose of the present study was to evaluate one approach to contingent valuation in terms of construct validity and test-retest reliability. The therapeutic context of the study was chronic lung disease (chronic bronchitis, emphysema, or asthma) and questions related to current health status and preferences for hypothetical therapies and outcomes. We sought to address four questions regarding WTP:

1. What is the association between a person’s current disease severity and his or her willingness to pay for a therapy that offers a specified expected health improvement? As a test of construct validity we predicted that, for a given income status, persons with greater disease severity would be willing to pay more for the same expected health improvement.

2. What are the associations between WTP for health improvement and other measures of health-state preference such as the standard gamble? What is the association between WTP and a generic measure of current health-related quality of life (Short-Form 36)? Associations between WTP and such health status indicators would be evidence of the convergent validity of WTP.

3. Does the use of a simple "bidding game" to measure WTP introduce a starting-point bias?

4. What is the test-retest reliability of WTP over a period of four weeks?

Methods

A convenience sample of patients who had chronic lung disease [defined as forced expiratory volume in 1 second (FEV1) of less than 70% of predicted] who lived in the Hamilton area was identified from a respirology clinic register held at McMaster University Medical Centre. Given that the interview was likely to be cognitively demanding, we excluded from the survey all patients who were more than 70 years old, those unable to read or speak English, and those who had other health problems, such as deafness or mental ill health, that might have impaired their understanding of the survey. Patients eligible for the survey were approached by mail with the consent of, and a supporting letter for the study from, their physician. The patients were interviewed in their homes by a professional survey interviewer; the survey did not require any clinical examination. All patients gave written informed consent to be interviewed.

SYMPTOMS AND DISEASE SEVERITY

Self-reported symptom data were collected for dyspnea, cough, phlegm, and wheeze using the modified scales proposed by the Medical Research Council (MRC) of Great Britain.22,23 For dyspnea this questionnaire comprises a five-point scale based on degrees of physical activity that produce breathlessness. The MRC questions for cough and phlegm record symptom frequency on a four-point scale (none, mild, moderate, severe), and that for wheeze records frequency on a three-point scale (none, occasional, most days). We combined the four symptom responses into a simple classification of disease severity of mild, moderate, and severe. We defined severe disease as at least one of the four symptoms at its highest level (i.e., most limiting or frequent). Mild disease was defined as all four symptoms reported as mild. Moderate disease included all conditions not classified as severe or mild (i.e., mixes of mild and moderate symptoms).
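The classification rule above can be written as a short function. This is a sketch under stated assumptions: symptom levels are coded as ordinal integers from 0 (none) to the top of each scale (dyspnea 0 to 4, cough and phlegm 0 to 3, wheeze 0 to 2), and, because the text does not say how "mild" maps onto the five-point and three-point scales, the hypothetical cutoff `MILD_CUTOFF` treats levels 0 and 1 as mild.

```python
from typing import Dict

# Highest ordinal level of each symptom scale, coded from 0 (none).
MAX_LEVEL: Dict[str, int] = {"dyspnea": 4, "cough": 3, "phlegm": 3, "wheeze": 2}
# Hypothetical assumption: levels 0 (none) and 1 both count as "mild".
MILD_CUTOFF = 1


def classify_severity(levels: Dict[str, int]) -> str:
    """Return 'mild', 'moderate', or 'severe' from the four symptom levels."""
    for symptom, top in MAX_LEVEL.items():
        if not 0 <= levels[symptom] <= top:
            raise ValueError(f"{symptom} level out of range: {levels[symptom]}")
    # Severe: at least one symptom at its highest (most limiting/frequent) level.
    if any(levels[s] == top for s, top in MAX_LEVEL.items()):
        return "severe"
    # Mild: every symptom at or below the assumed mild cutoff.
    if all(levels[s] <= MILD_CUTOFF for s in MAX_LEVEL):
        return "mild"
    # Moderate: everything else (mixes of mild and moderate symptoms).
    return "moderate"


print(classify_severity({"dyspnea": 1, "cough": 1, "phlegm": 0, "wheeze": 1}))  # mild
print(classify_severity({"dyspnea": 2, "cough": 1, "phlegm": 2, "wheeze": 1}))  # moderate
print(classify_severity({"dyspnea": 1, "cough": 3, "phlegm": 1, "wheeze": 0}))  # severe
```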

OXYGEN-COST DIAGRAM (OCD)

The OCD is a visual-analog scale for measuring dyspnea and disability in patients with respiratory disease.24 The OCD is presented as a vertical line with 13 everyday activities ranging from "brisk walking uphill" to "sleeping" marked against the line at intervals corresponding to their oxygen costs (i.e., metabolic equivalents required to carry them out). Respondents are asked to make a mark on the line corresponding to the most strenuous activity they could undertake before they became breathless. Responses are recorded in millimeters from the bottom of the 10-cm scale, with low scores indicating a higher propensity towards breathlessness for a given activity level. For example, a person who becomes breathless sitting or standing would score around 10 out of 100, while a person who becomes breathless only with "medium walking uphill" would score around 80 out of 100.

SHORT FORM 36 (SF-36)

The SF-36 is a generic questionnaire for measuring health-related quality of life in eight domains: general health perception; physical functioning; role limitations due to physical health problems; role limitations due to emotional problems; social functioning; bodily pain; vitality (i.e., energy); and general mental health. These eight domains comprise a total of 36 items. For each domain, item scores are summed and transformed onto a scale from 0 (worst) to 100 (best). The SF-36 has been developed over a number of years from the Medical Outcomes Study25 and has undergone validity and reliability testing in a number of populations,26,27 including patients with chronic airways obstruction.28 Its breadth of coverage and ease of use have made it a popular measure of health-related quality of life in treatment-evaluation studies.
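The 0 to 100 transformation mentioned above is a linear rescaling of the summed raw score between its lowest and highest possible values. A minimal sketch, assuming item scores have already been summed (with any reverse-worded items recoded so that a higher score means better health); the domain size in the usage example is hypothetical, and the official SF-36 scoring manual governs item coding and missing-data rules.

```python
def transform_domain_score(raw: float, lowest: float, highest: float) -> float:
    """Rescale a summed domain score to 0 (worst) .. 100 (best)."""
    if highest <= lowest:
        raise ValueError("highest possible score must exceed lowest")
    if not lowest <= raw <= highest:
        raise ValueError("raw score outside the possible range")
    return (raw - lowest) / (highest - lowest) * 100


# Hypothetical domain of 10 items, each scored 1..3, so raw sums range 10..30.
print(transform_domain_score(20, 10, 30))  # 50.0
print(transform_domain_score(30, 10, 30))  # 100.0
```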

HEALTH-STATE VALUES AND UTILITIES

The preference value of each respondent’s current health state in the interval from "death" (= 0) to "healthy lung functioning" (= 100) was elicited by two different methods: rating scale and standard gamble.

Rating scale. As described by Feeny and Torrance,29 our rating scale is a vertical and calibrated visual-analog scale (sometimes called a "feeling thermometer") with labeled anchors of "death" (= 0) and "healthy lung functioning" (= 100). Consistent with the guidance given in Furlong et al.,30 the respondents were first asked to use this scale to rate two marker health-state descriptions for chronic lung disease (one good and one bad); they were then asked to rate their own current health states on the scale. Following terminology in the decision sciences, we refer to this measure as a value rather than a utility because the latter is measured under conditions of uncertainty.

Standard gamble. This is the classical method from economics and decision theory for measuring preferences for uncertain outcomes; the method stems directly from the continuity axiom of the theory of decision making under uncertainty proposed by von Neumann and Morgenstern.31 The respondents were asked to compare two hypothetical therapy options: option Y is a certain outcome (the status quo) where the person remains in the current health state for the rest of his or her life; option X is to take a hypothetical new medication that has an uncertain binary outcome: it will either return the person to healthy lung functioning (with probability p) or result in immediate death (with probability 1 - p).

Using colored pie charts with percentage slices as visual aids for probability concepts, the interviewers systematically varied the probabilities in option X to find the risk of death at which the respondents would regard options X and Y as being equivalent. Consistent with the theory underlying this measurement, the observed indifference probability (in this case expressed as a percentage) is the utility value of the respondent’s current health state in the interval from death (= 0) to healthy lung functioning (= 100). The intuitive logic of the standard gamble is that the extent to which a person will accept the risk of a worse outcome (i.e., death) to achieve a better outcome provides information about the individual’s utility for his or her current health state.
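The text does not specify the exact sequence in which the interviewers varied the probabilities, so the following is only a sketch of one way to locate the indifference point: a bisection over whole percentages, assuming the respondent's answers are consistent (if option X is accepted at p%, it is also accepted at any higher p). The respondent is simulated by a hypothetical callback `prefers_gamble(p)` that returns True when option X with a p% chance of healthy lung functioning is preferred to the status quo, option Y.

```python
from typing import Callable


def standard_gamble(prefers_gamble: Callable[[int], bool]) -> int:
    """Return the smallest p (percent) at which option X is accepted.

    Assumes prefers_gamble is monotone in p and true at p = 100; the
    result is read as the utility of the current health state (0..100).
    """
    lo, hi = 0, 100
    while lo < hi:
        mid = (lo + hi) // 2
        if prefers_gamble(mid):
            hi = mid        # X accepted: indifference point is at or below mid
        else:
            lo = mid + 1    # Y preferred: indifference point is above mid
    return lo


# Hypothetical respondent whose true utility for current health is 83.
utility = standard_gamble(lambda p: p >= 83)
print(utility, 100 - utility)  # 83 17  (utility, maximum acceptable risk of death in %)
```

The maximum acceptable risk of death is 100 minus the indifference percentage, which is how the standard-gamble utilities are translated into risk statements.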

WILLINGNESS TO PAY (WTP)

Given uncertainty about the health outcomes associated with a procedure, it has been argued that WTP questions should be asked about expected health outcomes.5 Therefore, the WTP questions were designed to flow from the previous questions based on the standard gamble and used the same scenario of a hypothetical medication, with the same construct of probabilistic binary outcomes. For the WTP questions the outcome probabilities were fixed at a 99% chance of healthy lung functioning and a 1% chance of immediate death. The respondents were told that their health insurance did not cover this medicine and they would be required to pay some amount out-of-pocket (see the appendix for verbatim questions). To elicit the maximum each individual would be willing to pay to secure this medication, we employed a simple "bidding game" method.

As illustrated in figure 1, for a risk-averse individual with a concave utility function over lung functioning, the utility value of the (99% healthy, 1% dead) lottery for WTP is 0.99. For each individual we predict that willingness to pay for this lottery will be negatively correlated with the utility value of his or her current health status (p*, from the previous standard gamble) and, by extension, positively correlated with the gain in utility (0.99 - p*) associated with the lottery.

FIGURE 1. Theoretical utility function for improvement in lung functioning for a risk-averse individual. For each individual the utility (p*) of current lung function ($LF_C$) is a random variable determined by his or her response to the standard gamble in the interval from death ($D$) to healthy lung function ($LF_H$). For all individuals the willingness-to-pay question offers a therapy with utility of 0.99 in the same interval, with certainty-equivalent lung function $LF_{CE}$.
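The prediction above can be written out as a short expected-utility calculation. This is a worked sketch on a 0 to 1 utility scale, with death at utility 0 and healthy lung functioning at utility 1 as in the standard gamble, and with p* denoting the respondent's standard-gamble utility for current health.

```latex
\begin{aligned}
U(\text{WTP lottery}) &= 0.99\,U(\text{healthy}) + 0.01\,U(\text{death})
                       = 0.99 \times 1 + 0.01 \times 0 = 0.99,\\
\Delta U &= U(\text{WTP lottery}) - p^{*} = 0.99 - p^{*}.
\end{aligned}
```

For example, a respondent with p* = 0.91 would gain 0.08 in utility from the lottery, whereas one with p* = 0.78 would gain 0.21; under the hypothesis, the second respondent should state the higher WTP, other things (notably income) being equal.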

BIDDING GAME

A technique commonly employed in contingent-valuation studies is a simple bidding game, in which a respondent is bid up or down by the interviewer in an iterative fashion to converge upon his or her maximum WTP.32 An advantage of the bidding game is that it requires only yes/no responses to each bid and thus has more market realism than single open-ended questions asking respondents for their maximum WTPs. An important disadvantage of the bidding game is the threat of starting-point bias, where the respondent’s final WTP value is not independent of the first bid offered by the interviewer. Our study provided an opportunity to test the hypothesis of starting-point bias. Accordingly, the patients were randomly assigned to one of five starting bids ($10, $25, $50, $75, $100). For each starting bid the interviewers followed predetermined bidding algorithms; the bidding algorithm for the $50 starting bid is presented in figure 2. Mindful not to frustrate or bore respondents with a lengthy bidding game, we designed all the algorithms to have no more than three bids.

FIGURE 2. Bidding game algorithm for $50 starting bid.
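Because the figure 2 algorithm itself is not reproduced here, the following is only a hypothetical sketch of a three-bid up/down bidding game; the bid ladder, the stopping rules, and the simulated respondent `accepts(bid)` are all assumptions, not the study's actual protocol. It records the highest bid accepted before a refusal brackets the answer, or the highest bid reached when the three bids run out.

```python
from typing import Callable

# Hypothetical bid ladder ($ per month); the study's figure 2 amounts are not shown here.
LADDER = [10, 25, 50, 75, 100, 150, 200]


def bidding_game(start_bid: int, accepts: Callable[[int], bool], max_bids: int = 3) -> int:
    """Return the highest accepted bid, or 0 if every bid offered was refused."""
    i = LADDER.index(start_bid)
    highest_yes = 0
    said_yes = said_no = False
    for _ in range(max_bids):
        bid = LADDER[i]
        if accepts(bid):
            highest_yes, said_yes = bid, True
            if said_no or i == len(LADDER) - 1:
                break  # answer bracketed, or no higher bid to offer
            i += 1     # bid up
        else:
            said_no = True
            if said_yes or i == 0:
                break  # answer bracketed, or no lower bid to offer
            i -= 1     # bid down
    return highest_yes


def accepts_up_to_500(bid: int) -> bool:
    """Hypothetical respondent whose true maximum WTP is $500 per month."""
    return bid <= 500


print(bidding_game(10, accepts_up_to_500), bidding_game(100, accepts_up_to_500))  # 50 200
```

The usage line illustrates one mechanism by which starting-point bias can arise: with a cap of three bids, a respondent whose true WTP is high can climb no further than $50 from a $10 start, but reaches $200 from a $100 start.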

STATISTICAL METHODS

Means and standard deviations (SDs) are reported for all continuous data. Given that household income (ability to pay) is expected to be associated with WTP, our tests of hypotheses need to allow for such confounding. Accordingly, we undertook analysis of covariance (ANCOVA) for two of our main hypotheses: 1) variation in measurement scores by disease severity as the main effect (three groups) with household income as a covariate; 2) variation in mean WTP by starting bid (five groups) with household income and health status (SF-36 domain of Health Perception) as covariates. Associations between measures are analyzed by Pearson product-moment correlation. Test-retest reliability for instruments is reported by intraclass correlation coefficient.
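The paper does not state which form of intraclass correlation coefficient was used. As an illustration only, the sketch below computes the one-way random-effects ICC(1,1) from ANOVA mean squares, assuming each respondent contributes the same number of repeated measurements (two, for test and retest).

```python
from typing import Sequence


def icc_oneway(scores: Sequence[Sequence[float]]) -> float:
    """One-way random-effects ICC(1,1); each row holds one subject's repeated scores."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two subjects")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("each subject needs the same number (>= 2) of measurements")
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    subject_means = [sum(row) / k for row in scores]
    # Between-subject and within-subject mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(scores, subject_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical test-retest pairs: each retest is one point higher than the test.
pairs = [[1, 2], [2, 3], [3, 4]]
print(round(icc_oneway(pairs), 3))  # 0.6
```

For these pairs Pearson's r would be 1.0, yet the ICC is 0.6, because the ICC counts a systematic shift between occasions as disagreement; this is one reason an ICC, rather than r, is commonly preferred for test-retest reliability.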

Results

From a sample of 133 names of eligible subjects with chronic airways limitation identified from the clinic register, a total of 102 interviews were conducted. Reasons for not interviewing were 1) that subjects could not be contacted (16/31); 2) that subjects refused to participate (13/31); 3) exclusion due to language or hearing problems (3/31). Details of respondent characteristics are presented in table 1. Fifty-three percent of the subjects reported that they had been diagnosed by a physician as having chronic bronchitis; 45%, emphysema; and 53%, asthma; more than half of the sample (57%) reported more than one lung disorder. The most frequent symptom reported (at the level of moderate to severe) was phlegm (53%), followed by cough (50%), wheeze (36%), and dyspnea (31%). Based on symptom severity, the respondents were classified as having mild (21/102), moderate (22/102), or severe (59/102) disease.

FEASIBILITY AND COMPLIANCE

The interviewers rated the subjects’ compliance and understanding of the interview as generally good; the WTP questions had a response rate of 94% (96/102). The question with the greatest number of refusals was household income (10/102 refused). No interview was terminated prematurely by a respondent. The mean duration of the interviews was 50 minutes (SD 13).

RELATIONSHIPS WITH DISEASE SEVERITY

Table 2 presents the measurements for all the respondents and also compares the responses by subgroups of mild, moderate, and severe disease; all means are adjusted for income. The mean response on the oxygen-cost diagram (OCD) was 53 mm (SD 19), which indicates that the average respondent became breathless at an activity level of "medium walking." There was a clear and significant gradient for OCDs between grades of disease severity in the expected direction.

On all eight domains of Short Form 36, the mean scores for the combined sample were lower (indicating poorer health status) than published reference scores for healthy persons from the general public.27 In six of the eight domains there was a consistent gradient, in the expected direction, between disease-severity groups; these differences were statistically significant for health perceptions, physical functioning, physical role, and energy.

Table 1. Characteristics of Survey Respondents (n = 102)

In the health-state-preference interval from 0 (death) to 100 (healthy lung functioning), the standard gamble method generated a higher preference score for the respondents’ current health state (utility = 83) than did the rating scale (value = 63). However, both methods showed clear and statistically significant gradients between disease-severity subgroups in the predicted direction. For example, persons who had mild disease stated that, on average, the maximum risk of death they would accept from a therapy that relieved them of their lung disease would be 9% (utility = 0.91), compared with those in the severe-disease group, who would accept up to a 22% risk of death (utility = 0.78).

Table 2. Mean (SD) Health-related Quality of Life, Utility, and Willingness-to-pay (WTP) Responses: ANCOVA by Disease Severity Controlling for Income

*WTP was missing for six respondents, reducing the sample to 96 for this item.

Table 3. Mean (SD) Health-related Quality of Life, Utility, and Willingness-to-pay (WTP) Responses: ANOVA by Household Income*

*Ten respondents refused to answer the income question.

The mean WTP of all respondents for a therapy that offered a 99% chance of healthy lung functioning but a 1% risk of death was $113 (SD 205) per month; the median WTP was $65. Expressed as a percentage of respondents’ household incomes, the mean WTP was 5%. The large difference between the mean and median values indicates a skewed distribution, and the large standard deviation for mean WTP indicates a wide spread of responses. When WTP was analyzed by respondent disease severity there appeared to be some gradient, as predicted, but it was not significant (p = 0.09).

Table 4. Pearson Correlations between Instrument Scores*

*All correlations were significant at the 5% level.
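A gap like the one between a mean of $113 and a median of $65 is what a right-skewed distribution produces: a few very large bids pull the mean upward but barely move the median. The bids below are invented purely to illustrate the effect and are not study data.

```python
import statistics

# Hypothetical monthly WTP responses ($): four modest bids and one very large one.
bids = [20, 40, 65, 80, 900]

print(statistics.median(bids))  # 65
print(statistics.mean(bids))    # 221
# The sample SD exceeds the mean here, as it does for the study's WTP data.
print(statistics.stdev(bids) > statistics.mean(bids))  # True
```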

RELATIONSHIPS WITH INCOME

Table 3 presents ANOVA by four income groups and indicates no obvious difference or trend with respect to any of the health measures. In contrast, there appeared to be a marked and significant gradient, in the predicted direction, between income and unadjusted mean WTP.

ASSOCIATIONS BETWEEN MEASURES

Pearson correlations between instruments are presented in table 4. The strongest and most consistent agreement appeared to be that between OCD and SF-36 domains (range r = 0.27 to r = 0.65). Preference value for current health as measured by rating scale had a greater association with OCD and SF-36 than did health state utility by standard gamble. On the other hand, the standard gamble was the method that correlated most highly with WTP (r = -0.46), with higher WTP associated with lower utility for the current health state. Associations between WTP and SF-36 were small and ambiguous (with some unpredicted signs on coefficients).

STARTING-POINT BIAS

Table 5 presents data on final maximum WTP analyzed by starting bid. There appeared to be no obvious trend or heterogeneity between groups defined by starting bid when the final bids were analyzed as mean responses adjusted for income and health status. When the medians were computed for the individual groups there was a suggestion of the predicted trend, but this did not reach conventional statistical significance (p = 0.07).


TEST-RETEST RELIABILITY

Table 6 presents the test-retest reliability data for the 20 respondents who were interviewed a second time, approximately four weeks after their first interviews. Intraclass correlations were acceptable for all the instruments, including WTP (r = 0.66), which was better than the rating scale (r = 0.61) but worse than the standard gamble (r = 0.82).

MULTIVARIATE MODELS

Variation in WTP was also analyzed by multiple linear-regression models to determine whether other factors (e.g., age, sex, education) were confounding the univariate relationships explored in the main analysis. These models (not reported here) confirmed that the main association was between WTP and income and that very little of the variation in WTP, even when controlling for income, was explained by health status or symptom measures.

Discussion

We undertook this study supportive of the principles underlying the general concept of WTP but concerned about its practical measurement. In particular, we sought to apply to WTP principles of validity and reliability testing that have been used to assess the measurement properties of instruments for health-related quality of life.18-21 Responses to health-related questions are not verifiable because no "gold standard" exists, and this absence of a single criterion has led analysts to principles of construct validation. In some circumstances, for preference measures such as the standard gamble and WTP there may exist a simple test of response validity: observing whether the individual, if actually faced with the experimental choice, acts in a way consistent with what he or she stated in the survey. In the present study we were not able to compare stated and revealed preferences, so we attempted to explore the construct validity of stated WTP in two general ways: 1) does WTP appear to measure what we think it should be measuring, in an unbiased way, and 2) does WTP converge with other measures, such as the standard gamble, that measure similar constructs of health state preference?

Table 5. Relationship between Maximum Willingness to Pay and Starting Bid, ANCOVA Controlling for Income

Table 6. Test-Retest Reliabilities (Four-week Interval) of the Instruments for a Subsample of Respondents (n = 20)*

*All correlations were significant at the 5% level.

Data from this study offer something to both WTP antagonists and WTP protagonists. The antagonists will conclude that, unlike other health preference and health-related quality-of-life measures, WTP did not offer good discrimination. Controlling for income, there was no significant association between severity of disease and willingness to pay for a given health improvement. Antagonists probably would also draw attention to the suggestion (for median responses) of starting-point bias as a threat to the validity of this method of interviewing. In summary, the case against WTP from these data is that the elicitation method may introduce bias and the resultant data have such large variance that the validity of the measure as a discriminative tool is compromised. Furthermore, these cross-sectional data can tell us nothing about the longitudinal issue of an instrument’s ability to detect change (responsiveness).

Protagonists for WTP probably would emphasize different aspects of this study and the data. In terms of discrimination between categories of disease severity, the "severe" group had a WTP that was double (in absolute $ and percentage-income terms) that of the "mild" and "moderate" groups; but this difference was not significant, the standard deviation ($249) being nearly twice the mean ($141) for "severe" disease. This nonsignificant finding might lead the WTP protagonist to criticize two aspects of study design: 1) given the variability of WTP, the sample size may have been too small to detect differences between groups; 2) the experimental part of the survey, with its random assignment to starting bids, may actually have induced greater variation in the WTP data than would otherwise have existed. The WTP protagonist would also point out that the test-retest reliability of WTP was better than that of the rating scale for health utility and as good as that of a popular disease-specific scale, the oxygen-cost diagram.

A person’s willingness to pay is clearly limited by his or her ability to pay; this is appropriate because the method is attempting to elicit what the person would be prepared to forgo from current (and future) consumption to achieve an expected health improvement. In this way WTP is similar to the standard gamble, where the value of a health state is measured in terms of what a person is prepared to forgo or trade off (i.e., risk of death). Our data show a clear positive association between household income and WTP; this was a predicted association, and it therefore offers partial evidence that we are measuring the desired construct. The association between income and WTP is not a source of measurement bias, but it does reflect an important equity assumption underlying cost-benefit analysis: that the current income distribution is acceptable. In practice, a CBA study using WTP data might incorporate explicit distributional weights for outcomes based upon income or other factors.33 From this study two practical issues arose with income: 1) non-response to income questions was relatively high at 10%; 2) the definition and measurement of household income are open to interpretation, because of issues regarding the definition of the household and also the influence on WTP of accumulated wealth and assets, the distribution of which may not be identical to that of income.
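One way to make such distributional weights explicit is sketched below. The weighting scheme, in which each person's WTP is multiplied by (reference income / own income) raised to an elasticity parameter, is an assumption chosen for illustration; the paper does not specify any particular scheme. An elasticity of 0 reproduces the unweighted sum.

```python
from typing import Sequence


def weighted_total_wtp(
    wtp: Sequence[float],
    incomes: Sequence[float],
    reference_income: float,
    elasticity: float = 1.0,
) -> float:
    """Sum of WTP values, each weighted by (reference_income / income) ** elasticity."""
    if len(wtp) != len(incomes):
        raise ValueError("wtp and incomes must have the same length")
    if reference_income <= 0 or any(inc <= 0 for inc in incomes):
        raise ValueError("incomes must be positive")
    return sum(w * (reference_income / inc) ** elasticity for w, inc in zip(wtp, incomes))


# Hypothetical: two people each state $100; one has half, the other double, the reference income.
print(weighted_total_wtp([100, 100], [20_000, 80_000], 40_000))       # 250.0
print(weighted_total_wtp([100, 100], [20_000, 80_000], 40_000, 0.0))  # 200.0
```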

There are a number of potential sources of measurement bias in WTP studies. We believe our study was the first to explore starting-point bias in WTP for health improvement, but our data are not conclusive; previous studies in environmental economics have detected such bias,34,35 while others have not.36,37 Other sources of bias are more difficult to detect with hypothetical preference data in the absence of the "gold standard" of actual decisions. For example, a form of strategic bias would be a very high or low WTP bid as a protest against the interview. In his study of arthritis patients, Thompson decided to trim his data so that responses of more than 50% of income for an arthritis cure, or of zero WTP, were judged to be "implausible responses" and dropped from the main analysis. The legitimacy of such data trimming is questionable; the analyst will probably reduce heterogeneity and noise in the data, but it is difficult to see how one determines whether bias has been reduced or increased by such action.
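The trimming rule attributed to Thompson above can be written as a short filter. The Python sketch below is our own illustration: the function name, the `max_share` parameter, and the example bids are hypothetical, and it assumes WTP and income are measured over the same period. It shows how the rule mechanically shrinks dispersion, which is why lower variance after trimming says nothing by itself about whether bias went down.

```python
from statistics import pstdev
from typing import List, Sequence, Tuple


def trim_implausible(wtps: Sequence[float], incomes: Sequence[float],
                     max_share: float = 0.5) -> Tuple[List[Tuple[float, float]], int]:
    """Drop zero bids and bids above max_share of income (hypothetical helper).

    Returns the retained (wtp, income) pairs and the number of dropped responses.
    Assumes WTP and income are expressed over the same period (e.g., both annual).
    """
    if len(wtps) != len(incomes):
        raise ValueError("wtps and incomes must have the same length")
    kept = [(w, y) for w, y in zip(wtps, incomes) if 0 < w <= max_share * y]
    return kept, len(wtps) - len(kept)


if __name__ == "__main__":
    # Hypothetical annual bids and incomes, not data from this study.
    wtps = [0, 500, 1200, 30000]
    incomes = [20000, 30000, 40000, 50000]
    kept, dropped = trim_implausible(wtps, incomes)
    print("dropped:", dropped)                                  # dropped: 2
    print("SD before:", round(pstdev(wtps), 1))
    print("SD after:", round(pstdev([w for w, _ in kept]), 1))  # SD after: 350.0
```

Whether the two dropped responses were protest bids or genuine extreme preferences cannot be determined from the numbers alone; the filter removes both indiscriminately.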

We suspect that a major difficulty and source of response heterogeneity with WTP for health improvements is what has been called hypothetical bias.38 Asking the respondent to conceive of a hypothetical market for health improvements is, at best, cognitively demanding. The bidding game offers more market realism than the one-shot, open-ended questions about the maximum a person would pay that were used in earlier health studies. But we believe our respondents, in Canada’s environment of universal health insurance, found it difficult and perhaps unrealistic to focus on what they would pay for an expected health improvement of such magnitude. Further research on methods for explaining the measurement task and presenting additional data (e.g., current per-capita tax expenditures on health care items) may be useful. However, we suspect that WTP methods will generate the most reliable data in studies of minor diseases where the respondents already have some familiarity with consumer purchases such as over-the-counter medicines, for example, in the treatment of allergies or migraines.

Our discussion of the measurement issues arising

from this study leads us to the broader consideration of the theoretical validity of the measurement we undertook. Recently Gafni5 has argued that the relevant respondents for WTP studies in publicly funded health care systems (such as Canada’s) are the taxpaying general public, with the relevant question being how much they would pledge as an additional insurance premium to their taxes so that a therapy would be available if they ever needed it.

Even in a predominantly private, market-based insurance system such as that of the United States, the relevant dollar-health tradeoffs can be elicited by offering individuals a choice between hypothetical health maintenance organizations (HMOs) that charge different annual premiums for alternative packages of health care and health benefits.
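One way to read such HMO choices, sketched below under our own assumptions, is that each choice brackets a respondent’s WTP: choosing the richer package at an extra annual premium d implies WTP of at least d, while declining it implies WTP below d. The function name and this interval-bracketing rule are illustrative only; the text does not specify how such choices would be analyzed, and the sketch assumes WTP is non-negative.

```python
import math
from typing import Iterable, Tuple


def wtp_bounds(choices: Iterable[Tuple[float, bool]]) -> Tuple[float, float]:
    """Bracket one respondent's WTP from hypothetical HMO choices.

    Each choice is (extra_premium, chose_richer_plan). Choosing the richer plan
    at an extra premium d implies WTP >= d; declining it implies WTP < d.
    Returns (lower, upper) with lower <= WTP < upper; upper may be math.inf.
    """
    lower, upper = 0.0, math.inf
    for extra_premium, chose_richer in choices:
        if chose_richer:
            lower = max(lower, extra_premium)
        else:
            upper = min(upper, extra_premium)
    if lower >= upper:
        raise ValueError("choices are inconsistent with a single WTP value")
    return lower, upper


if __name__ == "__main__":
    # Hypothetical choices: accepted +$10 and +$25 per year, declined +$40.
    print(wtp_bounds([(10, True), (40, False), (25, True)]))  # (25, 40)
```

Each additional choice can only narrow the interval, which is one reason such designs typically present several premium levels to each respondent.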

This tax-premium or insurance-premium approach has considerable conceptual merit because it could capture an important aspect of WTP, namely WTP for benefits to others (externalities), which is usually omitted from economic evaluation studies. But the measurement challenges are likely to be large, because respondents would be asked to value probabilistic health outcomes associated with therapies and diseases with which they may not be familiar. We speculate that such public surveys will generate data even more heterogeneous than those we obtained in our study, which may limit their usefulness for policy making; ultimately, however, this is an empirical question. We conclude that WTP still holds conceptual promise, but that a large empirical agenda remains before it can be recognized as a valid and reliable measurement tool.


The authors are grateful to the study interviewers, Joyce Brown, June Hogg, and Diane Wolkowski; and to the physicians and patients who agreed to participate. Valuable comments on an earlier draft of this paper were received from Stephen Birch, David Feeny, George Torrance, Chris Woodward, Greg Stoddart, and other members of the Polinomics Seminar group.

References

1. Drummond MF, Torrance GW, Stoddart GL. Methods for the Economic Evaluation of Health Care Programmes. Oxford: Oxford University Press, 1987.

2. Johannesson M, Jonsson B. Economic evaluation in health care: is there a role for cost-benefit analysis? Health Pol. 1991;17:1-23.

3. Birch S, Gafni A. Cost-effectiveness/utility analyses. Do current decision rules lead us to where we want to be? J Health Econ. 1992;11:270-96.

4. Mishan EJ. Evaluation of life and limb: a theoretical approach. J Polit Econ. 1971;79:687-706.

5. Gafni A. Using willingness-to-pay as a measure of benefits: what is the relevant question to ask in the context of public decision making? Med Care. 1991;29:1246-52.

6. Marin A, Psacharopoulos G. The reward for risk in the labour market: evidence from the UK and a reconciliation with other studies. J Polit Econ. 1982;90:827-53.

7. Dickie M. Willingness to pay for Ozone control: inferences from the demand for medical care. J Environ Econ Manage. 1991;21:1-16.

8. Jones-Lee MW. The Value of Life: An Economic Analysis. London: Martin Robertson and Co., 1976.

9. Cummings RG, Brookshire DS, Schulze WD. Valuing Environmental Goods: A State of the Art Assessment of the Contingent Valuation Method. Totowa, NJ: Rowman and Allanheld, 1986.

10. Thompson MS, Read JL, Liang M. Feasibility of willingness-to-pay measurement in chronic arthritis. Med Decis Making. 1984;4:195-215.

11. Thompson MS. Willingness to pay and accept risks to cure chronic disease. Am J Public Health. 1986;76:392-6.

12. Berwick DM, Weinstein MC. What do patients value? Willingness to pay for ultrasound in normal pregnancy. Med Care. 1985;23:881-93.

13. Donaldson C. Willingness to pay for publicly-provided goods: A possible measure of benefit? J Health Econ. 1990;9:103-18.

14. Johannesson M, Aberg H, Agreus L, Borquist L, Jonsson B. Cost-benefit analysis of non-pharmacological treatment of hypertension. J Intern Med. 1991;230:307-12.

15. Johannesson M, Jonsson B. Willingness to pay for antihypertensive therapy: results of a Swedish pilot study. J Health Econ. 1991;10:461-74.

16. Eastaugh SR. Valuation of the benefits of risk-free blood. Willingness to pay for hemoglobin solutions. Int J Tech Assess Health Care. 1991;7:51-7.

17. Appel LJ, Steinberg EP, Powe NR, Anderson GF, Dwyer SA, Faden RR. Risk reduction from low osmolality contrast media. What do patients think it is worth? Med Care. 1990;28:324-34.

18. Froberg DG, Kane RL. Methodology for measuring health-state preferences. I: Measurement strategies. J Clin Epidemiol. 1989;42:345-54.

19. Froberg DG, Kane RL. Methodology for measuring health-state preferences. II: Scaling methods. J Clin Epidemiol. 1989;42:459-71.

20. Froberg DG, Kane RL. Methodology for measuring health-state preferences. III: Population and context effects. J Clin Epidemiol. 1989;42:585-92.

21. Froberg DG, Kane RL. Methodology for measuring health-state preferences. IV: Progress and research agenda. J Clin Epidemiol. 1989;42:675-85.

22. Samet JM. A historical and epidemiologic perspective on respiratory symptoms questionnaires. Am J Epidemiol. 1978;108:435-46.

23. Task group on surveillance for respiratory hazards in the occupational setting. Pervoki SM (Chairman). Surveillance for respiratory hazards. ATS News. 1982;8:12-6.

24. McGavin CR, Artvinli M, Naoe H, McHardy GJR. Dyspnoea, disability, and distance walked: comparison of estimates of exercise performance in respiratory disease. Br Med J. 1978;2:241-3.

25. Stewart AL, Hays RD, Ware JE. The MOS short-form general health survey. Reliability and validity in a patient population. Med Care. 1988;26:724-35.

26. Ware JE, Donald C. The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection. Med Care. 1992;30:473-81.

27. Brazier JE, Harper R, Jones NMB, O’Cathain A. Validating the SF-36 health survey questionnaire: new outcome measure for primary care. BMJ. 1992;305:160-4.

28. Mahler DA, Faryniarz K, Tomlinson D, Colice GE. Impact of dyspnea and physiologic function on general health status in patients with chronic obstructive pulmonary disease. Chest. 1992;102:395-401.

29. Feeny DH, Torrance GW. Incorporating utility-based quality-of-life assessment measures in clinical trials. Two examples. Med Care. 1989;27:S190-S204.

30. Furlong W, Feeny DH, Torrance GW, Barr R, Horsman J. Guide to design and development of health-state utility instrumentation. Hamilton, Ontario: CHEPA Working Paper Series, McMaster University, 1990;9.

31. von Neumann J, Morgenstern O. Theory of games and economic behaviour. Princeton: University Press, 1953.

32. Randall A. Bidding games for valuation of aesthetic environmental improvements. J Environ Econ Manage. 1974;1:132-49.

33. Sugden R, Williams AH. The Principles of Practical Cost-benefit Analysis. Oxford: Oxford University Press, 1979.

34. Brookshire DS, Randall A, Stoll JR. Valuing increments and decrements of natural resource service flows. Am J Agric Econ. 1980;62:478-88.

35. Thayer MA. Contingent valuation techniques for assessing environmental impacts: further evidence. J Environ Econ Manage. 1981;8:27-44.

36. Rowe RD, d’Arge RC, Brookshire DS. An experiment in the economic value of visibility. J Environ Econ Manage. 1980;7:1-19.

37. Boyle KF, Bishop RC, Welsh MP. Starting point bias in contingent valuation bidding games. Land Econ. 1985;61:188-94.

38. Labelle R, Hurley J. Implications of basing health care resource allocation on cost-utility analysis in the presence of externalities. J Health Econ. 1992;11:259-78.

APPENDIX

Willingness-to-pay Scenario

Now let’s assume that choice Y offers a 99% chance of restoring you to healthy lung functioning, and there is only a 1% chance of immediate death. That is, of 100 people who take the medicine in choice Y, 99 will have healthy lung functioning restored and one person will die.

Now assume that the medication in choice Y is expensive and is not fully covered by your health insurance, so you are required to pay some amount out of pocket, each month and for the rest of your life, for this medication. Thinking about the value of this medication to you and how much you could, realistically, afford to pay each month: would the maximum amount you would be willing-to-pay be [bidding game begins]
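The bidding game that follows this script can be sketched as a simple ascending/descending procedure. In the Python sketch below, the step size, the five starting bids in `STARTING_BIDS`, and the rule of reporting the highest accepted bid are all hypothetical choices made for illustration; the actual bids and increments used in the study are not given in this appendix. The simulated respondent accepts any bid up to a fixed private maximum, so by construction the result does not depend on the randomly assigned starting bid; starting-point bias would appear only if real respondents’ answers shifted with the opening bid.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical monthly starting bids (dollars); not the values used in the study.
STARTING_BIDS = (25, 50, 100, 200, 400)


def bidding_game(accepts: Callable[[float], bool], start: float, step: float,
                 max_rounds: int = 100) -> float:
    """Return the highest bid accepted, moving up or down from `start` by `step`.

    If the opening bid is accepted, raise it until the respondent says no
    (at most `max_rounds` raises). If it is refused, lower it until a bid is
    accepted; a respondent who refuses every positive bid is recorded as 0.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    bid = start
    if accepts(start):
        for _ in range(max_rounds):
            if not accepts(bid + step):
                break
            bid += step
        return bid
    while bid > 0:
        bid = max(bid - step, 0)
        if accepts(bid):
            return bid
    return 0


def run_survey(true_maxima: List[float], step: float = 25,
               seed: int = 0) -> List[Tuple[float, float]]:
    """Randomly assign each simulated respondent a starting bid and play the game."""
    rng = random.Random(seed)
    results = []
    for true_max in true_maxima:
        start = rng.choice(STARTING_BIDS)
        wtp = bidding_game(lambda bid, m=true_max: bid <= m, start, step)
        results.append((start, wtp))
    return results


if __name__ == "__main__":
    for start, wtp in run_survey([130, 10, 260]):
        print(f"start={start:>3}  WTP={wtp}")
```

Reporting the highest accepted bid means the recorded WTP is only as precise as the step size (here a respondent with a true maximum of $130 is recorded at $125); a finer step would narrow that interval at the cost of more questions.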
