prof. gouke j bonsel mph md phd public health methods obstetrics

Can secondary analysis teach us about best practice in universal QoL measurement?

Arguments and (some) Evidence

Prof. Gouke J Bonsel MPH MD PhD

Public Health Methods, Obstetrics

Academic Medical Centre - University of Amsterdam

Working Paper No.1021 November 2005

STATISTICAL COMMISSION and UN ECONOMIC COMMISSION FOR EUROPE, CONFERENCE OF EUROPEAN STATISTICIANS
STATISTICAL OFFICE OF THE EUROPEAN COMMUNITIES (EUROSTAT)
WORLD HEALTH ORGANIZATION (WHO)

Joint UNECE/WHO/Eurostat Meeting on the Measurement of Health Status (Budapest, Hungary, 14-16 November 2005)

Session 5 – Invited paper

051116 Budapest 2

Agenda

• Comparative secondary analysis: wanted?
• Goals of measurement
  – Contents
  – Process
• C2A
  – Quantitative: validity
  – Qualitative: Q/D vignettes
  – Quantitative: coverage/refinement

General belief: many issues can be resolved by data.

Comparative secondary analysis (C2A)

• >2 crude datasets with
  – known questionnaire + codification rules
  – known population (at least vs. general)
  – sharing > 1 intended concept
  – sufficient common question/response types
  – sufficient language commonalities

• Special cases
  – 1 questionnaire, n populations
  – n questionnaires, > 1 populations

Comparative secondary analysis: types

• quantitative, analytical content-driven methods, with and without an external criterion

• quantitative, descriptive (technical) performance methods

• qualitative, semantics
• qualitative, comparison of response form, other

Operational feature: all head-to-head analyses assume some aspects to be constant over the units being compared.

Goals of QoL measurement: CONTENTS

• Intrinsic goals of health systems
  WHO (+EU?):
  – Health (DALE-like; class): level, distribution
  – Responsiveness: level, distribution
  – Fairness of financing: distribution
  Washington:
  – Monitoring population health [health level]
  – Care provision [responsiveness + level]
  – Equal pursuit [health + responsiveness distribution]

• External goals (GJB)
  – Employment, autonomy, reproduction

Goals of QoL measurement: CONTENTS

• Health state measurement (per domain)
  – multi-item classical test Q (mQ): no
  – ordinal classification (class): yes
  – cf. Item Response Theory (IRT) calibrated: perhaps

• Suitability for index development
  – in general: perhaps
  – to compose QALY/DALY estimates: yes

(but do not tell)

• Projection from the WHO mission onto existing instruments and accepted classifications

Goals of QoL measurementPROCESS

• Efficient Elaboration

• Reliable Elaboration

• Universality of acceptance

• Flexibility of mode of administration

• Low price, low burden

• Fancy appearance

Some remarks (1)

• Domains

  – Normal is the absence of dys[...]. Avoid the ‘better than normal’ discussion (concept: health is positive; item: ‘happy’ instead of ‘downhearted’). Think of playing music: there is nothing better than playing on the beat.

  – ALL external criteria, except ease of measurement and peace of mind, imply roughly equal space for physical and psychological domains; less (but not none) for social domains.

  – Projection onto the WHO mission, WHO classifications and other instruments: ex post or ex ante.

  – Beware of the conceptual unidimensionality artefact and of interpreting empirical correlation as redundancy: neither classification nor IRT ‘requires’ empirical independence.

Some remarks (2)

• Domains & items & time

  – Time (the pattern over time) is an essential conceptual component; recall technicalities are of minor importance.

  – All concepts are continuous over time, but some state changes appear as events, episodes or chronic states, or can only be defined on a (restricted) activity (= event) basis; hence frequency and intensity are to some extent a semantic convention.

  – Consequences:
    • time can emerge in the preamble, the item and the response; uniformity over the questionnaire is essential, as people ignore preambles
    • the empirical (pattern over) time therefore decides between ‘frequency’ and ‘intensity’, but on average both are relevant
    • experience shows that virtually all domains have day-to-day fluctuations; if unstandardized, the response refers to the best condition
    • graphical tools are useful for unidimensional items, so far only academic

Some remarks (3)

• Items / response

  – The burden of 3 domains × 6 responses is smaller than that of 6 domains × 3 responses.

  – Distributional economy is ignored: 2 levels is not best, subjective scale experience does not apply, and filtering assumes an error-free, context-free threshold judgment (cf. Shannon’s methodology).

  – Equalizing semantics across young/old, men/women, rich/poor, nationality or culture standardizes rather than exposes (desired?) differences.

  – Contextual aspects are often ignored, as is suitability for translation.

  – Reliability information (across time, observers and modes of administration) is scarce.

C2A: Quantitative head-to-head: validity

• With external criterion
  – domain-specific consequences or etiology, and personal characteristics, with a prespecified relation; strength of association (preferably RR)
  – examples:
    • psychological domain: use of specific care, suicide; preceding life events
    • mobility domain: use of physiotherapy, aids; fracture in the preceding period
    • cognitive domain: age
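With an external criterion, the strength of the prespecified association can be sketched as a relative risk from a 2x2 table per instrument. The function name, the criterion (physiotherapy use) and all counts below are hypothetical; a real analysis would add confidence intervals and adjust for personal characteristics.

```python
def relative_risk(cases_with, n_with, cases_without, n_without):
    """Relative risk of an external criterion (e.g. physiotherapy use) among
    respondents reporting problems on a domain vs. those reporting none."""
    if n_with == 0 or n_without == 0 or cases_without == 0:
        raise ValueError("RR undefined: empty group or zero risk in reference group")
    risk_with = cases_with / n_with
    risk_without = cases_without / n_without
    return risk_with / risk_without


# Hypothetical counts: the same 1000 respondents (100 physiotherapy users)
# classified by the mobility-type domain of two instruments.
rr_instrument_a = relative_risk(cases_with=60, n_with=200, cases_without=40, n_without=800)
rr_instrument_b = relative_risk(cases_with=75, n_with=300, cases_without=25, n_without=700)

print(f"RR instrument A: {rr_instrument_a:.2f}")
print(f"RR instrument B: {rr_instrument_b:.2f}")
```

With these made-up counts instrument B’s domain shows the stronger association; that only counts as better validity because the direction of the relation was prespecified.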

• Without external criterion
  – domain relations with prespecified patterns; strongly dependent on the population (random if the population is largely healthy); comparison is difficult if the scale type differs (mQ, class, IRT)
  – special case if a measure is contained as anchor
  – examples:
    • psychological domains vs. physical domains
    • all domains vs. HUI-Ambulation or EQ-Mobility
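Without an external criterion, a prespecified pattern between ordinal domains can be checked with a rank correlation. The sketch assumes Spearman’s rho (the slides do not name a coefficient) and uses hypothetical responses: the convergent pair (a mobility domain against an ambulation anchor) is expected to correlate more strongly than the divergent pair (an anxiety domain against the same anchor).

```python
def average_ranks(values):
    """Ranks 1..n; tied values get the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of tied values starting at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)  # undefined if either variable is constant


# Hypothetical level codes for 10 respondents (1 = no problems)
mobility = [1, 1, 1, 2, 2, 3, 1, 2, 3, 1]    # domain under study
ambulation = [1, 1, 2, 2, 3, 4, 1, 2, 4, 1]  # anchor measure
anxiety = [1, 2, 1, 1, 2, 1, 3, 2, 1, 1]     # domain expected to diverge

rho_convergent = spearman(mobility, ambulation)
rho_divergent = spearman(anxiety, ambulation)
print(f"convergent rho = {rho_convergent:.2f}, divergent rho = {rho_divergent:.2f}")
print("prespecified pattern holds:", rho_convergent > rho_divergent)
```

As the slide warns, such patterns depend strongly on the population: in a largely healthy sample most codes are 1 and the ranks carry little information.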

• Without internal cutpoint calibration information
  – domainwise IRT analysis

• With internal cutpoint calibration information (vignettes)
  – domainwise CHOPIT-like analysis

Calibration: difficult but essential, ALSO within countries.
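CHOPIT itself is a parametric model. As a simpler illustration of how vignettes supply within-respondent cutpoint information, the sketch below swaps in the nonparametric anchoring-vignette recode from the same literature. The function name is hypothetical; it assumes higher ratings mean more severe problems and that each respondent rates the vignettes strictly in the researcher’s a-priori order (ties or inversions are rejected rather than handled).

```python
def relative_category(self_rating, vignette_ratings):
    """Recode a self-rating relative to the respondent's own vignette ratings.

    Returns C in 1..2J+1 for J vignettes: odd values mean 'between vignettes',
    even values mean 'equal to a vignette'. Assumes vignette_ratings are listed
    from least to most severe vignette and are strictly increasing.
    """
    if any(a >= b for a, b in zip(vignette_ratings, vignette_ratings[1:])):
        raise ValueError("vignette ratings tied or out of order for this respondent")
    for j, z in enumerate(vignette_ratings):
        if self_rating < z:
            return 2 * j + 1
        if self_rating == z:
            return 2 * j + 2
    return 2 * len(vignette_ratings) + 1


# Two hypothetical respondents give the same raw self-rating (3 on a 5-level
# scale) but use the scale differently when rating the same two vignettes.
print(relative_category(3, [2, 4]))  # between the two vignettes
print(relative_category(3, [3, 5]))  # equal to the milder vignette
```

The same raw response lands in different relative categories, which is the kind of response-scale difference that cutpoint calibration is meant to expose.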

C2A: Qualitative Head-to-head

• Suitability to compose vignettes (timeless states, annual profiles) to arrive at Q/D values
  – self-reflective domain terms
  – linguistic (non-numerical), objective response mode
  – clear-cut time aspect
  – ‘uniformity’ of terms, categories and time across domains

C2A: Quantitative head-to-head: efficiency

• Source: investigations supporting an increase in the number of levels of the EQ-5D-3L (‘HUI-fication’)

• No methods were available to demonstrate the benefit of more refinement

• Method: Shannon’s informativity measure, a non-parametric (hence desirable) quantifier. Source: US study, http://www.ahrq.gov/rice/ and Med Care 2005;43:203-20 & 221-28
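For reference, Shannon’s absolute index and evenness index for one domain are usually written as below, with n_i respondents in level i, N respondents in total and L potential levels. Base-2 logarithms (bits) are an assumption here; the slides do not state the base.

```latex
\begin{aligned}
H' &= -\sum_{i=1}^{L} p_i \log_2 p_i, \qquad p_i = \frac{n_i}{N}, \qquad 0 \cdot \log_2 0 := 0 \\
J' &= \frac{H'}{\log_2 L}, \qquad 0 \le H' \le \log_2 L, \qquad 0 \le J' \le 1
\end{aligned}
```

H' reaches its maximum log2 L only when all L levels are equally occupied (a rectangular distribution), which is why J' reads as "how much of the available space is used".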

C2A example: EQ-5D, HUI2 and HUI3 dimensions with the number of levels and the number of unique permutations defined by the full descriptive system (common dimensions shaded grey)

[Table: level descriptions of the common domains of EQ-5D, HUI2 & HUI3]
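The number of unique permutations follows directly from the descriptive system: it is the product of the number of levels per dimension. The level vectors below are given as we understand the published EQ-5D, HUI2 and HUI3 systems; the 5-level EQ-5D row is the 5L variant discussed later in these slides.

```python
from math import prod

# Levels per dimension, in the order the systems usually list them.
SYSTEMS = {
    "EQ-5D-3L": [3, 3, 3, 3, 3],
    "EQ-5D-5L": [5, 5, 5, 5, 5],
    # sensation, mobility, emotion, cognition, self-care, pain, fertility
    "HUI2": [4, 5, 5, 4, 4, 5, 3],
    # vision, hearing, speech, ambulation, dexterity, emotion, cognition, pain
    "HUI3": [6, 6, 5, 6, 6, 5, 6, 5],
}


def unique_states(levels):
    """Number of distinct health states a descriptive system can define."""
    return prod(levels)


for name, levels in SYSTEMS.items():
    print(f"{name}: {len(levels)} dimensions, {unique_states(levels):,} states")
```

A larger state space only pays off if respondents actually spread over it, which is what the Shannon indices measure.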

Absolute and % distribution of responses, EQ-5D, HUI2 & HUI3 (N = 3691)

From the number of potential categories and the observed frequencies we can compute Shannon numbers: the more equally distributed, the more information, the better the reliability and the better the sensitivity.
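A minimal sketch of that computation for one domain, assuming base-2 logarithms; the function name and the counts are hypothetical and not the study’s data. Note that J' divides by the number of potential levels, occupied or not.

```python
import math


def shannon_indices(counts: list[int]) -> tuple[float, float]:
    """Return (H', J') for one domain from the response count per level.

    H' = -sum(p_i * log2(p_i)) over occupied levels.
    J' = H' / log2(L), where L counts ALL potential levels.
    """
    total = sum(counts)
    if total == 0 or len(counts) < 2:
        raise ValueError("need at least one response and at least two potential levels")
    h = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return h, h / math.log2(len(counts))


# Hypothetical counts for one 3-level domain
skewed = [900, 80, 20]
even = [340, 330, 330]
h_skewed, j_skewed = shannon_indices(skewed)
h_even, j_even = shannon_indices(even)
print(f"skewed: H' = {h_skewed:.3f} bits, J' = {j_skewed:.3f}")
print(f"even:   H' = {h_even:.3f} bits, J' = {j_even:.3f}")
```

Because only the observed frequencies enter, no cutpoints or distributional assumptions are needed, which is the non-parametric advantage claimed in the conclusions.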

H’ and J’ with skewed and rectangular distributions in a 3-level vs. 5-level system

Shannon numbers are cardinal.
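A small worked example, with made-up distributions (not the study’s data) and base-2 logarithms assumed, shows the sense in which the numbers can be compared as quantities: all values are on the same bit scale.

```latex
\begin{aligned}
&\text{3L, rectangular } \left(\tfrac13,\tfrac13,\tfrac13\right): && H' = \log_2 3 \approx 1.585, && J' = 1 \\
&\text{5L, rectangular } \left(\tfrac15,\dots,\tfrac15\right): && H' = \log_2 5 \approx 2.322, && J' = 1 \\
&\text{3L, skewed } (0.80,\,0.15,\,0.05): && H' = -\textstyle\sum_i p_i \log_2 p_i \approx 0.884, && J' \approx \tfrac{0.884}{1.585} \approx 0.558
\end{aligned}
```

So a fully used 5-level response carries about log2 5 / log2 3 ≈ 1.46 times the information of a fully used 3-level response, while the skewed 3-level domain uses a little over half of its potential.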

H’ and J’ with skewed and rectangular distributions in a 3-level vs. 5-level system

If the system is extended but the potential categories are not occupied, then the absolute Shannon H’ stays the same and the relative Shannon J’ is lower.
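The property can be checked numerically. This sketch assumes base-2 logarithms and a hypothetical 3-level distribution that is "extended" to 5 levels without anyone using the two new categories.

```python
import math


def shannon_h(probs):
    """Shannon's H' in bits; empty categories (p = 0) contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def shannon_j(probs):
    """Evenness J' = H' / log2(L), with L = number of potential categories."""
    return shannon_h(probs) / math.log2(len(probs))


three_level = [0.6, 0.3, 0.1]
# Hypothetical 5-level extension in which the two new categories stay empty
five_level_unused = [0.6, 0.3, 0.1, 0.0, 0.0]

h3, j3 = shannon_h(three_level), shannon_j(three_level)
h5, j5 = shannon_h(five_level_unused), shannon_j(five_level_unused)
print(f"3 levels: H' = {h3:.3f}, J' = {j3:.3f}")
print(f"5 levels, 2 unused: H' = {h5:.3f}, J' = {j5:.3f}")
```

H' is unchanged because zero-probability categories add no terms, while J' falls because its denominator grows from log2 3 to log2 5.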

Shannon’s Absolute Index (H’) and Evenness Index (J’) for the Common Domains of EQ-5D, HUI2 & HUI3.

Conclusions: C2A efficiency by Shannon

• Head-to-head comparison tools allow choices on information gain by extension or recalibration

• Being non-parametric is an advantage, as the method is independent of cutpoint (re)estimation

• In healthy or ambulatory diseased populations, the EQ-5D-3L equals the HUIs in informativity for the common domains

• To be combined with differential cutpoint evaluation and reliability!

Straightforwardly applicable for C2A to WHO/EU data, given a similar population or experimentation.

Reliability

• Systematic information to select items/responses
  – domain^response × time (retest)
  – domain^response × respondent (inter-observer)
  – domain^response × administration (retest)

• EQ-5D: 3, 4 or 5 levels?
  – experiment on a representative panel under controlled conditions comparing 3L, 5L and RS
  – error, ‘filling the space’ and reliability

The task: classify/rate ‘self’ and disease vignettes

Response (?): 3L, 5L, or horizontal unanchored VAS

Inconsistencies between 3L and 5L responses by dimension, all 15 health vignettes (N = 82)

3L to 5L: no error increase

Inter-observer reliability, 3L vs. 5L, 15 vignettes: 5L much better!

Test-retest reliability for respondents’ own health (3-week interval) with ICC: 5L best!
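The slides do not say which ICC form was used. As a sketch, assuming the two-way random-effects, absolute-agreement, single-measure ICC(2,1) of Shrout and Fleiss, test-retest agreement for one dimension can be computed from a subjects × occasions table of level codes. The function name and data are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    ratings[i][j] = level code of subject i on occasion j (e.g. week 0, week 3).
    """
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2:
        raise ValueError("need at least 2 subjects and 2 occasions")
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between occasions
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical test-retest codes for one dimension (week 0, week 3)
test_retest = [[1, 1], [2, 2], [1, 2], [3, 3], [2, 2], [1, 1]]
print(f"ICC(2,1) = {icc_2_1(test_retest):.3f}")
```

Treating ordinal level codes as interval scores is itself an assumption; running the same function on 3L and 5L codes of the same respondents gives a like-for-like comparison.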

Average 3Lrs, 5Lrs and RS mean values by dimension, all diseases and self-reported health. 3L and 5L values are transformed (linearly) to the RS scale range (0-100).
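A linear transformation of this kind can be sketched as below. The function name is hypothetical, and both the equal spacing of levels and the direction (level 1 = no problems mapped to 100) are assumptions, since the slides only state that the transformation is linear onto 0-100.

```python
def to_rs_scale(level, n_levels, higher_is_better=True):
    """Linearly map an ordinal level (1..n_levels, 1 = no problems) onto 0-100.

    Assumes equally spaced levels; with higher_is_better=True, level 1 maps to 100.
    """
    if not 1 <= level <= n_levels or n_levels < 2:
        raise ValueError("level must lie in 1..n_levels and n_levels >= 2")
    position = (level - 1) / (n_levels - 1) * 100  # 0 at level 1, 100 at the last level
    return 100 - position if higher_is_better else position


print([to_rs_scale(lvl, 3) for lvl in (1, 2, 3)])
print([to_rs_scale(lvl, 5) for lvl in (1, 2, 3, 4, 5)])
```

Comparing these equally spaced values with the observed RS means is one way to see whether the level terms behave as equal steps.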

Indirect and direct quantification of level terms (n = 1230). Midway = 1/3 rate rule

Shannon’s index (H’) and Shannon’s Evenness index (J’) values for 3L and 5L. Comparison by dimension

Conclusions: C2A reliability of response terms

• Balance of 3 vs. 5 in favour of 5 (after WHO choice)
  – error increase low
  – reliability better
  – Shannon rises (much)

• Fairly easy to investigate given a large number of respondents

• C2A possible if multiple response formats exist for 1 domain

C2A of other process goals

• Universality of acceptance
  – quantitative and qualitative C2A, depending on codes for non-response

• Flexibility of mode of administration
  – qualitative comparison only

• Fancy appearance
  – qualitative comparison only

• Low price, low burden
  – quantitatively possible, but who cares?

Recommendations

• Comprehensive checklist for C2A
  – starting from structured, agreed contents goals and process/technical goals
  – distinguishing between quantitative (incl. Shannon) and qualitative research, and what remains!
  – specify models, techniques and success

• DATA can SOLVE debates; INTERESTING CHOICES remain
