
Author's response to reviews

Title: Meta-analytic estimation of measurement variability and assessment of its impact on decision-making: the case of perioperative haemoglobin monitoring

Authors:

Emmanuel Charpentier ([email protected])
Vincent Looten ([email protected])
Björn Fahlgren ([email protected])
Alexandre Barna ([email protected])
Loïc Guillevin ([email protected])

Version: 5
Date: 30 September 2015

Author's response to reviews: see over

Point-by-point answer to reviewers’ remarks

Emmanuel Charpentier, Vincent Looten, Björn Fahlgren, Alexandre Barna, Loïc Guillevin

August 2015

Notes on the responses to be given to the reviewers. Also contains code intended for the revised article. The parts in italics are notes intended for ourselves (memory aids, contentious points, etc.).

1 Reviewer 1

1.1 Major compulsory revisions

1.1.1 Point 1

1. Eqs 4-8 - delta and rho are assumed independent of mu and sigma. This implies that e_c is more dispersed than e: e.g. the maths implies var(mu_ci) = sigma^2_m + sigma^2_m > var(mu_i) = sigma^2_m. However, table 1 suggests that var(mu_ci) < var(mu_i). I think this means that the model cannot fit these data well. This needs correcting. A simple solution would be to allow delta, rho to be correlated with mu, sigma.

We beg to differ: as shown on page 4 of the revised paper PDF (which now explicitly models the calibrated SpHb error), from the definition of this calibrated SpHb (cSpHb) as the difference between SpHb and the error measured at calibration (SpHb₀ − tHb₀), it follows that any systematic study-level or patient-level error is cancelled in cSpHb.

By the way, under the assumptions made in the text, this modeling shows that the ratio ρ of the calibrated to the raw SpHb standard deviation can vary between 0 and √2, a point which escaped us at first sight. The model has been (slightly) modified to take this upper bound into account, which does not deeply modify the conclusions.
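For the interested reader, a minimal sketch of both claims (with an extra assumption made only for this sketch: the calibration error ε₀ has the same variance σ² as the raw measurement error ε, with a non-negative correlation r). Write SpHb = tHb + δ_i + ε, where δ_i is the systematic (study- or patient-level) error; then

cSpHb = SpHb − (SpHb₀ − tHb₀) = tHb + (ε − ε₀),

so δ_i cancels, and

ρ² = Var(ε − ε₀) / Var(ε) = 2σ²(1 − r) / σ² = 2(1 − r),

hence ρ = √(2(1 − r)), which ranges over [0, √2] as r ranges over [1, 0].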


One could note that there is indeed a “natural” correlation between µ_i, µ_ci and δ_i, as well as between σ_i, σ_ci and ρ_i, both due to the constraints:

µ_ci = µ_i + δ_i

σ_ci = σ_i ρ_i

Since this correlation is not of direct interest to us, it seemed simpler and more natural to model two quantities of these triplets as independent; however, modeling the correlations can be done, and is probably interesting in the case where one wants to model the error as a function of the reference value.

This cannot be done here, since none of the papers mentioning such a link gives any information allowing such a modeling.

1.1.2 Point 2

2. Another assumption is that mu_i and sigma_i are independent across i: is this reasonable, and do the data support it?

The data do indeed support this hypothesis, and this is now discussed (page 8, lines 30–34 of the revised paper PDF).

1.1.3 Points 3 and 4

3. I imagine the original papers will have reported some information about the true values, though the authors don't mention this. The next two comments can be checked if some information about true values is available; otherwise they require sensitivity analyses.

4. How reasonable is it to assume that errors (i.e. absolute errors) are independent of true values (end of p4)? Often it is more plausible that *relative* errors are independent of true values.

The assumption of the independence of errors and true values has been made, explicitly or implicitly, by all the papers: most of them use one variation or another of Bland & Altman’s method; some of them display Bland & Altman plots, which do not give strong support to a constant relative error model. Testing one model against another is of course possible, but would require access to the original data.
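As a reminder of the method invoked, a generic Bland & Altman plot in R (a sketch only; sphb and thb stand for hypothetical paired measurement vectors):

    ## Bland & Altman plot: the between-method difference is plotted
    ## against the per-pair average; a trend in the cloud would argue
    ## against errors being independent of the true value.
    avg <- (sphb + thb) / 2
    dif <- sphb - thb
    plot(avg, dif, xlab = "Average of SpHb and tHb (g/dL)",
         ylab = "SpHb - tHb (g/dL)")
    abline(h = mean(dif) + c(-1.96, 0, 1.96) * sd(dif), lty = c(2, 1, 2))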

Of more concern is the mention by a couple of papers of a bias depending on the true value (i.e. the device is not linear). These reports were not documented with enough numerical information to allow modeling. We are therefore bound to ignore this dependence, and to discuss the consequences.

This is now discussed (page 10 of the revised paper PDF, lines 7–15).

1.1.4 Point 5

5. I'm concerned by the assumption that true values are uniformly distributed on [4,12]. How important is this? Is it used only in the evaluation of the sensitivity & specificity, or is it also used in the main modelling? It would be good to see a sensitivity analysis with a different distributional assumption. (The discussion, p7, only explains that it is convenient to make this assumption.)

The modeling of the measurement errors involves only the various variabilities of SpHb, without any reference to any threshold. The discussion of the assessment of clinical impact has therefore been extended to cover the choice of range (page 9 of the revised paper PDF, lines 17–35).
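For illustration, such a sensitivity analysis on the distribution of true values can be run by simple Monte Carlo; this R sketch uses purely hypothetical numbers (the threshold, bias and error SD below are placeholders, not our estimates):

    ## Absolute decision-error probability under two assumed
    ## distributions of the true value tHb. All numbers illustrative.
    set.seed(1)
    n     <- 1e6
    t0    <- 8.0    # decision threshold (g/dL), placeholder
    delta <- 0.1    # assumed bias of the SpHb error (g/dL), placeholder
    sigma <- 1.0    # assumed SD of the SpHb error (g/dL), placeholder
    err_prob <- function(thb) {
      sphb <- thb + rnorm(length(thb), delta, sigma)
      mean((sphb < t0) != (thb < t0))   # discordant decisions
    }
    err_prob(runif(n, 4, 12))           # uniform on [4, 12], as in the paper
    u <- runif(n, pnorm(4, 9, 1.5), pnorm(12, 9, 1.5))
    err_prob(qnorm(u, 9, 1.5))          # normal(9, 1.5) truncated to [4, 12]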

All these things should be clarified; also, it's not necessary to state this assumption twice (p4 & 5).

Indeed. The second mention has been removed.

1.2 Minor essential revisions

1.2.1 Point 6

6. Please explain whether a measurement "series" corresponds to multiple patients each with one measurement or one patient with multiple measurements.

A series is a (supposedly homogeneous) set of data presented as such by the authors of an original paper. In most cases, that is the only dataset of a paper, but some authors chose to present their results on subsets relevant to their paper (for example preoperative, postoperative and 24-hour data in Butwick’s paper).

This clarification has been added to the paper (page 3 of the revised paper PDF, lines 10–14).

1.2.2 Point 7

7. Eqs 9, 10 need subscript i throughout.

We apologize for this confusing lightening of the notation, which has been fixed.


1.2.3 Point 8

8. A much clearer description is needed of how the evaluation is done. It's not helpful, on page 4, to introduce new notation E and X: why not just define Sens = p(SpHb_i,j < t | tHb_i,j < t), etc.?

The text of the submitted version had a serious typographical error: a conditional bar (|) was mistakenly substituted for the conjunction symbol (∧). We wish to assess absolute probabilities (of clinical decision error), not conditional probabilities. The usual diagnostic values (Se, Sp, PPV, NPV), which are conditional probabilities, are derived from these quantities.

We apologize for these typos, which went unnoticed at copy-editing.
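To make the correction concrete, the quantities we assess read (in the reviewer's notation):

P(decision error) = P(SpHb_i,j < t ∧ tHb_i,j ≥ t) + P(SpHb_i,j ≥ t ∧ tHb_i,j < t),

whereas, e.g., Se = P(SpHb_i,j < t | tHb_i,j < t) is a conditional probability, derived afterwards from these joint quantities.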

1.2.4 Point 9

9. Fig 1 - by comparison with Table 1 I see that the forest plot displays mean +/- 2 standard deviations, not 2 standard errors as is usual - please clarify this in the figure and explain why you do this. Similarly clarify the intervals for "bias" and "measurement error" - are they respectively a credible interval for mu_m and a prediction interval for a new mu_i?

We hope to have clarified the tables’ and figures’ legends to better describe what we meant.

1.2.5 Point 10

10. Tables 2-5 - these look too much like raw output from a computer package. Please decide what quantities are really needed (point estimate and 95% credible interval?) and delete the rest.

Done. We have done so a bit reluctantly: some readers like to have the standard deviation as a rough indicator of variability whereas other readers look at the interquartile range; the standard error is an interesting index of the MCMC error; n_eff and R̂ allow one to assess the quality of the convergence, which is a critical aspect of Bayesian inference through MCMC estimation of posterior distributions. Since the full summary of simulation results is available in an additional file, interested readers can always read it or, if other details are of interest, rerun our noweb source.
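As an illustration of the trimming (a sketch assuming an rstan fit object named fit; our noweb source may organise this differently):

    ## Keep only the point estimate and 95% credible interval for the
    ## published tables; n_eff and Rhat stay available as checks.
    library(rstan)
    s     <- summary(fit)$summary
    tab   <- s[, c("mean", "2.5%", "97.5%")]   # reported in the tables
    diags <- s[, c("n_eff", "Rhat")]           # convergence diagnostics
    stopifnot(all(diags[, "Rhat"] < 1.1))      # crude convergence guard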


1.2.6 Point 11

11. Table 3 is hard to decipher: does the row labelled "bias" give the distribution of mu_i* (that is, mu_i for a new study)? Does "standard error" give the distribution of sigma_i* (in which case it should be "standard deviation")? All should be made clearer.

We hope that our rewriting has clarified what we meant.

1.2.7 Point 12

12. Typos: peroperative -> perioperative; haemoglobinemy -> haemoglobinaemia?; masurement -> measurement; DS -> SD in table 1.

Fixed. Thanks!

1.2.8 Points 13 and 14

13. n on p3 becomes N in table 1 - please use same notation.

14. Similarly mu_s, sigma_s in the text become mu_log(s) etc. in table 2 - please use same notation.

Indeed... Our revision of the tables keeps the notations introduced in the modeling, using a star (∗) to denote predictive simulation.

2 Reviewer 2

2.1 Major compulsory revisions

2.1.1 Point 1

1. The authors should clarify the terminology used in the paper. They use the term “measurement error” for the difference in measurements between the methods. This suggests they assume that laboratory measurement tHB (with which they compare the new measure SpHb) is obtained without error. Although I admit that my expertise in monitoring devices is limited, I don’t believe this is a reasonable assumption, in which case the “measurement error” is a misleading term. My understanding is that the authors model the difference between measurements by the two methods (as described for example by Williamson et al, Statistics in Medicine 2002; 21:2013–2025).


We wanted to assess the impact of substituting a noninvasive monitoring of haemoglobinaemia for the reference method currently used as the clinical “gold standard”, i.e. laboratory analysis. This method is indeed known to be imperfect (the literature quotes values from 0.1 to 0.3 g/dL for the mean absolute expected error). To the best of our knowledge, the clinical consequences of this imprecision have not been formally assessed in the literature.

But the point is that anaesthesiologists’ methods are calibrated around this imperfect measure, and are built around its imperfections. From a clinical, practical point of view, this measure has to be taken as the gold standard to which alternative methods are to be compared.

In other words, our paper ignores the imperfections of the reference method, assesses the impact of substituting the monitoring system for the reference laboratory measure taken as a gold standard and, indeed, ignores the consequences of this reference method’s imprecision.

This is now clearly stated (start of the “modeling” section, p. 3 of the revised paper PDF, lines 17–20) and flagged as one of our paper’s limitations (item “study design”, page 10, lines 41–46).

Similarly, the need to assess the variability of the device is underscored in the introduction. The lack of available methods for the meta-analytic assessment of variability is mentioned in the introduction and discussed at the start of our discussion.

We did not, indeed, proceed to a systematic review of the available methods-comparison meta-analytic methods: our bias modeling is similar to various published methods-comparison meta-analyses; our original point is the meta-analytic assessment of variability (and, to some extent, the assessment by simulation of its clinical consequences).

2.1.2 Point 2

2. In the third paragraph of the introduction (as well as on a number of other occasions), the authors refer to bias, but do not define it. The terms should be defined more explicitly in the paper.

Done (in the background section, page 2, lines 6–11 of the revised paper PDF).

2.1.3 Point 3

3. The authors should clarify the main aims of the paper, i.e. modelling the difference or the corresponding variance or both?


The assessment of the bias and of the variability are but necessary steps in order to assess the possible clinical impact of substituting the monitoring device for the reference method. The revised introduction should make this clearer.

2.1.4 Point 4

4. I believe it would be helpful to the readers to have the final model (used to describe the aggregate data in table 1) written explicitly. The authors state, at the end of the methods subsection “Modelling”, that equations (9) and (10) allow to work with aggregate data without the need for patient level data. The final model applied to the aggregate data, along with the prior distributions, should be written down explicitly (and a code would be helpful to the readers - I could not find the code in the supplement 2).

Such a statement of the model would be a restatement of the numbered equations in the text, possibly considered by some as inelegant.

An additional file contains the text of the Stan model; furthermore, the revised text now explicitly refers to the noweb source of the paper and to the README file giving instructions on how to re-execute it.
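For readers who wish to re-run it, the model in the additional file can be compiled and sampled with rstan along these lines (the file and data names below are placeholders; the README gives the actual instructions):

    library(rstan)
    ## 'model.stan' and 'aggregate_data' are placeholder names for the
    ## model file and the list of aggregate statistics in our files.
    fit <- stan(file = "model.stan", data = aggregate_data,
                chains = 4, iter = 2000)
    print(fit)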

2.1.5 Point 5

5. A methodological paper should set the new meta-analytic method in the context of at least some literature on meta-analysis in general and potentially more specialised references from relevant fields (such as by Williamson et al mentioned above).

The only really original point of our paper is the meta-analytic assessment of the variability. The current (almost non-existent) methods to do so are discussed at length in the beginning of our discussion (now referred to in the “background” section, page 2, lines 19–26 of the revised paper PDF).

We agree that such a discussion could have taken place in the “background” section; however, a perusal of a few BMC Medical Research Methodology papers left us with the impression that a more classical presentation would be more welcome.

We are, of course, ready to rewrite our paper along those lines if the BMC editors wish us to do so.


2.1.6 Point 6

6. The second sentence of the section on Model implementation and fitting: "... The model directly implements equations (1) to (8), using (9) and (10) to model the likelihood of observed data; we took the opportunity to sample the relevant parameters ..." is unclear, please rephrase. The procedure should be described in detail.

Third sentence, Cauchy prior - please say to what parameter(s).

Rephrased and clarified. Our paper specifies Cauchy(0, 3) priors (i.e. Cauchy distributions with 0 as centrality parameter and 3 as spread parameter); the revised text says to which parameters they apply.
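For reference, the Cauchy(0, 3) density is f(x) = 1 / (3π(1 + (x/3)²)); restricted to a non-negative scale parameter (the usual “half-Cauchy” form), this becomes f⁺(σ) = 2 / (3π(1 + (σ/3)²)) for σ ≥ 0.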

2.1.7 Point 7

7. Section on "Diagnostic impact assessment"

a. After equation (12), the authors refer to X as “true” values. The true value is usually a quantity measured without error or a parameter describing the value in the population (such as a random effect in meta-analysis) – please see also comment 1.

Please see our response to comment 1 (section 2.1.1).

b. The notation here is different than in the section on Modelling. How do the two parts relate to one another?

This part concentrates on how we use the previously made meta-analytic variability assessment to assess its clinical consequences. The additions in the “model implementation and fitting” section should make this clearer.

c. In the second to last paragraph: can you be more specific in what you mean by "all the usual diagnostic values indices"

We are more interested in the absolute probability of decision errors than in the usual indices (Se, Sp, NPV, PPV), which are conditional probabilities. This is now mentioned in the discussion (page 9, lines 2–5 of the revised paper PDF).

2.1.8 Point 8

8. Results section, a. paragraph 3: "table 3 summarises error simulation results". Please be more specific. (see also comment 6)

Description added.


b. Section "Impact of calibration" - end of paragraph 2: a probabilityfor a new study is reported. Where does this result come from?

A clarification has been added to the description of table 5.

c. Estimation of clinical impact - it would be helpful to provide the reader with some interpretation of the results.

Description added.

2.1.9 Point 9

9. Discussion,

a. paragraph 1: phrase "plus whatever population-level parameters are necessary to the model" is too vague for the methodology paper

A description for our case has been added. This statement is true for any meta-analysis through a hierarchical model; for example, we might have modeled means and standard deviations as correlated, which would have required the addition of their correlations; our statement is general.
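For instance (an illustrative parameterisation, not the one used in the paper), modeling the study-level means and log standard deviations as correlated could read

(µ_i, log σ_i) ∼ N₂((µ_m, µ_s), Σ), with Σ = ( σ_m², r σ_m σ_s ; r σ_m σ_s, σ_s² ),

where the correlation r is the kind of additional population-level parameter alluded to.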

b. paragraph 2: I don't think the reference to "statistical lore" is adequate for the methodological journal.

We attenuated our expression (page 7, line 38 of the revised paper PDF). But we are at a loss about how else to qualify this kind of entrenched choice, defended only by historical habit. The applied statistics field is full of such choices, for instance the “conventional” 0.05 significance level as a “significance threshold”, which is justified only by Fisher’s choice in his 1925 book (in this case, the problem has been seriously aggravated by the Neyman & Pearson theory of tests, which chose the same “significance threshold”). The “N > 30” rule belongs to the same class of folklore...

c. Results section, 1st paragraph: the authors refer to the posterior distribution of measurement error in a new study. Is it a predictive distribution? It should be discussed in the methods section. Some estimates for "a new study" were also mentioned in the Results section - it should be made clear how the estimates are obtained.

Done (see “Model fit” and “Clinical impact assessment” subsections).


2.2 Minor essential revisions

2.2.1 Point 10

10. Please describe the data in table 1 more clearly – are the values of M the differences? Is the M in table 1 different than the M introduced in the methods section “Diagnostic impact assessment”?

Corrected: the tables have been revised.

2.2.2 Point 11

11. Notation. I would suggest using different symbols denoting different means and variances - using mu and sigma for all of them in equations (1)-(8) can be a little confusing.

We tried to be as clear as possible. Mathematical tradition insists on single-symbol variable names, and tolerates subscripts where needed. Probability/statistics practice seems entrenched in denoting a mean by µ and a variance by σ, τ or υ.

The rewriting of the “Methods” section should ease reading. We hope that the R and Stan code (which abundantly uses multi-letter variable names) is easier to follow...

2.3 Discretionary revisions

2.3.1 Point 12

12. Is the term haemoglobinemy correct? Or do the authors mean hemoglobin or haemoglobinaemia?

Indeed. Corrected. Thank you.

2.3.2 Point 13

13. Results. Impact of calibration - first line: "...presented on table..." should be "...presented in table..."

Corrected. Thank you.


2.3.3 Point 14

14. Discussion. Section on Clinical impact assessment, paragraph 3: (11) and (12) are integrals. Quadrature is a method of determining the area.

It seems that in Bayesian/MCMC/machine learning circles, the use of “quadrature” (or, worse, “cubature”) tends to denote high-dimensional integration.

We reverted to “multiple integral(s)”.

2.3.4 Point 15

15. Conclusions: end of last bullet point; "... in any clinically relevant extend...", do you mean "... in any clinically relevant context"?

Another nice catch. But we meant “to any clinically relevant extent”.
