Interactions: Types, Tests and Dangers

By Amy Wagaman

Motivation

When trying to find the “right” treatment for a patient, researchers want to know if “treatment effects are homogeneous over various subsets of patients defined by prognostic factors.” (Gail and Simon 1985: 361).

So, the logical thing to do is to investigate potential interactions.

Types of Interactions

Qualitative Interaction: the direction of true treatment differences varies among subsets of patients – also called crossover interaction

Quantitative Interaction: variation in the magnitude but NOT direction of treatment effects among patient subgroups – also called a non-crossover interaction
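The two definitions can be written as a small rule. This is only an illustrative sketch: the function name `classify_interaction` and the tolerance `tol` are made up here, and the inputs are taken to be the true (not estimated) treatment effects in each subgroup.

```python
from typing import Sequence


def classify_interaction(effects: Sequence[float], tol: float = 1e-12) -> str:
    """Classify TRUE subgroup treatment effects (hypothetical helper).

    Returns "qualitative", "quantitative", or "none".
    """
    has_positive = any(d > tol for d in effects)
    has_negative = any(d < -tol for d in effects)
    if has_positive and has_negative:
        # Direction of the effect differs across subgroups (crossover).
        return "qualitative"
    if max(effects) - min(effects) > tol:
        # Same direction everywhere, but the magnitude varies (non-crossover).
        return "quantitative"
    return "none"


print(classify_interaction([0.5, -0.2]))  # qualitative
print(classify_interaction([0.5, 0.2]))   # quantitative
print(classify_interaction([0.3, 0.3]))   # none
```

With real data only estimates of the effects are available, so a sign change in the estimates is not by itself evidence of a qualitative interaction.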

Illustration of Interactions

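[Figure: the (δ1, δ2) plane of true treatment effects for the two subgroups, with the regions of qualitative interaction shaded yellow.]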

This example is for 2 subgroups, say men and women. The yellow regions are regions of qualitative interaction.

The deltas are true treatment effects/efficacy by subgroup.

Why Qualitative Interactions?

A qualitative interaction indicates that a treatment is harmful for one subgroup but beneficial for another. This is very useful information when deciding which treatment to assign to a particular patient.

The problem comes in identifying qualitative interactions.

Why Qualitative Interactions? (Continued)

Qualitative interactions are less likely to exist than quantitative interactions.

Qualitative interactions observed in one trial are not often found again in similar trials.

“We regard observed qualitative interactions with skepticism for they are often shown to be spurious when the same comparison is made in similar trials.”

(Yusuf 1991: 94)

Why not Quantitative Interactions?

If a treatment is effective (significant positive treatment effect) for all subgroups, but some benefit perhaps more than others, a clinician will still prescribe that treatment for everyone.

Thus, it is argued that little attention needs to be paid to this type of interaction.

Why not Quantitative Interactions? (Continued)

“Quantitative interactions are to be expected, but may not be important clinically.” (Gail and Simon 1985: 362).

“I am almost certain a priori that a quantitative interaction will exist between a treatment and any categorization of patients which subdivides them into groups with materially different survival expectancy.” (Peto 1995: 1043).

Qualitative Versus Quantitative

“In summary, quantitative interactions are a priori very plausible, but qualitative interactions are not and, when the overall treatment effects are not overwhelming, trials can be expected to generate a number of apparent qualitative interactions even if no interactions at all exist.” (Peto 1995: 1043).

A Testing Hurdle

“All standard statistical tests for interaction are tests for quantitative interaction and significant results in them do not constitute any kind of evidence for the existence of qualitative interactions, unless in addition there were strong prior scientific reasons for anticipating qualitative interactions.”

(Peto 1995: 1043)

A Test for Qualitative Interactions

In 1985, Gail and Simon developed a likelihood ratio test (LRT) for qualitative interactions.

This procedure is often used as the test for qualitative interactions.

However, it rests on several assumptions: the subsets/subgroups ought to be disjoint, and the subgroups must be specified in advance.

“Unless such a prespecification is made, it is unlikely that sufficient numbers of patients will be available in all subsets for a meaningful assessment of interactions.” (Gail and Simon 1985: 366)
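To make the test concrete, here is a minimal sketch of the Gail-Simon statistic as it is usually stated. It assumes K disjoint, prespecified subgroups with independent, approximately normal effect estimates D_i and known standard errors s_i. The names `gail_simon` and `chi2_sf` are made up for this sketch, and the chi-square tail is computed by hand only to avoid depending on an outside library.

```python
import math
from typing import Sequence, Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 1:
        tail, k = math.erfc(math.sqrt(y)), 1  # closed form for df = 1
    else:
        tail, k = math.exp(-y), 2             # closed form for df = 2
    while k < df:
        # Recurrence: Q(x; k + 2) = Q(x; k) + y^(k/2) * exp(-y) / Gamma(k/2 + 1)
        tail += math.exp((k / 2.0) * math.log(y) - y - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return tail


def gail_simon(effects: Sequence[float], std_errors: Sequence[float]) -> Tuple[float, float]:
    """Return (Q, p-value) for the qualitative-interaction test (sketch)."""
    if len(effects) != len(std_errors) or len(effects) < 2:
        raise ValueError("need at least two subgroups with matching standard errors")
    # Sum of squared standardized effects, separately by direction.
    q_plus = sum((d / s) ** 2 for d, s in zip(effects, std_errors) if d > 0)
    q_minus = sum((d / s) ** 2 for d, s in zip(effects, std_errors) if d < 0)
    q = min(q_plus, q_minus)
    if q == 0.0:
        # All estimates point the same way: no evidence of a crossover.
        return q, 1.0
    m = len(effects) - 1
    # p = sum over i = 1..K-1 of Binomial(i; K-1, 1/2) * P(chi2_i > Q)
    p = sum(math.comb(m, i) * 0.5 ** m * chi2_sf(q, i) for i in range(1, m + 1))
    return q, p


# Hypothetical numbers: two subgroups with estimates of opposite sign.
q, p = gail_simon([2.0, -1.8], [1.0, 1.0])
print(round(q, 2), round(p, 3))  # Q = 3.24, p is about 0.036
```

For two subgroups the test rejects at the 0.05 level only when Q exceeds about 2.71, that is, when both subgroups show effects in opposite directions that are each individually fairly convincing.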

Issues of Statistical Power

Based on work by Cohen, the estimated N under optimal study conditions was 128 to have 80% power to detect a medium-sized interaction. For a small-sized interaction, the required sample size is 780.

A review was done to examine 55 studies that tested for interactions.

Only 18 of the 55 studies had samples large enough for 80% power to detect a medium-sized interaction, and only 3 of the 55 had samples large enough for a small-sized one. (Moyer 2001)
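As a rough check on the order of magnitude of these sample sizes, the sketch below uses the textbook normal-approximation formula for a two-sided comparison of two equal-sized groups. Everything specific here is an assumption rather than something taken from the slides or from Moyer: a 0.05 significance level, standardized differences of 0.5 ("medium") and 0.2 ("small") in Cohen's convention, and the helper name `total_n_two_groups`. Cohen's own tables are based on exact noncentral distributions, so this approximation lands close to, but not exactly on, the quoted 128 and 780.

```python
import math
from statistics import NormalDist


def total_n_two_groups(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate total N (both groups combined) to detect a standardized
    difference d with a two-sided test, using the normal approximation."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    z_power = z.inv_cdf(power)              # quantile matching the target power
    # n per group = 2 * (z_alpha + z_power)^2 / d^2, so total N is twice that.
    return math.ceil(4.0 * (z_alpha + z_power) ** 2 / d ** 2)


print(total_n_two_groups(0.5))  # assumed "medium" effect
print(total_n_two_groups(0.2))  # assumed "small" effect
```

The point to notice is the scaling: halving the effect size roughly quadruples the required sample, which is why so few of the reviewed studies were adequately powered for small interactions.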

Another Statistical Issue

It so happens (see later slides) that people often perform MANY tests for interaction in any given study.

This helps fuel the suspicion that in many cases researchers are finding spurious interactions: they are capitalizing on Type I error.

Extreme example: if you ran 567 tests for interaction, you would almost CERTAINLY find at least one significant interaction by chance alone.
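The arithmetic behind that claim can be sketched as follows. It assumes each test is run at a 0.05 level, that no interaction truly exists, and that the tests are independent. The last assumption is rarely true in practice, so treat the numbers as a rough guide. The function names are made up for this sketch.

```python
def familywise_error(k: int, alpha: float = 0.05) -> float:
    """P(at least one false positive) across k independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** k


def expected_false_positives(k: int, alpha: float = 0.05) -> float:
    """Expected number of false positives across k tests at level alpha."""
    return k * alpha


for k in (1, 10, 567):
    print(k, round(familywise_error(k), 4), round(expected_false_positives(k), 2))
```

With 567 tests the chance of at least one spurious "significant" interaction is indistinguishable from 1, and about 28 such results would be expected.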

Problems with Subgroups

Very few studies are hypothesis-driven with prespecified subgroups where a potential interaction would make sense.

If you use the data to “help” identify subgroups across which to look for an interaction, you’re getting into somewhat “fishy” territory. Why wouldn’t you expect an interaction across such subgroups?

Subgroup Definitions

Proper subgroup: “a group of patients characterized by a common set of ‘baseline’ parameters”

Improper subgroup: “a group of patients characterized by a variable measured after randomization and potentially affected by treatment”

(Yusuf 1991: 93)

Another Subgroup Problem

It can be VERY misleading to look for interactions among improper subgroups.

This is because a treatment effect may have contributed to assignment to a subgroup.

Misinterpretations

Two types of misinterpretations: misinterpretation of significant interactions, and misinterpretation of non-significant but surprising interactions.

Example of Abuse of Test

In an analysis of data from the Beta-Blocker Heart Attack Trial, researchers tested for an interaction (using the Gail-Simon test) between “dominant” and “divergent” centers.

There were 31 centers (21 dominant, 10 divergent). Dominant means the mortality rate was higher for placebo. Divergent means the mortality rate was higher for the treatment, propranolol.

Note that the subgroups were chosen using a study outcome. (Horwitz 1996)

Ensuing Discussion

Senn and Harrell point out the “error” in the Horwitz paper.

“The ‘significant’ result … says absolutely nothing about the trial in question and everything about the practice of defining groups on the basis of extreme values after the results are in.” (Senn and Harrell 1997: 749)

Picking subsets based on observed event-rate differences is a serious violation of statistical assumptions.
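A small simulation illustrates the point. All numbers are hypothetical and are not the trial's data. Every center shares the same true effect, each center's estimate gets equal normal noise, and the centers are then split by the observed direction of their estimate. The two groups are compared with the two-subgroup form of the Gail-Simon test, which rejects at the 0.05 level when the smaller squared z-score exceeds about 2.71. The function name `simulate_post_hoc_split` and its parameters are made up for this sketch.

```python
import math
import random


def simulate_post_hoc_split(n_centers: int = 31, true_effect: float = 0.3,
                            se: float = 1.0, n_trials: int = 2000,
                            seed: int = 0) -> float:
    """Fraction of simulated trials in which groups defined AFTER the fact
    show a 'significant' qualitative interaction, although every center
    has the same true effect."""
    rng = random.Random(seed)
    critical = 2.71  # two-subgroup critical value at the 0.05 level
    hits = 0
    for _ in range(n_trials):
        estimates = [rng.gauss(true_effect, se) for _ in range(n_centers)]
        # Groups are chosen using the outcome itself.
        favorable = [d for d in estimates if d > 0]
        unfavorable = [d for d in estimates if d <= 0]
        if not favorable or not unfavorable:
            continue  # no split possible in this simulated trial
        # z-score of each group's pooled (equal-weight) mean estimate.
        z_fav = sum(favorable) / (se * math.sqrt(len(favorable)))
        z_unf = sum(unfavorable) / (se * math.sqrt(len(unfavorable)))
        if min(z_fav ** 2, z_unf ** 2) > critical:
            hits += 1
    return hits / n_trials


print(simulate_post_hoc_split())  # far above the nominal 0.05
```

Because the groups are built from the sign of the observed estimates, their pooled effects must point in opposite directions, so the test mostly measures how the groups were chosen.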

How Widespread is the Problem?

A review was done to examine 55 studies that tested for interactions.

30 of those 55 studies found at least one significant interaction.

The mean number of tests performed by those 30 studies was 61 tests (median 16, range 3-567).

The mean number performed by the other 25 studies was 21 tests (median 7, range 1-186).

Widespread Continued

Only TWO out of those 55 studies met all of the following criteria: hypothesis-driven; sufficient statistical power to detect medium-size interactions; random assignment of patients to treatments; and 10 or fewer tests for interactions conducted.

Term Clarification

The term “risk index” in this context is a misnomer.

Risk indices are used as predictors of outcomes, or to look for susceptible groups where new treatments are needed.

My work involves deciding between treatments for patients, not predicting an outcome. In that sense, I am looking for a “tailoring variable”. It’s discrimination versus prediction.

Per Danny’s email and discussion with Susan

Implications for Tailoring Variables

Looking for tailoring variables involves looking for subgroups of patients with similar characteristics such that the direction of treatment effect differs across the subgroups, so that you would want to assign one treatment to one group and another to a different group. You could also add a timing issue, i.e. when to switch.

Problem: This is essentially looking for qualitative interactions among unprespecified subgroups.

Consider Quantitative Interactions?

Based on a talk with Danny

Assume the top line is a very intensive and costly treatment, while the middle is a less-intensive/cheaper one, with the bottom being a control group. The y-axis is treatment effect, and the x-axis is some baseline variable.

Bibliography

Gail, M. and R. Simon. Testing for Qualitative Interactions between Treatment Effects and Patient Subsets. Biometrics. Vol. 41 No. 2 (June 1985): 361-372.

Green, Sylvan B. Design of Randomized Trials. Epidemiologic Reviews. Vol. 24 No. 1 (2002): 4-11.

Horwitz, et al. Can Treatment that is Helpful on Average be Harmful to Some Patients?… Journal of Clinical Epidemiology. Vol. 49 No. 4 (1996): 395-400.

Moyer, et al. Can Methodological Features Account for Patient-Treatment Matching Findings in the Alcohol Field? Journal of Studies on Alcohol. Vol. 62 Issue 1 (Jan. 2001): 62-82.

Peto, R. Clinical Trials. In Treatment of Cancer. Editors: Price, Sikora, and Halnan. Chapman and Hall Medical: New York. (1995): 1039-1043.

Senn, Stephen and Frank Harrell. On Wisdom after the Event. Journal of Clinical Epidemiology. Vol. 50 No. 7 (1997): 749-751.

Vach, et al. Neural Networks and Logistic Regression: Part II. Computational Statistics and Data Analysis. Vol. 21 (1996): 683-701.

Yusuf, et al. Analysis and Interpretation of Treatment Effects in Subgroups of Patients in Randomized Clinical Trials. Journal of the American Medical Association. Vol. 266 No. 1 (1991): 93-98.