slide #1 - mbsw… · 25.05.2019 · slide #1 presentation title and speaker info… abstract:...

Slide #1 Presentation Title and Speaker Info…

Abstract: The LocalControlStrategy R‐package (ver 1.3.2, Jan 2019) implements unsupervised, nonparametric inferences based upon minimal, realistic assumptions about the properties of cross‐sectional data and stresses visualization of treatment effect‐size distributions. We illustrate some new features of these functions using case‐study examples of RWE studies.

**** Notes Below are for Slide #3 ****

Unknown/unobserved u ‐ variables can have causal effects on outcomes, y.

The “Proxy Caveat” is that observed x‐variables may be surrogate measures of “causal” u‐variables.

In statistical/econometric models, the existence of u variables (as well as uncertainty in measuring y, t, e and x variables) necessitates inclusion of "error terms."

Frank Harrell (RMS 2001): “Using the data to guide the data analysis is almost as dangerous as not doing so.”

Q: Which of the above variables are Least Dangerous to “use” in Analysis (e.g. Modeling) Strategies?

A: The X‐confounders! …and only they are needed to determine which patients are most comparable (at baseline).

Outline…

1. Five Introductory Slides

2. Binary Treatment Choices

3. Continuous Exposures

4. Three more Case Studies

5. More LC Strategy …time permitting

Nomenclature

y = observed outcome variable(s)

x = observed pretreatment covariate(s) / confounder(s)

t, e = observed Treatment binary-choice or level of Exposure

u = unobserved or unknown explanatory variable(s)

The Quote is from: Frank Harrell, Regression Modeling Strategies, Springer (2001).

NOTE: All Regression‐like modeling is Supervised Learning. Our focus here is on Unsupervised Learning.

U – vars are Unknown / Unmeasured and, thus, cannot be actually used this way. Any observed U‐surrogates are X’s.

Using either Y or Treatment / Exposure is a Major No-No according to Don Rubin, “For objective causal inference, design trumps analysis”, Ann Appl Stat (2008), 808–840.

Anyway, the clearly “correct” answer to our first question is: The X‐variables are Least Dangerous. After all, the true Propensity Score is the (usually unknown) function Pr( t=1 | X ) = f(X) [Rosenbaum & Rubin, Bka (1984)].

Preview of Coming Attractions: When is “using” X Most DANGEROUS?

Answer: When the observed Xs are actually IGNORABLE!

Material for Further Discussion: …after the terms used below have been defined!

When our proposed “Test” suggests Xs are NOT ignorable… What does our unsupervised, nonparametric Pre‐Processing provide?

Answer: A distribution of Local Treatment (or Exposure) Effect‐Size (LTDs or LRCs) Estimates that are “expected” to be at least partially Predictable from X (“Fixed” Effects.)

Slide #4 “Comparisons” here are based upon scalar-valued statistics quantifying treatment/exposure differences or associations within subgroups of experimental units (patients), and are assumed to be Estimable. “Relevance” is also an unspecified and somewhat vague concept here.

What nuances of the two possibilities above make them… Different ? and/or Important ?

The first statement (BLUE) hints at formation of plus focus on patient subgroups (usually of size greater than TEN) using their pre-existing characteristics. The Xs must NOT be Ignorable (must be True Confounders) in this case.

The second statement (RED) sounds like a feature of completely randomized experimentation …where even unmeasured confounders (let alone measured confounders) supposedly have no influence on expected Y-outcomes. One way for this to be the case is for all Xs to be Ignorable. Treatment then has only a fixed Main-Effect (while Exposure has a fixed Association), with patient Y-Outcomes otherwise purely random (unpredictable.)

******************** When analyzing observational data, researchers are typically worried about the problem commonly called [a] treatment selection bias, [b] channeling bias, [c] confounding by indication or [d] low common support. Are such analysts pre-disposed towards thinking that the pretreatment x-characteristics of individual experimental units (e.g. patients) are Not Ignorable (i.e. yield “Apples-to-Oranges” comparisons)?

Personalized Medicine (Tailored Therapeutics) usually involves much more than just “appropriate dose depends upon weight”.

Slide #5 WRONG (mis‐specified) models produce Biased Estimates …where the Bias is typically UNKNOWN.

Unsupervised, Nonparametric methods making only minimal, realistic assumptions may still correspond to underlying models that are technically not “correct” (exact). However, their estimates do tend to be much more “data driven” than “model driven” when “fitting” is LOCAL rather than GLOBAL.

Sources of Heteroscedasticity:

* Differences in Measurement Error between Treatment Cohorts …Sigma(treated) different from Sigma(control) …RARE ?
* Variation in Subgroup (Cluster) Sizes tends to be Unavoidable. Arbitrary deletion of patients reduces power (can increase bias).
* Deviations in Fraction Treated (propensity) from 1:1 within local subgroups definitely are Unavoidable and Unfortunate.

Cluster Weights (proportional to cluster Size) appear to be both essential and adequate (a “first‐order” correction) for avoiding Major BIAS in estimation of Local Effect‐Size Distributions. This weighting is easily implemented simply by assigning the estimated treatment or exposure effect‐size to every experimental unit within its cluster …i.e. creation of “tied” numerical values.
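The tied-value weighting just described can be sketched in a few lines of base R. The cluster sizes and effect-size values below are invented purely for illustration (they are not from any case study here); the point is only that a plain mean over units reproduces the size-weighted mean over clusters.

```r
# Hypothetical cluster-level effect-size estimates (illustrative values only)
est_by_cluster <- c(-12.0, -3.5, 4.0)

# Hypothetical cluster membership for N = 10 experimental units
clus <- c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3)

# Give every unit the estimate of its own cluster, creating "tied" values
est_by_unit <- est_by_cluster[clus]

# Cluster sizes, in cluster order
sizes <- as.vector(table(clus))

# The plain mean over units equals the size-weighted mean over clusters
mean_units    <- mean(est_by_unit)
mean_weighted <- weighted.mean(est_by_cluster, w = sizes)
```

Any histogram or eCDF drawn from est_by_unit is then automatically weighted by cluster size.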

Additional research is badly needed to look for better ways to “handle” heteroscedasticity in Local Associations and ATE Estimates.

Slide #6 Outline of the Four Phases of analysis within each full cycle of application of Local Control Strategy.

Each “phase” has a relatively esoteric Name and involves application of diverse analytic tactics and concepts.

Three Primary LC Parameter “Settings”:

(a) K = Number of Clusters = Blocks of experimental units in X‐space,

(b) Number and selection of X‐confounders used in covariate adjustment, and

(c) Choice of experimental unit dissimilarity X‐metric and aggregation algorithm.

NOTE: This slide could be used as a list of Learning Objectives for a training workshop on LC Strategy.

Slide #7 This “radon” dataset is included within my LocalControlStrategy R‐package…just “install” the package from CRAN and enter the following 3 commands:

require(LocalControlStrategy)   # or library(LocalControlStrategy)
data(radon)
str(radon)                      # 2,881 observations on 11 variables

help(radon) # provides on‐line documentation for this data.frame!

This graphic “suggests” that higher Radon Exposure levels tend to yield lower Lung Cancer Mortality rates (deaths per 100,000 resident‐years.) The Gray fitted line is the Ordinary Least Squares fit, while the Dashed‐Red curve is the R default smooth.spline( ) fit.

Indoor Radon levels are reported in this dataset to only 1 place after the decimal (nearest 0.1 pCi/L.)

Inferences based only on the data plotted here on Slide #7 are unadjusted for local %over65, %cursmoke and/or %obesity. We will see that all 3 of these x‐covariates are important “effect modifiers” …i.e. they are NOT “ignorable”.

NOTE: The R function for natural‐logarithms is log(). Base 10 logarithms are given by log10().

Slide #8 Transition: Introduction to the Aggregate and Confirm steps of Local Control (LC) strategy…

Some (clinical?) researchers may well initially react to the first sentence on Slide #8 by claiming an INSTANT (one‐line) proof of statistical invalidity. After all, these data are “Observational”. It’s quite obvious that U.S. counties were NOT Randomly Assigned to their observed T/E levels.

Unfortunately, Randomization of experimental units (patients, US counties, etc.) does NOT Assure that X‐confounders will or won’t be Ignorable!!!

The “evidence” referred to in the final sentence of Slide #8 comes in one of 2 “flavors” …depending upon whether the Treatment or Exposure measure has either “only 2” or (many) “more than 2” levels.

The available X‐confounders that are Not Ignorable here are: % residents over 65, % residents who currently smoke, and % of obese residents.

Slide #9 Using a “Split-Point” to convert a continuous Exposure measure into a binary Treatment indicator:

WARNING: Any split-point used to define a pair of (binary) Treatment Choices should definitely be pre-specified …rather than chosen by “LOOKING” at BOTH actual, observed Y-outcome and E-xposure data (i.e. using supervised learning as an aid to determination of T-effects to be compared.) For example, the EPA radon mitigation threshold is set at radon >= 4.0 pCi/L; creating a split with LogWorth of “only” 75.4 rather than 153.8. The two node counts would be (2147 low, 734 high) rather than (1661 low, 1220 high) radon counties.

The above sort of “Look-Ahead” could introduce Bias into datasets much smaller than this one. We will be able to literally SEE effects of using LTDs rather than LRCs on the “Confirm” graphics in Slides #18 and #19. After all, the above LogWorth > 153 means that our p-Value has more than 150 zeros (rather than “only” about 75) after the decimal point …before its first non-zero digit!

In other words, estimating the split-point (2.6 pCi/L) using Recursive Partitioning (fitted “Tree” Model) is definitely cheating!
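A minimal sketch of the "pre-specify, then split" idea follows, using simulated data rather than the radon data. It assumes that LogWorth means -log10(p-value) and uses a Welch t-test as a stand-in for whatever test the partitioning software applies; the variable names (e, y, trt, split_point) are illustrative only.

```r
set.seed(1)
n <- 500

# Simulated exposure (recorded to 1 decimal place) and outcome; illustrative only
e <- round(rlnorm(n, meanlog = 1, sdlog = 0.6), 1)
y <- 60 - e + rnorm(n, sd = 5)

# Split-point fixed in advance, without looking at y
split_point <- 4.0
trt <- as.integer(e >= split_point)   # 1 = High exposure, 0 = Low exposure
node_counts <- table(trt)

# Two-sample comparison of High vs Low
tt <- t.test(y ~ trt)

# log10 of the two-sided p-value, computed on the log scale so that
# extremely small p-values do not underflow to zero
log10_p  <- (log(2) + pt(-abs(tt$statistic), df = tt$parameter, log.p = TRUE)) / log(10)
logworth <- -as.numeric(log10_p)
```

Working on the log scale matters for splits like those quoted above, where a p-value with 75 or more leading zeros cannot be stored as an ordinary double.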

*******************************

The RP Objective here is to predict Y = Lung Cancer Mortality per 100,000 population (all ages) via splits on E = Indoor Radon level.

Partition of (unadjusted) Lung Cancer Mortality on High vs. Low Radon Exposure

Rsquare   RMSE        N      Number of Splits   AICc
0.157     16.200826   2881   1                  24229.5

Slide #10 Definition of LTD effect‐size estimates…

Clusters may be “thought of” as forming Blocks, Strata, Matched‐Sets or specified Subgroups of experimental units.

LTD estimates are Local (within‐cluster) ATEs deliberately designed to be Unbiased.
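As a rough illustration of what a within-cluster LTD is, the toy sketch below computes mean(treated) minus mean(control) inside each cluster with base R. The nine data values are hypothetical, and this is not the package's ltdagg() code; cluster 3 contains only controls, so its LTD is not estimable.

```r
# Toy data: outcome y, binary treatment trt, cluster id clus (all hypothetical)
y    <- c(10, 12, 20, 22,  5,  7,  9, 30, 31)
trt  <- c( 0,  0,  1,  1,  0,  1,  1,  0,  0)
clus <- c( 1,  1,  1,  1,  2,  2,  2,  3,  3)

# Mean outcome in each (cluster, treatment) cell; empty cells become NA
cell_means <- tapply(y, list(clus, factor(trt, levels = c(0, 1))), mean)

# LTD = mean(treated) - mean(control) within each cluster
ltd <- cell_means[, "1"] - cell_means[, "0"]
```

Here clusters 1 and 2 yield LTDs of 10 and 3, while the "pure" cluster 3 yields NA and would be treated as uninformative.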

The overall Average Treatment Effect (ATE) is essentially meaningless when the units under study are Numerous plus Diverse–and‐Unbalanced in x‐space. The so‐called LATE estimates used in Econometrics are defined in ways that strike me as more “mysterious” than “clear”. In any case, I have always assumed that particular “name” was already “taken.” My 2014 paper with Ken Lopiano and Stan Young discusses a special case (exact x‐matches) where the Average “weighted” LTD is called a FATE estimate.

LTDs are relatively meaningful primarily because Clusters (BLOCKs) can be made to be small (local) and to contain only experimental units relatively well‐matched in x‐space.

LC Strategy: What we are getting ready to do here is to move “local” treatment effects to the left‐hand‐side of any EVENTUAL modeling equations needed for inferences about causality. With nonparametric LTD estimates, we are essentially creating NEW DATA (values for a left‐hand‐side variable) that are more relevant to making objective and meaningful treatment comparisons.

Slide #11 Nested ANOVA: Treatment (0:1) within Cluster and Instrumental Variable Inferences

McClellan, McNeil, Newhouse (1994) and many economists have studied “instrumental variable” approaches. The key assumption is that observed X‐covariates determine only treatment selection and do NOT influence outcome, Y, except through treatment choice.

MM&N proposed that cluster means be plotted (Slide #12) vertically against a horizontal axis depicting within‐cluster fraction treated (observed propensity for treatment.) This approach uses information only from the “Clusters” row of the ANOVA table. MM&N further contended that trends (up or down) in the displayed values from left‐to‐right across this plot are interpretable as causal effects when all X‐variables used to form patient clusters are instrumental variables.

The Local Control approach uses information only from the “Treatment within Cluster” row of the ANOVA table and yields an “observed LTD” distribution of effect‐size estimates (see Slide #15). Validity of an LTD distribution NEITHER requires X‐variables to be “instruments” NOR makes any other sorts of un‐testable assumptions.

Slide #12 Figure from McClellan, McNeil & Newhouse ‐ JAMA. 1994 Sep 21;272(11):859‐66.

Information from the “Clusters” row of the Nested ANOVA model of Slide #11 is used by assuming X‐confounders are IVs. The fitted line shown above suggests a down‐ward trend in 2‐year Mortality as Propensity for Invasive Treatment increases. The fit is a “weighted” linear regression and provides a slope (main‐effect) estimate (1 degree‐of‐freedom) across K clusters (of “sizes” = N1, N2, …, NK.)
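The MM&N-style calculation described here (cluster means against within-cluster fraction treated, with a size-weighted straight-line fit) can be sketched as below. The data are simulated and every name (clus, p_k, ptrt, ymean, fit_iv) is an assumption made for illustration; this is not the ivadj() implementation.

```r
set.seed(2)
n    <- 600
clus <- sample(1:12, n, replace = TRUE)     # hypothetical cluster labels

# Treatment propensity differs across clusters; outcome depends on treatment
p_k <- seq(0.2, 0.8, length.out = 12)
trt <- rbinom(n, 1, p_k[clus])
y   <- 50 - 4 * trt + rnorm(n, sd = 3)

# One value per cluster: size, fraction treated, mean outcome
size  <- as.vector(table(clus))
ptrt  <- as.vector(tapply(trt, clus, mean))
ymean <- as.vector(tapply(y, clus, mean))

# Size-weighted linear regression of cluster means on fraction treated
fit_iv <- lm(ymean ~ ptrt, weights = size)
slope  <- unname(coef(fit_iv)["ptrt"])
```

The slope uses a single degree of freedom across the K clusters, and reading it causally still rests on the instrumental-variable assumption discussed above.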

N = 205K elderly AMI patients for whom “distance from the hospital” of admission could be computed from ZIP code information.

Note that “distance from the hospital” is a plausible instrument here because patients who live in a big city near to a big hospital are more likely to receive an (expensive) invasive procedure (whether they really need it or not.) Thus “distance from the hospital” should be predictive of choice of treatment without being predictive of outcome (except through treatment.)

Patient subgroups were initially formed by dividing patients into 15 “distance from the hospital” bands. About 120 = 15 x 2 x 2 x 2 subgroups were then formed (as in Coarsened Exact Matching=CEM) on distance, sex, race and (elderly) age ranges.

See also: Stukel TA, Fisher ES, Wennberg DE, et al. [AMI survival using Propensity Scores and IVs.] JAMA 2007; 297, 278-285 and Austin PC, Mamdani MM, Stukel TA, et al. [Propensity Scores: admin vs clin data] Stat Med. 2005; 24: 1563-1578.

Slide #13 Nested ANOVA Calculations in R …and heteroscedasticity of LTD estimates

The LCcluster() function within the LocalControlStrategy package (2019) computes a clustering tree (dendrogram) using the R function stats::hclust(). This computation uses Mahalanobis distances in x-space and one of 8 clustering algorithms (default = “ward.D”).

The ltdagg(K, envir) or lrcagg(K, envir) function for LocalControlStrategy then determines (given K = Number of Clusters) the C-vector giving the cluster-membership numbers for individual experimental units.
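For readers who want to see the idea without the package, here is a hedged base-R sketch of "Mahalanobis distances, then a Ward tree, then cut into K clusters". It is not the LCcluster() or ltdagg() source code; the covariate names and simulated values are assumptions, and whitening via the Cholesky factor is just one convenient way to obtain Mahalanobis distances from dist().

```r
set.seed(3)
n <- 200

# Hypothetical x-covariates on different scales (simulated, not the radon data)
X <- cbind(over65 = rnorm(n, 15, 4),
           smoke  = rnorm(n, 25, 6),
           obese  = rnorm(n, 22, 5))

# Whitening: Euclidean distances between rows of Z equal
# Mahalanobis distances between the corresponding rows of X
Rchol <- chol(cov(X))          # upper triangular, t(Rchol) %*% Rchol = cov(X)
Z     <- X %*% solve(Rchol)

# Clustering tree (dendrogram) and membership vector for K clusters
tree <- hclust(dist(Z), method = "ward.D")
K    <- 10
clus <- cutree(tree, k = K)
```

Changing K only requires another cutree() call on the same tree, which is why the number of clusters is cheap to vary in a sensitivity analysis.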

The R-function stats::lm() for fitting fixed-effect linear models can be invoked as shown above …using the “formula” interface: The Y term is the left-hand-side (dependent) variable. The -1 term signals that the model is to have “no intercept.” The C “factor” (category) estimate commonly differs from the within-cluster Y-mean …due to “imbalance” in the overall “design”.

Luckily, the T-within-C terms are indeed computed as within-cluster LTD estimates and (heteroscedastic) Std. Errors. Specifically, the summary(fit) table provides a listing of 4 statistics for each of 2*K effects: Estimate, Std. Error, t value, & Pr(>|t|).

Any LTD (T %in% C) effect-sizes that are not estimable (i.e. from “uninformative” clusters) are replaced by 4 “NA” values in the corresponding row of the summary(fit) table.
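The sketch below checks, on simulated data, that a no-intercept nested fit of this kind returns within-cluster differences in means as its treatment-within-cluster coefficients. The formula y ~ -1 + clus + trt %in% clus is my assumption about the form shown on the slide, and the variable names are hypothetical. Coefficients are picked out by position (first K, then next K) rather than by name.

```r
set.seed(4)
K    <- 5
n    <- 300
clus <- factor(sample(1:K, n, replace = TRUE))   # hypothetical cluster factor
trt  <- rbinom(n, 1, 0.4)                        # binary treatment indicator
y    <- 40 + 2 * as.integer(clus) - 3 * trt + rnorm(n, sd = 2)

# Nested model: cluster effects plus treatment within cluster, no intercept
fit <- lm(y ~ -1 + clus + trt %in% clus)

# First K coefficients: cluster terms; last K: treatment-within-cluster terms
c_lm   <- unname(coef(fit)[1:K])
ltd_lm <- unname(coef(fit)[(K + 1):(2 * K)])

# Direct within-cluster means for each treatment arm
m1 <- tapply(y[trt == 1], clus[trt == 1], mean)
m0 <- tapply(y[trt == 0], clus[trt == 0], mean)
ltd_direct <- as.vector(m1 - m0)
```

In this parameterization the cluster coefficients equal the control-arm means (m0), which is one way to see why they are usually not the overall cluster Y-means.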

WARNING: Again, the C estimates and their Std. Errors [first K rows of summary(fit)] are usually not Cluster Y-Means here.

Slide #14 IV plot from LocalControlStrategy::ivadj() using K Clusters in X‐covariate space.

In this MM&N‐style graphic, clusters with higher Observed Proportions (true Propensity) of High Radon Level counties clearly tend to have lower Local Average Outcomes (LAO) on Lung Cancer Mortality. Again, this observation has “causal implications” only when % residents over 65, % residents who currently smoke, and % obese residents are Instrumental Variables.

The Dashed Red line is the Ordinary Least Squares fit, while the Dashed Blue curve is the default smooth.spline() fit.

*********************

Rather than use the ivadj() function from LocalControlStrategy, C effect‐sizes (Cluster Y‐means) could be computed from a “reduced model” using the stats::lm() function …similar to Slide #13:

fitC = lm(Y ~ -1 + C); summary(fitC)

My main concern about using this “reduced model” is that its df(error) = 2831 are increased by “ignoring” the 50 T %in% C terms, yet its Residual Mean Square is also increased here from 13.3 to 13.9 (mortality rate death units).

Slide #15 Depicting a LTD effect‐size Distribution…

This sort of Histogram is the default R‐graphic from the command: plot(ltdagg(K, envir))

For example, an LTD estimate of ‐20 here means that High Radon exposure (at least 2.6 pCi/L) leads to 20 fewer deaths (per 100,000 resident‐years) than Low exposure (less than 2.6 pCi/L).

The “band width” (bin size) used in a histogram display may bias its interpretation by statistical novices.

Unfortunately, alternative displays such as I‐plots (box‐and‐whisker diagrams) and empirical Cumulative Distribution Functions (eCDFs) tend to be much less familiar (more esoteric) to novices.

Slide #16 Local Rank Correlations: exposure effect-size estimands and estimates

Clusters are again used here as “strata”, “blocks” or “matched-sets” of experimental units (patients, U.S. counties, etc.)

We will ultimately use LRCs as “local” exposure effect-size estimates on the left-hand-side of modeling equations needed for inferences about causality. Since LRCs are non-parametric estimates, we are using “robust” and “local” effect-size measures that provide more objective and relevant exposure comparisons than could be possible using only individual observed Y-outcomes.

LRCs are computed in R using cor(y, e, method = "spearman"), where the y and e (vectors) are for units within a single cluster.
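A short sketch of that per-cluster calculation, on simulated data, is given below. The cluster labels, exposure and outcome are all invented for illustration, and the split()/sapply() loop is just one base-R way to apply cor() cluster by cluster; it is not the lrcagg() source.

```r
set.seed(5)
n    <- 400
clus <- sample(1:8, n, replace = TRUE)    # hypothetical cluster labels
e    <- rexp(n, rate = 0.3)               # simulated exposure
y    <- 55 - 1.5 * e + rnorm(n, sd = 6)   # simulated outcome

# One Spearman rank correlation per cluster
lrc_by_cluster <- sapply(split(seq_len(n), clus), function(idx) {
  cor(y[idx], e[idx], method = "spearman")
})

# Attach each cluster's LRC to all of its units ("tied" values)
lrc_by_unit <- unname(lrc_by_cluster[as.character(clus)])
```

A histogram or eCDF of lrc_by_unit then shows the LRC distribution with each cluster counted in proportion to its size.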

The absolute value of a Local Spearman (or Pearson) Correlation can be interpreted as a measure of the “goodness of fit” of a “best” straight line to within-cluster (Y-outcome, Exposure) data pairs. The correlation coefficient itself is the slope of this “best” line when both variables are first standardized (for Spearman, after conversion to ranks).

The really good news about using LRC (rather than LTD) estimates is that their heteroscedasticity then varies ONLY with Cluster Size.

Slide #17 A “Sensitivity Analysis” of Variance‐Bias trade‐offs in estimation of LRC distributions (3 “SA” slides after references) convinced us to use K = 50 "Ward" clusters. The above graphic displays the resulting LRC distribution in a histogram with 14 non‐empty "bins." Each bin has width 0.05 and its height "counts" the number of U.S. counties with an LRC estimate falling within that bin.

While correlations can range from ‐1.0 to +1.0, we see that our 50 observed LRC estimates range here only between ‐0.70 and +0.10. In fact, more than half (1,624) of the N = 2,881 U.S. counties in the available data are members of clusters with LRCs in the five histogram bins between ‐0.45 and ‐0.20.

The overall mean LRC = ‐0.322 is denoted by the red vertical line within the modal bin, (‐0.35, ‐0.30].

The blue vertical line at LRC = 0 shows that only two bins (containing 90 of 2,881 counties) have positive LRC estimates.

Supplemental (Un‐Numbered) Slide

Clusters are truly “meaningful” when X‐confounders are NOT Ignorable and each cluster contains only experimental units that are exact matches in X‐space.

Clusters formed using observational data are still “informative” when they contain experimental units that provide “more‐relevant” and “most‐relevant” treatment comparisons (or exposure associations) and exclude units that would yield “less” or “least” relevant comparisons or associations. E.g., T=1 effects on elderly ladies who don’t smoke should not be compared with T=0 effects on young male smokers!

While the number of Unique Permutations is N! (i.e. N‐factorial), the number of “Potentially” Unique LTD/LRC Distributions is given by the Combinatorial Coefficient = N! divided by the product of K factorial‐terms of the form: (Cluster Size)!

While this number can be much smaller than (N!), it is still a truly gigantic number when K << N/3 and N is over, say, 1,000.
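Counts of this size overflow ordinary arithmetic, so a practical way to evaluate N! divided by the product of the (Cluster Size)! terms is on the log10 scale via lgamma(). The helper name log10_combinations and the cluster sizes used below are illustrative assumptions, not values from the case studies.

```r
# log10 of N! / (N_1! * N_2! * ... * N_K!), where sizes = (N_1, ..., N_K)
log10_combinations <- function(sizes) {
  N <- sum(sizes)
  (lgamma(N + 1) - sum(lgamma(sizes + 1))) / log(10)
}

# Small check: 4 units in two clusters of size 2 give 4!/(2! 2!) = 6
small <- 10^log10_combinations(c(2, 2))

# Hypothetical larger case: 1,000 units in 50 equal clusters of 20
big_log10 <- log10_combinations(rep(20, 50))
```

The returned value is the exponent of 10, so it can be reported directly in the "a * 10^b" style used elsewhere in these notes.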

The actual number of truly unique (LTD/LRC) distributions can be further reduced by “exact ties” in LTD/LRC estimates within different clusters. [See Slide #23 for an example with many ties at LTD=0 for a binary (0,1) Y‐Outcome.]

Slide #18 LC “Confirm” Calculations

The LC “Confirm” step data.frame is essentially a “rectangular file” with 2,881 Rows for the Radon example.

The K clusters are to be Mutually Exclusive and Exhaustive; every unit is to be in 1 and only 1 cluster. Thus the “size” of Cluster “j” is the number of rows where Variable #3 equals “j”. Note that both the number of clusters (K) and the distribution of cluster‐sizes always remain the same for every possible permutation. Different permutations that yield the same combination of (y, e) pairs within a cluster will yield the same Local Effect‐Size point‐estimate for that cluster.

Slide #19 The Local Effect‐Size Measures used here are: LRCs (Rank Correlations).

The NULL distribution of the D‐statistic (maximum vertical separation between eCDFs) for the Kolmogorov‐Smirnov two‐sample test is KNOWN only when both distributions being compared are Absolutely Continuous. Thus the stats::ks.test() R‐function issues many warning messages about “ties”. These warnings are “suppressed” inside the LocalControlStrategy functions confirm() & KSperm().

When using K=50 clusters and reps=1000, the total run‐time for this LRC example can be as high as 14 minutes. The calculations yield an observed D‐statistic of 0.4539 and estimated (maximum) p‐Value of 0.001. Within the right‐hand plot here on Slide #19, note that the eCDF for NULL LRC estimates reaches 1.000 at D=0.215, a value less than half of the observed D=0.45. In other words, the true p‐Value here has actually been shown to be much, much less than 0.001.

NOTE: Since the KSperm( ) default value is reps = 100, the “simulated” p‐Value will always then be at least 0.01. In the current (new) World Beyond “p < 0.05”, a simulated bound of “p < 0.01” may be Good Enough to “conclude” or “decide” that x‐Confounders are Not Ignorable.
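The sketch below illustrates the general shape of such a comparison: an observed unit-level LRC distribution, a NULL distribution built by shuffling cluster labels (cluster sizes stay fixed), and a two-sample KS D-statistic between them. It is a simplified stand-in, not the confirm() or KSperm() code, and it stops short of a p-value because I am not certain how the package computes one. The data are simulated and the cluster labels are random, so no real departure from the NULL is expected here.

```r
set.seed(6)
n    <- 400
K    <- 8
clus <- sample(rep(1:K, length.out = n))   # hypothetical clusters of equal size
e    <- rexp(n, rate = 0.3)                # simulated exposure
y    <- 55 - 1.5 * e + rnorm(n, sd = 6)    # simulated outcome

# Unit-level LRC values (tied within clusters) for a given labelling
unit_lrc <- function(labels) {
  lrc <- sapply(split(seq_len(n), labels), function(idx) {
    cor(y[idx], e[idx], method = "spearman")
  })
  unname(lrc[as.character(labels)])
}

obs  <- unit_lrc(clus)
reps <- 100

# NULL: shuffle cluster labels across units, keeping cluster sizes fixed
null <- unlist(lapply(seq_len(reps), function(r) unit_lrc(sample(clus))))

# Two-sample KS D-statistic; warnings about ties are suppressed
D_obs <- unname(suppressWarnings(ks.test(obs, null)$statistic))
```

With clusters formed in X-space instead of at random, a large D_obs relative to D-values from shuffled labellings is what would signal non-ignorable X-confounders.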

Slide #20 The Local Effect‐Size Measures used here are: LTDs (Local ATEs). LTD units are local difference (High‐Low) in deaths per 100,000 resident‐years.

Comparing this slide (#20) with slide #19 (LRCs) provides Evidence of “Cheating” via choice of an Optimal Cut‐Point (2.6 pCi/L):

Specifics: Dividing a continuous measure of Radon exposure into only 2‐levels (High and Low) “should” (and certainly could) entail considerable loss of information. The opposite appears to happen here! The observed D‐statistic gets bigger (0.49 > 0.45) while its simulated NULL values become smaller (max NULL D‐value decreases from 0.215 to 0.211 for reps=1000).

NOTE: The same random number generation seed() values were used in both simulations. This is a so‐called “Monte‐Carlo Swindle” that makes the curves on slides #19 and #20 as highly positively correlated and, thus, as highly “comparable” as possible.

Slide #21 Air Pollution Study: p = 0.43 implies “Fail to Reject” Ignorable Confounders

CAAA enforcement greatly decreased TSP measures of air‐pollution for 277 “regulated” Counties …but had essentially no effects on “Change in Longevity” (units = deaths per 10K residents) defined as the Post‐period County Average [Years 1972‐74] minus the Pre‐period County Average [Years 1969‐71] for all 534 Counties. For example, the Main‐Effect of treatment (CAAA enforcement of regulations on Local Pollution‐generating entities) yielded a t‐statistic = ‐1.25 with two‐tailed p‐Value = 0.212.

Data were available for 6 consecutive years (i.e. longitudinally), but we needed to calculate pre‐ and post‐enforcement County Averages to provide cross‐sectional summary statistics. Furthermore, having only N = 534 Counties restricted our attention to K=25 Clusters (Average Size = 21.4) to optimize variance‐bias trade‐offs on LTD estimates [see slides/notes on pages 33‐34.]

BOTTOM LINE: 25 Clusters implies at most 25 “steps” in the eCDF for the observed LTD distribution. While there are 5.4 * 10^+672 possible combinations (among (534)! = 2.3 * 10^+1226 possible permutations) for the random NULL distribution, just 100 permutations are enough to “signal” Low Power for detecting deviations from X‐ignorability here!

Slide #22 Hypothesis of Ignorable x‐Confounders cannot be “Rejected” in this case study!

As always, the NULL is NEVER “accepted.” [Observed p‐Value ~ 0.3]

One Interpretation: Relative Duration of Treatment (Y = # refills, “meet treatment guidelines ?”, etc.) can depend critically upon the relative long‐term Tolerability of the 2 treatments being compared.

Early clinical research on treatment of Major Depressive Disorder (MDD) revealed major “placebo effects”. In other words, Side‐Effect Profiles could be more important than treatment Effectiveness (Efficacy?) for some (many?) patients. Ergo… No wonder that baseline X‐measures appear ignorable in this context.

And there is a Highly Significant MAIN‐EFFECT here: Duration with T=1 is greater than with T=0. Difference = +0.358 refills; 95% confidence interval is (+0.189, +0.528) refills. After all, N > 13K here!

Slide #23 “Plasmode” Simulated data based upon the small (N = 996 patient) Lindner OCER study …but N = 15,487 in the simulated data. This larger data.frame also has a much larger Percentage (55%) of “Usual PCI Care Alone” (control) patients …only 30% of the 996 patients in the original Lindner OCER did not receive the “blood thinner”.

The data show that survival rates are quite HIGH in both treatment cohorts (99% vs. 96%); this causes 7,115/15,392 => 46.23% of LTD estimates from informative clusters to be Exact Zeros, while “only” 617,246/1,548,645 = 39.86% of random NULL LTDs are 0s.

xvars = c("stent","height","female","diabetic","acutemi","ejfract","ves1proc")

Only 100 replications (default value) were used in confirm() and KSperm(); the smallest possible simulated p‐value = 0.01 was actually observed!

Case-Study Summary (Unnumbered) Slide

Un-Resolved (on Confounders Ignorable?) ==> Truly Ignorable or relevant Pre-Treatment Covariates Missing / Unavailable (i.e. hypothesis would be falsifiable with “better” data). As always: NULL is never Accepted! …testing can only “Fail to Reject” it.

Publishing results of global Covariate Adjustments using only potentially Ignorable X-covariates can be unethical, statistically …especially when the implied fit is poor even though some “terms” may be “significant.”

LC strategy tends to not be very useful with relatively small datasets (like the CAAA case-study, 534 experimental units).

Slide #24 Premise: Units within the same subgroup have the same (shared) X‐characteristic(s) or are as similar as possible, while units within different subgroups have different X‐characteristic(s) or are as dissimilar as possible.

Subgroups are ideally formed in an “unsupervised” way; i.e. based only upon observable pre‐treatment X‐characteristics of units.

“Matched Sets” typically do not (and should not) require any fixed ratio of “treated” to “control” t‐designations. “Optimal Matching” methods may even use (forbidden) information about y‐realizations as well as t‐choices.

Propensity scoring methods typically use a logistic regression (discrete choice) model to fit a “linear functional” that FORCES subgroups to correspond to simplistic STRATA (collections of stacked, parallel linear subspaces of Euclidean X‐space.) Propensity scores resulting from Classification Trees (fit by recursive partitioning) would yield subgroups not subject to this linearity restriction …but some resulting subgroups could still have infinite maximal diameters.

A subgroup is said to be “uninformative about its local treatment difference” (LTD) when it is PURE in the sense that it contains either ONLY t = 0 patients or else ONLY t = 1 patients.

Slide #25 The essence of LC Strategy is to: SEPARATE Treatment / Exposure Effect ESTIMATION from its PREDICTION. Here we outline why this SEPARATION is Helpful.

LC strategy allows ESTIMATION to be Local and Nonparametric …making only minimal assumptions, as in ANOVA models. In other words, local estimates are “robust” in the sense that they do NOT depend upon the observed numerical values of X variables. Local estimates result from groupings of experimental units suggested by Unsupervised Learning techniques in X‐space (clustering, matching, density estimation, etc.) These groupings completely ignore all y‐outcome and t‐exposure information.

In stark contrast, PREDICTION usually remains Global and Parametric … this form of traditional Supervised Learning makes STRONG assumptions, as in ANCOVA models. It’s well known that parametric prediction can be arbitrary, difficult and frustrating.

If PREDICTION modeling still has to be done in the Fourth (Reveal) Phase of LC Strategy, why go to all of the trouble of doing the Initial Three Phases of LC? The answer is that LC is a DIVIDE & RECOMBINE Strategy that makes final phase LC analysis relatively easy…almost a “slam dunk.”

Slide #26 Specifically, note that LC Strategy moves T‐effects from the Right‐Hand side of “One Step” Model Equations to the Left‐Hand side of REVEAL Phase Equations.

In traditional One‐Step modeling, Parsimonious Specifications are Counter‐Productive because they tend to be “Too Simple” (Lack sufficient, adequate Detail) primarily because their Goodness‐of‐Fit (R‐square) tends to be LOW in predicting y‐outcomes alone. For example, whenever inclusion of Interaction Terms cannot be statistically justified, Treatment can then have only a Main‐Effect …resulting in a One‐Size‐Fits‐All perspective.

Y‐outcomes for individual experimental units (a large number, N) are typically difficult to predict. T‐effects for local subgroups of Y‐outcomes are fewer in number (K << N), are typically much richer in information, and typically much easier to predict from X variation across subgroups.

In the LC Reveal phase, Parsimony is highly desirable as long as Goodness‐of‐Fit remains well above what can traditionally be achieved using One‐Step Models. LC Nonparametric Preprocessing has made model‐fitting Bull's‐Eyes much BIGGER and FEWER!

The extent to which Local T‐effects can be predicted from confounding X‐covariates is an ordinal measure of (rather than necessarily quantifies) the extent to which effects are Heterogeneous (truly FIXED; objective basis for individualized medicine) rather than purely random (i.e. characterized only, say, by a fixed Main‐Effect and a measure of spread.)
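As a rough illustration of this idea, the sketch below regresses K cluster‐level effect estimates on a single cluster‐level x summary and reports R‐square. The data are made up, and the straight‐line fit is an assumption; the Reveal phase could use any supervised method.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x; returns R-square too."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_square = 1.0 - ss_res / ss_tot
    return slope, intercept, r_square


# Hypothetical inputs: one row per informative cluster (K = 5 here).
cluster_mean_x = [1.0, 2.0, 3.0, 4.0, 5.0]
cluster_effect = [0.9, 2.1, 2.9, 4.2, 4.9]

slope, intercept, r_square = fit_line(cluster_mean_x, cluster_effect)
print(round(slope, 3), round(intercept, 3), round(r_square, 3))
```

A high R‐square here would point toward heterogeneous (predictable) effects; a value near zero would be consistent with effects that vary only randomly around a main effect.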

Slide #27 This definition or distinction represents a fundamental dichotomy of basic TYPES of local effect‐sizes.

For example, ethical practice of “individualized medicine” requires existence of heterogeneous treatment effects.

Random effects can still have a (non‐zero) expected value = Main Effect of Treatment. The “better” treatment choice then represents an optimal, “one size fits all” prescription for ethical treatment.

Slide #28 Technically, “Fair” comparisons require EXACT x‐space MATCHES …i.e. without any “coarsening” of x‐confounders that are continuous or have many distinct levels.

The deadly enemies of GOOD statistical‐thinking practices are those considered to be PERFECT / IDEAL …but which can be implemented only rarely in actual practice.

ATE estimands and estimates provide only comparisons that [a] could be UNFAIR (Apples‐to‐Oranges) and [b] always provide ONE‐SIZE‐FITS‐ALL treatment policies.
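A small made‐up example shows how a pooled comparison can be unfair. The two blocks, their outcomes, and the size‐weighted average used to summarize the within‐block differences are all illustrative assumptions.

```python
from statistics import mean

# Hypothetical rows: (block, t, y). Treated units are rare in block "A"
# and common in block "B", and block "A" has higher outcomes overall.
rows = [
    ("A", 1, 10.0), ("A", 0, 8.0), ("A", 0, 8.0), ("A", 0, 8.0),
    ("B", 1, 4.0), ("B", 1, 4.0), ("B", 1, 4.0), ("B", 0, 2.0),
]


def diff_in_means(subset):
    """Mean outcome of treated units minus mean outcome of control units."""
    treated = [y for _, t, y in subset if t == 1]
    control = [y for _, t, y in subset if t == 0]
    return mean(treated) - mean(control)


# Pooled comparison ignores blocks entirely.
pooled = diff_in_means(rows)

# Local comparisons are made only within blocks of comparable units.
blocks = sorted({b for b, _, _ in rows})
local = {b: diff_in_means([r for r in rows if r[0] == b]) for b in blocks}
sizes = {b: sum(1 for r in rows if r[0] == b) for b in blocks}
weighted_local = sum(local[b] * sizes[b] for b in blocks) / len(rows)

print(pooled, local, weighted_local)
```

The pooled difference is negative even though treatment looks beneficial within each block, because the pooled comparison mixes units from different blocks.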

Tukey quote relevant to either: [1] the distinction between FATE(Φ) and ATE estimands, or [2] the advantages of many Local Within-Block estimates of their corresponding Fair Treatment estimands over a single, “one-size fits all” estimate of the possibly-unfair ATE estimand.

"Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise." --John Tukey in: The future of data analysis. Annals of Mathematical Statistics 33 (1), 1962, page 13.

Obenchain RL. “SAS Macros for Local Control: Phases 1 and 2.” Observational Medical Outcomes Partnership (OMOP), FNIH. http://localcontrolstatistics.org/other/LCinSAS 2019.zip, 2009 (Revised 2019).

Obenchain RL. “The Local Control approach using JMP.” Analysis of Observational Health Care Data using SAS. DE Faries, AC Leon, J Maria‐Haro and RL Obenchain, eds. Cary, NC: SAS Press. 2010, pp 151–192.

Obenchain RL, Young SS. “Advancing statistical thinking in observational health care research.” Journal of Statistical Theory and Practice. 2013; 7: 456‐469.

Lopiano KK, Obenchain RL, Young SS. “Fair Treatment Comparisons in Observational Research.” Statistical Analysis and Data Mining 2014; 7: 376‐384.

Wolfinger RD, Obenchain RL. JMP® Add‐Ins Module for Local Control. https://community.jmp.com/docs/DOC‐7453. SAS Institute Inc., Cary, NC, 2015.

Obenchain RL, LocalControlStrategy: R‐package for Robust Analysis of Cross‐Sectional Data. Version 1.3.2; 2019‐01‐07. https://CRAN.R‐project.org/package=LocalControlStrategy.

“Radon Study” Collaborators:

• Stanley Young, PhD, FASA; CEO, CGStat, Raleigh, NC, USA

• Goran Krstic, PhD, RPBio; Human Health Risk Assessment Specialist, Fraser Health Authority, Vancouver, BC, Canada

[Figure: slide diagram; recoverable labels: Stage Data; Create Rectangular File; Nonparametric Preprocessing: Unsupervised Learning; Systematic Sensitivity Analyses of Alternative LTD Distributions; Which Observed LTD Distributions are Typical or Extreme?; REVEAL: Supervised Learning; LTDs Predictable From X?; Patient Characteristics; Most Favors T=1; Most Typical; Most Favors T=0.]

Both of these Slides suggest using K=50 Clusters for the radon data in the sense that Variance-Bias trade-offs appear to be optimized at this level of Aggregation.

Slide #33 Comparisons of (smoothed) Observed and Random NULL eCDFs using JMP Clustering…

As the number of clusters increases, clusters become smaller and the variance of local comparisons is expected to increase. Four observed eCDFs for clusters of well-matched counties are shown in BLUE; the corresponding NULL eCDFs for randomly matched counties are shown in RED.
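The comparison between an observed and a random NULL distribution can be sketched as follows. The simulated data, the single label shuffle (which keeps cluster sizes fixed), and the helper names are all assumptions made for illustration; this is not the JMP or LocalControlStrategy procedure itself.

```python
import random
from collections import defaultdict
from statistics import mean


def local_differences(clusters, t, y):
    """Sorted within-cluster differences in mean y (t=1 minus t=0).

    Clusters missing either arm are skipped as uninformative.
    """
    arms = defaultdict(lambda: ([], []))  # (control outcomes, treated outcomes)
    for c, ti, yi in zip(clusters, t, y):
        arms[c][ti].append(yi)
    return sorted(mean(a1) - mean(a0) for a0, a1 in arms.values() if a0 and a1)


def ecdf(values, q):
    """Empirical CDF: fraction of values less than or equal to q."""
    return sum(v <= q for v in values) / len(values)


rng = random.Random(0)

# Hypothetical data: 20 clusters of 10 units each; the true effect is
# -2 in the first 10 clusters and +2 in the last 10.
clusters, t, y = [], [], []
for k in range(20):
    effect = -2.0 if k < 10 else 2.0
    for j in range(10):
        ti = j % 2
        clusters.append(k)
        t.append(ti)
        y.append(effect * ti + rng.gauss(0, 0.5))

observed = local_differences(clusters, t, y)

# NULL: reassign units to clusters at random, keeping cluster sizes fixed.
shuffled = clusters[:]
rng.shuffle(shuffled)
null = local_differences(shuffled, t, y)

for q in (-1.0, 0.0, 1.0):
    print(q, ecdf(observed, q), ecdf(null, q))
```

In practice the NULL distribution would be built from many random reassignments rather than one, and the two eCDFs would be plotted rather than printed at a few points.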

It's generally a mistake to use a large number of clusters unless there is clear evidence that additional bias is thereby removed. A variance-bias trade-off is occurring !!!

These four CDF-pairs clearly show that the "apparent" variance (spread) in LTD empirical distributions increases with K; all 8 CDFs tend to become less and less steep in the vicinity of their median values as the number of “requested” clusters increases.

Note also that more and more of the smaller clusters are becoming “uninformative” here (contain only “High Radon” or only “Low Radon” counties). Luckily, the “ward.D” option for stats::hclust in R appears to produce fewer really small clusters than “Fast Ward” in JMP.

In the four pairs of observed and permutation CDFs shown above, the numbers of clusters requested are K = 50, 100, 200 and 400, respectively. With this increase in number of clusters requested, the average size of informative clusters declines from roughly 58, to 30, to 16, and finally to only 8 counties …i.e. all data from about 500 U.S. counties is disregarded.
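The bookkeeping behind such figures can be sketched as below, assuming a cluster counts as informative only when it contains both exposure levels. The function name and the toy labels are hypothetical; no radon data are reproduced here.

```python
from collections import defaultdict


def informative_summary(clusters, t):
    """Count informative clusters, their mean size, and disregarded units."""
    levels = defaultdict(set)
    sizes = defaultdict(int)
    for c, ti in zip(clusters, t):
        levels[c].add(ti)
        sizes[c] += 1
    # A cluster is informative only if both t = 0 and t = 1 occur in it.
    informative = [c for c in sizes if levels[c] == {0, 1}]
    units_used = sum(sizes[c] for c in informative)
    return {
        "informative_clusters": len(informative),
        "mean_informative_size": units_used / len(informative) if informative else 0.0,
        "units_disregarded": len(clusters) - units_used,
    }


# Hypothetical toy labels: cluster 2 holds only t = 1 units.
clusters = [1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3]
t = [1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0]

summary = informative_summary(clusters, t)
print(summary)
```

Running this for each requested K would show how many units fall into uninformative clusters as clusters shrink.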
