stata logistic regression nomogram generator · 2015-06-24 · notice • nomogram generators for...

Post on 12-May-2020


Stata logistic regression nomogram generator

Alexander Zlotnik, Telecom.Eng.
Víctor Abraira Santos, PhD

Ramón y Cajal University Hospital

WORK IN PROGRESS PAPER


NOTICE

• Nomogram generators for logistic and Cox regression models have been updated since this presentation.

• Download links to the latest program versions (nomolog & nomocox), examples, tutorials and methodological notes are available on this webpage: http://www.zlotnik.net/stata/nomograms/


Structure of Presentation

• Introduction
• Logistic regression nomograms
• Objectives
• Stata programming Gotchas
• Programming techniques
• Results
• Limitations
• Future work


Introduction

• Nomograms are one of the simplest, easiest and cheapest methods of mechanical calculus. (…) precision is similar to that of a logarithmic ruler (…). Nomograms can be used for research purposes (…) sometimes leading to new scientific results.

Source: “Nomography and its applications”, G.S. Jovanovsky, Ed. Nauka, 1977


• Examples

[Figure: example nomograms; images not included in this transcript.]


Logistic regression predictive models

• Logistic regression‐based predictive models are used in many fields, clinical research being one of them.

• Problems:
  – Variable importance is not obvious for some clinicians.
  – Calculating an output probability with a set of input variable values can be laborious for these models, which hinders their adoption.


Source: YanSun, 2011


Logistic regression nomograms

• A nomogram could make this calculation much easier. Example: the Kattan nomograms for prostate cancer.

Source: prostate-cancer.org


• Output probability calculations are much easier.

• Variable importance is clear at a glance (the longer the line, the more important the variable).


• Logistic regression nomogram generation
  – Plot all possible scores/points (αi xi) for each variable (X1..N).
  – Get the constant (α0).
  – Transform the total into the probability of the event with the formula

$$p = \frac{1}{1 + e^{-(\alpha_0 + TP)}}$$

where total points = TP = α1 X1 + α2 X2 + …
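The formula above can be checked numerically after any logit fit. The following is a minimal sketch, not the generator's own code: it assumes Stata's bundled auto dataset as stand-in data, and the variable names TP, p_manual and p_stata are chosen only for illustration.

```stata
* Sketch: reproduce the fitted probability by hand from p = 1/(1 + exp(-(alpha0 + TP))).
* Assumption: the bundled auto dataset stands in for real clinical data.
sysuse auto, clear
logit foreign mpg weight

* Total points: sum of coefficient * value over the predictors.
generate double TP = _b[mpg]*mpg + _b[weight]*weight

* alpha0 is the regression constant, _b[_cons].
generate double p_manual = 1 / (1 + exp(-(_b[_cons] + TP)))

* Compare with Stata's own prediction; assert stops with an error on any mismatch.
predict double p_stata, pr
assert reldif(p_manual, p_stata) < 1e-8
```

A nomogram does the same computation graphically: each axis encodes one product of coefficient and value, and the probability axis encodes the final transformation.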


• Nomogram usage for a given case
  – Get the input variable values, i.e. X1 = x1, X2 = x2, …
  – Obtain the scores for all variables.
  – Add all scores.
  – Get the probability (on a scale adjusted by the constant α0).


Risk of death for a 40-year-old woman, LOS (length of stay) = 20 days, dialysis time = 75 months

[Figure: nomogram with the readings for this case; one of the axes is labeled "Dialysis time".]

40 years old => ~ 2.6 points
woman => ~ 0 points
LOS = 20 days => ~ 0.3 points
Dialysis time = 75 months => ~ 0.5 points
--------------------------
Total score ≈ 3.4
Probability ≈ 0.05 ≈ 5%
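The arithmetic of this reading can be replayed with Stata scalars. This is only a sketch: it assumes the total score is on the same log-odds scale as TP in the formula, and the constant it derives (about -6.34) is implied by the approximate readings, not a value taken from the fitted model.

```stata
* Scores read off the nomogram in the example (approximate values).
scalar tp = 2.6 + 0 + 0.3 + 0.5
display "Total score = " tp

* Constant implied by reading p = 0.05 at that total:
* logit(p) = alpha0 + TP, so alpha0 = logit(p) - TP.
scalar alpha0 = logit(0.05) - tp
display "Implied alpha0 = " %6.3f alpha0

* Round trip back to the probability scale.
display "p = " %5.3f invlogit(alpha0 + tp)
```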


Objectives

• Build a general‐purpose nomogram generator written entirely in Stata, with no dependence on external modules, that can be executed after an arbitrary “logistic” or “logit” Stata command.

• Automatic (or imposed) variable and data labeling.

• Automatic (or imposed) variable min/max, divisions, variable labels, dummy data labels.


Stata programming Gotchas

• Macro string variables limited to 244 chars.

Solution: strL? Stata 13?

• Hard‐to‐grasp macro nesting syntax.

• Lack of built‐in data structures (non‐numeric arrays, dictionaries, lists, etc).


• Lack of a decent debugger (set trace on/off is not enough).

• Steep learning curve for error interpretation.

• Unit testing is hard to implement.


Programming techniques

• Time series graph hacks.

• logit & logistic outputs.

• Variable & data labels.

• Macro nesting.


• Time series graph hacks

[Figure: time series plot with "myTime" on the time axis, showing one actual data point and several fictitious data points.]


• Logit & logistic outputs
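As a hedged illustration of what a post-estimation program can read back, the sketch below lists the stored results that both commands leave behind. It uses the bundled auto dataset as an assumption and shows standard Stata stored results, not the generator's actual parsing code.

```stata
sysuse auto, clear
logistic foreign mpg weight     // the table reports odds ratios

* Both logit and logistic leave the coefficients on the log-odds scale in e(b).
display "Estimation command: `e(cmd)'"
matrix list e(b)

* The column names of e(b) identify each term, including the constant.
local terms : colnames e(b)
foreach t of local terms {
    display "`t' = " _b[`t']
}
```

Because e(b) holds log-odds coefficients in both cases, the same code path can serve either command.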


• Custom data structures
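The slide showed the authors' structure as an image, which is not recoverable here. A common way to emulate lists and arrays in Stata, sketched below under that caveat, is a space-separated local macro plus numbered macros; the names vars and name_1, name_2, … are hypothetical.

```stata
* A space-separated local macro used as a list.
local vars "mpg weight length"
local n : word count `vars'

* Emulated array: one numbered macro per element.
forvalues i = 1/`n' {
    local v : word `i' of `vars'
    local name_`i' "`v'"
}

* Look an element up by index through nested macro expansion.
forvalues i = 1/`n' {
    display "element `i': `name_`i''"
}
```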


• Variable & data labels
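A minimal sketch of reading labels with extended macro functions follows. It assumes the bundled auto dataset, and the fallback to the variable name is an illustrative choice, not necessarily what the generator does.

```stata
sysuse auto, clear

* Variable label; fall back to the variable name when none is defined.
local vlab : variable label foreign
if "`vlab'" == "" local vlab "foreign"
display "Variable label: `vlab'"

* Text of the value label for each observed level of the variable.
levelsof foreign, local(levels)
foreach l of local levels {
    local txt : label (foreign) `l'
    display "`l' -> `txt'"
}
```

The (varname) form of the label function returns the number itself when no value label is attached, so the loop also works for unlabeled dummies.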


• Macro nesting
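Nested macros expand from the inside out. The short sketch below uses hypothetical macro names (i, prefix, coef_2) to show one and two levels of nesting.

```stata
local i 2
local coef_2 0.75
local prefix "coef"

* One level: the inner macro i expands to 2, then coef_2 expands to 0.75.
display "`coef_`i''"

* Two levels: prefix expands to coef and i to 2, which again names coef_2.
display "``prefix'_`i''"
```

Both display commands print 0.75.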


Execution example


Variable labels are used if defined


Positive coefficients

• Usually, positive coefficients are required in these nomograms in order to ease calculations (no subtractions are needed to get the total score).

• Due to the linear nature of the TP term, it is easy to make all coefficients positive.

$$p = \frac{1}{1 + e^{-(\alpha_0 + TP)}}$$

TP = α1 X1 + α2 X2 + …


• This can be done manually.

• Let’s see an example…

    logit muerto edadr ib1.Gtrata tpodial ib2.hbsagdon


Some dummy coefficients are negative

[Figure: regression output with the Gtrata and hbsagdon blocks marked.]


• The most negative coefficient in Gtrata is “‐(3) Tacro”, i.e. data value 3.

• Hence, instead of

    logit muerto edadr ib1.Gtrata tpodial ib2.hbsagdon

we use…

    logit muerto edadr ib3.Gtrata tpodial ib2.hbsagdon
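The same re-referencing can be tried on public data. The sketch below assumes Stata's bundled nlsw88 dataset (outcome union, three-level factor race) in place of the authors' data; it picks the level with the most negative coefficient and refits with it as the base. It illustrates the idea and is not the generator's implementation.

```stata
sysuse nlsw88, clear

* First fit: level 1 of race is the reference category.
logit union age ib1.race
predict double p_base1, pr
scalar cons_base1 = _b[_cons]

* Find the level with the most negative coefficient (the reference counts as 0).
local best 1
local minb 0
foreach l in 2 3 {
    if _b[`l'.race] < `minb' {
        local minb = _b[`l'.race]
        local best `l'
    }
}

* Refit with that level as the reference: the race dummies are now >= 0.
logit union age ib`best'.race
predict double p_rebased, pr

* Same model, different parameterization: fitted probabilities agree,
* and only the constant and the dummy coefficients move.
assert reldif(p_base1, p_rebased) < 1e-6 if !missing(p_base1)
display "constant before: " cons_base1 "   after: " _b[_cons]
```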


Now all dummy coefficients are positive

[Figure: regression output with the Gtrata and hbsagdon blocks marked.]


• The program can also perform this operation automatically (work in progress).

• Since the re-referenced coefficients are a linear shift of the original ones, the model does not need to be re-estimated.


Results

• Supports any kind of variable ordering.

• Supports negative coefficients.

• Supports omitted variables due to collinearity.

• Works after an almost arbitrary regression command.

• Let’s see execution modes & parameters…


Remember the custom data structure


Execution modes

• Automatic: everything is determined automatically.

• Manual: define all parameters manually (laborious).

• Hybrid: determine most parameters automatically and refine some of them manually. For example: a variable's range is 14 to 81 and we want to make it 10 to 100, leaving all other variables as they are.


Hybrid mode

The “edadr” coefficient does not change


By the way…

• Since the output is a standard Stata graph, it can be edited for further adjustments.


Other execution options

• iMaxVarLabelLen = 30 //Max N of chars to display in variable labels

• iMaxDataLabelLen = 30 //Max N of chars to display in data value labels

• iVarLabDescr = 0 //Use variable description as variable label when possible (0=no; 1=yes)

• iDummyLabWithValues = 1 //Show data values on dummy data value labels (0=no; 1=yes)

• iCoefForcePositive = 0 //Force positive coefs (0=no; 1=yes)


Limitations

• Dummy syntax must be “bx.var”.

• No interaction operators (“#”, “##”) allowed.

• Maximum of 15 variables (xtline command options overflow). Suggestions?

• Categorical variables are limited to 40 dummies each (a limit that is easy to raise if needed).

• Several performance improvements are possible (max. string length).


Future work

• Overcoming Stata limits: string variables, xtline command.

• Better drawing adjustments (fonts, axes, etc).

• Interaction operators support.

• Cox regression nomograms.


Questions?


Backup slides


Dummy coefficient re‐adjustment

• Given a categorical variable “A” with N categories and a regression constant α0

z = α0 + αA1 ∙ D1 + αA2 ∙ D2 + … + αAN ∙ DN

If any αAi < 0 (i = 1..N), we set as reference the category with the most negative coefficient, i.e. min(αAi), i = 1..N.


$$p = \frac{1}{1 + e^{-(\alpha_0 + TP)}}$$

TP = αA1 · D1 + αA2 · D2 + …


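• And then

z = β0 + βA1 ∙ D1 + βA2 ∙ D2 + … + βAN ∙ DN

where, writing m = min(αAi), i = 1..N,

β0 = α0 + m
βA1 = αA1 − m
…
βAN = αAN − m

Exactly one dummy equals 1 for any given case, so subtracting m from every dummy coefficient is offset by adding m to the constant, and z is unchanged.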


Some dummy coefficients are negative

Normal execution


All coefficients are forced to be positive

This causes a displacement in the total score to probability conversion (due to α0)

Execution with forced positive coefficients