predicting who will die within six months

Predicting Who Will Die Presentation by: Matthew Dunning

Upload: matthew-dunning

Post on 06-Apr-2017


Category:

Healthcare



TRANSCRIPT

Page 1: Predicting Who Will Die Within Six Months

Predicting Who Will Die

Presentation by: Matthew Dunning

Page 2: Predicting Who Will Die Within Six Months
Page 3: Predicting Who Will Die Within Six Months

Objective of work

• The objective of this work is to estimate the probability of death within six months based on the diagnosis, medical history, category of diagnosis and recurring diagnoses.

• I used Bayes' theorem to calculate likelihood ratios, probabilities, sensitivity and specificity.

• This work supports evidence-based decisions: for patients who are alive, it identifies the patient ids that need more attention or follow-up; for patients who have died, the data helps in understanding the effects of these diseases.
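In odds form, Bayes' theorem ties a likelihood ratio to a probability of death. The relations below are the standard textbook form, given as a sketch rather than the exact steps used in this work; $D$ (death within six months) and $Dx$ (a given diagnosis) are notation introduced here for illustration.

```latex
\[
\underbrace{\frac{P(D \mid Dx)}{P(\bar{D} \mid Dx)}}_{\text{posterior odds}}
= \underbrace{\frac{P(Dx \mid D)}{P(Dx \mid \bar{D})}}_{\text{likelihood ratio}}
\times
\underbrace{\frac{P(D)}{P(\bar{D})}}_{\text{prior odds}},
\qquad
P(D \mid Dx) = \frac{\text{posterior odds}}{1 + \text{posterior odds}}
\]
```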

Page 4: Predicting Who Will Die Within Six Months

Data & Its Source

• In the original data source, there are 17443442 rows of data.

• Sorted by count of icd9 in descending order, the bar chart is skewed, with the highest counts on the left and a tail tapering off to the right. Without the sort, the distribution would be “all jumbled up”.

[Figure: bar chart "Count of ICD9, Top 50". x-axis: ICD9 Code; y-axis: ICD9 Count, 0 to 1000000. Codes labeled on the x-axis: I401.9, I305.1, I496., I414.01, I427.31, I600.00, I285.9, IV60.0, I403.90, I724.2, I303.91, I070.54, I338.29, I585.9, I486., IV45.81, I276.8, I070.70, I564.00, I276.1, IV62.0, I278.00, IV45.82, IV57.89, I427.89]

Page 5: Predicting Who Will Die Within Six Months

Preparation of the Data

1. Original Data: 17443442 rows of id, icd9

2. Remove Cases Where Patients Died Before Visit: 17443442 to 17439892 rows of id, icd9

3. Remove cases in a year that exceed 365: 829801 to 828616 id’s

4. Final Data to use for analysis: 15715093 rows of id, icd9
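The "died before visit" filter can be sketched in Python as follows. This is an illustration only: the original work was done in SQL, the sample rows are made up, and the column names `ageatdx` and `ageatdeath` are borrowed from the patient table shown later in the deck. The 365-per-year rule is not sketched because the slides do not spell out how it was counted.

```python
from typing import Optional, TypedDict


class DxRow(TypedDict):
    id: int
    icd9: str
    ageatdx: float
    ageatdeath: Optional[float]  # None stands in for NULL (patient is alive)


def drop_dx_after_death(rows: list[DxRow]) -> list[DxRow]:
    """Keep rows whose diagnosis age does not exceed the age at death."""
    return [
        r for r in rows
        if r["ageatdeath"] is None or r["ageatdx"] <= r["ageatdeath"]
    ]


# Made-up rows for illustration; not taken from the real data set.
sample: list[DxRow] = [
    {"id": 1, "icd9": "I401.9", "ageatdx": 60.0, "ageatdeath": None},
    {"id": 2, "icd9": "I427.5", "ageatdx": 71.5, "ageatdeath": 71.6},
    {"id": 3, "icd9": "I486.", "ageatdx": 80.2, "ageatdeath": 79.9},  # dx after death
]

cleaned = drop_dx_after_death(sample)
print([r["id"] for r in cleaned])
```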

Page 6: Predicting Who Will Die Within Six Months

Randomization

• 80% of data was used for training

• 20% of data was used for validation
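A minimal sketch of an 80/20 random split in Python. The slides do not say whether the split was made by row or by patient id, nor which seed or tool was used; splitting a list of ids with a fixed seed is an assumption made here, and `split_ids` is a hypothetical helper name.

```python
import random


def split_ids(ids, train_fraction=0.8, seed=0):
    """Shuffle ids reproducibly and cut them into training and validation lists."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # local generator, global state untouched
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# 100 made-up patient ids
train_ids, valid_ids = split_ids(range(1, 101))
print(len(train_ids), len(valid_ids))
```

Splitting by patient id rather than by row keeps all of one patient's diagnoses on the same side of the split.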

Page 7: Predicting Who Will Die Within Six Months

Calculation of Likelihood Ratios

$$\mathrm{LR} = \frac{\text{Dead with Dx} / \text{Dead}}{\text{Alive with Dx} / \text{Alive}}$$
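The ratio above can be computed from four counts per diagnosis. The sketch below uses made-up counts and a hypothetical function name; it is not the SQL used in the original work.

```python
def likelihood_ratio(dead_with_dx, dead_total, alive_with_dx, alive_total):
    """LR = (dead with Dx / dead) / (alive with Dx / alive)."""
    if alive_with_dx == 0:
        # The denominator would be zero, so the ratio is undefined.
        raise ValueError("no living patients with this diagnosis")
    p_dx_given_dead = dead_with_dx / dead_total
    p_dx_given_alive = alive_with_dx / alive_total
    return p_dx_given_dead / p_dx_given_alive


# Made-up counts: 30 of 1000 dead patients and 20 of 4000 living patients
# carry the diagnosis.
lr = likelihood_ratio(30, 1000, 20, 4000)
print(round(lr, 2))
```

An LR above 1 means the diagnosis is more common among patients who died; an LR below 1 means it is more common among patients who survived.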

Page 8: Predicting Who Will Die Within Six Months

Top Ten Deadliest Diseases – Each Dx from Training Set

1. Brain Death: icd9 I348.82, LR 11.78

2. Malignant Ascites: icd9 I789.51, LR 5.53

3. Malignant Neoplasm of biliary tract: icd9 I156.9, LR 5.04

4. Encounter for palliative care: icd9 IV66.7, LR 4.95

5. Cardiac Arrest: icd9 I427.5, LR 4.85

6. Coma: icd9 I780.01, LR 4.75

7. Malignant Pleural Effusion: icd9 I511.81, LR 4.62

8. Secondary Malignant Neoplasm of Adrenal Gland: icd9 I198.7, LR 4.45

9. Disseminated Malignant Neoplasm without specification of site: icd9 I199.0, LR 4.42

10. Secondary Malignant Neoplasm of brain and spinal cord: icd9 I198.3, LR 4.35

Page 9: Predicting Who Will Die Within Six Months

Ten Least Deadly Diseases – Each Dx from Training Set

1. Dysmenorrhea: icd9 I625.3, LR 0.0009

2. Chondromalacia of patella: icd9 I717.7, LR 0.001

3. Hypertrophy of tonsils alone: icd9 I474.11, LR 0.002

4. Personal History of injury presenting hazards to health: icd9 IV15.5, LR 0.002

5. Schizophrenic disorders, residual type, chronic with acute exacerbation: icd9 I295.64, LR 0.002

6. Unspecified symptom associated with female genital organs: icd9 I625.9, LR 0.002

7. Pelvic peritoneal adhesions, female (postoperative)(post infection): icd9 I614.6, LR 0.003

8. Cervicitis and endocervicitis: icd9 I616.0, LR 0.004

9. Migraine without aura, without mention of intractable migraine without mention of status migrainosus: icd9 I346.10, LR 0.004

10. Amphetamine or related acting sympathomimetic abuse, episodic: icd9 I305.72, LR 0.004

Page 10: Predicting Who Will Die Within Six Months

How medical history can be used to predict prognosis

• Medical history can be used to predict prognosis because it indicates the probability of developing another medical condition, of the same condition recurring a given number of times, and of death within a certain timespan.

Page 11: Predicting Who Will Die Within Six Months

id icd9 ageatdx ageatdeath odds prob

463305 I529.4 63.5 NULL 9.646010674 0.906068101

463305 I300.00 64.333333 NULL 0.479591875 0.324137949

463305 I528.9 64.333333 NULL 0.441291616 0.306177883

463305 I528.9 60.416666 NULL 0.435537554 0.303396838

463305 I714.0 62.666666 NULL 0.330190365 0.248227903

463305 I300.00 60.083333 NULL 0.29491929 0.227751098

463305 I530.81 66.416666 NULL 0.260475898 0.206648853

463305 I401.9 62.916666 NULL 0.257884965 0.205014745

463305 I300.00 64.083333 NULL 0.255498056 0.203503346

463305 I401.9 60.166666 NULL 0.23220788 0.188448624

463305 I530.81 63 NULL 0.229879244 0.186912045

463305 I311. 64.083333 NULL 0.219384003 0.179913795

463305 IV15.81 62.916666 NULL 0.217714068 0.178789154

463305 I300.01 60.083333 NULL 0.131771237 0.116429215

463305 I278.00 60.166666 NULL 0.109584144 0.098761454

463305 I278.00 66.416666 NULL 0.107392413 0.09697774

463305 I296.7 63 NULL 0.099679257 0.090643936

463305 I296.7 63.583333 NULL 0.098795268 0.089912353

463305 I306.9 63 NULL 0.070926549 0.066229144

463305 I306.9 63.583333 NULL 0.06561912 0.0615784

463305 I296.50 63.5 NULL 0.062146048 0.05850989

463305 IV62.4 63 NULL 0.048503436 0.046259683

463305 IV62.0 62.916666 NULL 0.029597396 0.028746573
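The `odds` and `prob` columns in the table above are related by prob = odds / (1 + odds). A short Python check against three of the rows (the function names are illustrative):

```python
def odds_to_probability(odds):
    """Convert odds to a probability: p = odds / (1 + odds)."""
    return odds / (1.0 + odds)


def probability_to_odds(p):
    """Inverse conversion: odds = p / (1 - p)."""
    return p / (1.0 - p)


# (icd9, odds, prob) copied from the table for patient 463305
table_rows = [
    ("I529.4", 9.646010674, 0.906068101),
    ("I300.00", 0.479591875, 0.324137949),
    ("IV62.0", 0.029597396, 0.028746573),
]

for icd9, odds, prob in table_rows:
    print(icd9, round(odds_to_probability(odds), 6), prob)
```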

Page 12: Predicting Who Will Die Within Six Months

[Figure: "Sensitivity vs Specificity" curve. x-axis: Specificity; y-axis: Sensitivity; both axes run from 0 to 1.2 in steps of 0.2]

Page 13: Predicting Who Will Die Within Six Months

Accuracy of the prediction

• I used the formula: Accuracy = (TN+TP)/(TN+TP+FN+FP)

Probability Sensitivity Specificity Accuracy

0 1 0 50%

0.2 0.738 0.493 62%

0.4 0.3033 0.86 58%

0.8 0.01206 0.997 50%

0.9 0.00001084 0.9999 50%

0.95 4.0669E-06 1 50%

1 0 1 50%

• Sensitivity – among patients with the disease, the probability of a positive test

• Specificity – among patients without the disease, the probability of a negative test
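The accuracy column can be cross-checked against the sensitivity and specificity columns. When the dead and alive groups are the same size, (TN+TP)/(TN+TP+FN+FP) reduces to the mean of sensitivity and specificity, and the rounded values in the table agree with that mean. Whether the groups were in fact balanced is an assumption here, and `balanced_accuracy` is an illustrative name.

```python
def balanced_accuracy(sensitivity, specificity):
    """Accuracy when both classes have equal size: mean of the two rates."""
    return (sensitivity + specificity) / 2.0


# (probability cutoff, sensitivity, specificity, reported accuracy in %)
# copied from the table above
accuracy_rows = [
    (0.0, 1.0, 0.0, 50),
    (0.2, 0.738, 0.493, 62),
    (0.4, 0.3033, 0.86, 58),
    (0.8, 0.01206, 0.997, 50),
    (0.9, 0.00001084, 0.9999, 50),
    (0.95, 4.0669e-06, 1.0, 50),
    (1.0, 0.0, 1.0, 50),
]

for cutoff, sens, spec, reported in accuracy_rows:
    computed = round(100 * balanced_accuracy(sens, spec))
    print(cutoff, computed, reported)
```

Among the listed cutoffs, 0.2 gives the best trade-off; at cutoffs of 0.8 and above almost no deaths are flagged, so the accuracy falls back to 50%.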

Page 14: Predicting Who Will Die Within Six Months

Contingency Table

        Dead    Alive   Total

Yes     23      14      37

No      141089  680473  821562

Total   141112  680487  821599
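Sensitivity, specificity and accuracy follow directly from the four cells of the table. The sketch below assumes that "Yes" means the patient was flagged as likely to die and "No" means not flagged; the slide does not label the rows, so that reading is an assumption.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # flagged among those who died
    specificity = tn / (tn + fp)  # not flagged among those alive
    accuracy = (tn + tp) / (tn + tp + fn + fp)
    return sensitivity, specificity, accuracy


# Cell counts from the contingency table
tp, fp = 23, 14          # "Yes" row: dead, alive
fn, tn = 141089, 680473  # "No" row: dead, alive

sensitivity, specificity, accuracy = confusion_metrics(tp, fp, fn, tn)
print(f"sensitivity={sensitivity:.6f}")
print(f"specificity={specificity:.6f}")
print(f"accuracy={accuracy:.4f}")
```

With these counts the unweighted accuracy is dominated by the much larger alive group, which is why sensitivity and specificity are reported separately.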

Page 15: Predicting Who Will Die Within Six Months

Usefulness of the Project

• Hospitals can devote more time to these patients, which may result in a delay of death or an increase in patient satisfaction.

• Researchers can use this project to better understand trends and stages of different icd9 codes, i.e., why a given icd9 is associated with such high or low odds/probability.

• For me, it gave some insight into how to apply Bayes' theorem (statistical processes) to a big dataset and how to use different functions within SQL to complete the desired tasks.