Automated Differential Diagnosis of Heart Disease In Emergency Department
Diyang Xue
Dec 07, 2018
University Of Pittsburgh
Differential Diagnosis
● In medicine, a differential diagnosis is the distinguishing of a particular disease or condition from others that present similar clinical features
● Differential diagnostic procedures are used by physicians and other trained medical professionals to diagnose the specific disease in a patient
Hypothesis statement
● One can systematically perform differential diagnosis with the help of machine learning algorithms, and reach the level of experienced physicians in accuracy and effectiveness
● In other words, we want to mimic the doctor's diagnostic process: find the most useful questions and reach a diagnosis in the fewest steps
Decision Tree algorithm
Data
● Emergency department visits whose primary diagnosis is heart disease, from 15 UPMC hospitals
● Years 2008-2013 as training data, year 2014 as test data
  ○ Training data: 91,036
  ○ Test data: 26,193
● Heart disease
  ○ There are 239 ICD-9-CM codes for heart diseases
  ○ Grouped using the Clinical Classifications Software (CCS)
  ○ 11 categories in total
  ○ Based on a physician's suggestion: delete 7.2.5; merge 7.2.8 and 7.2.9 into 7.2.89; merge 7.2.1, 7.2.6, 7.2.7, and 7.2.10 into "other"
CCS level | Level name | Train (2008-2013) | Test (2014) | Total
7.2.1 | Heart valve disorders | 132 | 62 | 194
7.2.2 | Peri-; endo-; and myocarditis; cardiomyopathy (except that caused by TB or STD) | 364 | 117 | 481
7.2.3 | Acute myocardial infarction | 1014 | 327 | 1341
7.2.4 | Coronary atherosclerosis and other heart disease | 2597 | 788 | 3385
7.2.5 | Nonspecific chest pain | 69234 | 19317 | 88551
7.2.6 | Pulmonary heart disease | 309 | 138 | 447
7.2.7 | Other and ill-defined heart disease | 51 | 21 | 72
7.2.8 | Conduction disorders | 229 | 85 | 314
7.2.9 | Cardiac dysrhythmias | 12925 | 4078 | 17003
7.2.10 | Cardiac arrest and ventricular fibrillation | 1980 | 485 | 2465
7.2.11 | Congestive heart failure; nonhypertensive | 2201 | 775 | 2976
SUM | | 91036 | 26193 | 117229
CCS level (class label) | Level name | Train (2008-2013) | Test (2014) | Total
7.2.2 (0) | Peri-; endo-; and myocarditis; cardiomyopathy (except that caused by TB or STD) | 364 | 117 | 481
7.2.3 (1) | Acute myocardial infarction | 1014 | 327 | 1341
7.2.4 (2) | Coronary atherosclerosis and other heart disease | 2597 | 788 | 3385
7.2.89 (3) | Conduction disorders; cardiac dysrhythmias | 13154 | 4163 | 17317
7.2.11 (4) | Congestive heart failure; nonhypertensive | 2201 | 775 | 2976
OTHER: 7.2.1, 7.2.6, 7.2.7, 7.2.10 (5) | Heart valve disorders; pulmonary heart disease; other and ill-defined heart disease; cardiac arrest and ventricular fibrillation | 2472 | 706 | 3178
SUM | | 21802 | 6876 | 28678
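The regrouping described above (drop 7.2.5, merge 7.2.8 and 7.2.9, pool four small categories into "other") can be checked against the two tables with a short sketch. The counts are copied from the first table and the class labels 0-5 from the second; the names `CODE_TO_LABEL` and `merge_counts` are illustrative and not taken from the slides.

```python
# Counts copied from the original CCS table: code -> (train, test).
counts = {
    "7.2.1": (132, 62),
    "7.2.2": (364, 117),
    "7.2.3": (1014, 327),
    "7.2.4": (2597, 788),
    "7.2.5": (69234, 19317),
    "7.2.6": (309, 138),
    "7.2.7": (51, 21),
    "7.2.8": (229, 85),
    "7.2.9": (12925, 4078),
    "7.2.10": (1980, 485),
    "7.2.11": (2201, 775),
}

# Class labels as shown in the merged table. 7.2.5 has no entry, so it is dropped.
CODE_TO_LABEL = {
    "7.2.2": 0,
    "7.2.3": 1,
    "7.2.4": 2,
    "7.2.8": 3, "7.2.9": 3,                              # merged into 7.2.89
    "7.2.11": 4,
    "7.2.1": 5, "7.2.6": 5, "7.2.7": 5, "7.2.10": 5,     # "other"
}

def merge_counts(counts, code_to_label):
    """Sum (train, test) counts per class label, skipping unmapped codes."""
    merged = {}
    for code, (train, test) in counts.items():
        label = code_to_label.get(code)
        if label is None:
            continue  # deleted category (7.2.5)
        prev_train, prev_test = merged.get(label, (0, 0))
        merged[label] = (prev_train + train, prev_test + test)
    return merged

merged = merge_counts(counts, CODE_TO_LABEL)
total_train = sum(train for train, _ in merged.values())
total_test = sum(test for _, test in merged.values())
print(merged)
print(total_train, total_test)
```

The per-class sums and the totals reproduce the merged table, which confirms that the second table is the first one with 7.2.5 removed and the listed categories pooled.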
Data
● Features
  ○ Demographic data and discharge reports (parsed by MedLEE); NLP features are restricted to the semantic types "finding" and "sign and symptom"
  ○ 8468 features
    ■ 5 demographic features: Gender, Race, Age, Income, Insurance
    ■ 8463 NLP features
Decision tree algorithm
● Scikit-learn package
  ○ CART algorithm
● Gini index
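To make the split criterion concrete, here is a minimal sketch of Gini impurity and of choosing the feature whose split leaves the lowest weighted impurity, which is the core step of CART. It assumes binary (0/1) features, such as the presence or absence of a finding in the discharge report; that encoding and the toy data are assumptions, not details from the slides.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_gini(feature, labels):
    """Weighted Gini impurity after splitting on a binary (0/1) feature."""
    left = [y for x, y in zip(feature, labels) if x == 0]
    right = [y for x, y in zip(feature, labels) if x == 1]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_feature(rows, labels):
    """Index of the binary feature with the lowest weighted impurity."""
    n_features = len(rows[0])
    scores = [split_gini([row[j] for row in rows], labels) for j in range(n_features)]
    return min(range(n_features), key=scores.__getitem__)

# Toy data (hypothetical): two binary features, class labels 3 and 1.
rows = [[1, 0], [1, 1], [0, 0], [0, 1]]
labels = [3, 3, 1, 1]

print(gini(labels))                              # impurity before splitting
print(split_gini([r[0] for r in rows], labels))  # feature 0 separates the classes
print(best_feature(rows, labels))
```

In this toy case feature 0 separates the two classes perfectly, so it is chosen as the root question. A full tree repeats the same selection recursively within each branch.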
Performance (accuracy)
● 10-fold cross-validation: max_depth: 12; min_samples_leaf: 30
● Accuracy: 0.7686
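One plausible way to arrive at a `max_depth` / `min_samples_leaf` pair like the one reported is a grid search scored by 10-fold cross-validated accuracy. The sketch below uses scikit-learn's `DecisionTreeClassifier` with the Gini criterion on synthetic data. The grid values, the stratified shuffled folds, and the synthetic data are all assumptions; the slides do not say how the search was run.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the visit-by-feature matrix (not the study data).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=6, n_classes=3, random_state=0
)

# Hypothetical grid; the slides report only the selected values.
param_grid = {"max_depth": [4, 8, 12], "min_samples_leaf": [10, 30]}

search = GridSearchCV(
    DecisionTreeClassifier(criterion="gini", random_state=0),
    param_grid,
    scoring="accuracy",
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 4))
```

Changing `scoring` to an F1 variant such as `"f1_macro"` selects hyperparameters by F1 instead, which would explain why the accuracy and F1 slides report different settings.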
Performance (F1 score)
● 10-fold cross-validation: max_depth: 14; min_samples_leaf: 10
● Accuracy: 0.7695; F1 score: 0.522
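The gap between accuracy and F1 is typical of imbalanced classes: a model can score well on accuracy by favoring the largest class while missing the small ones. The sketch below shows this with a macro-averaged F1 on toy labels. Macro averaging is an assumption here, since the slides do not state which F1 variant was used.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def class_f1(y_true, y_pred, label):
    """F1 score for one class, treating it as the positive class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no correct positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true))
    return sum(class_f1(y_true, y_pred, c) for c in classes) / len(classes)

# Toy example: eight visits of class 3, two of class 0,
# and a model that always predicts the majority class.
y_true = [3] * 8 + [0] * 2
y_pred = [3] * 10

print(accuracy(y_true, y_pred))
print(round(macro_f1(y_true, y_pred), 4))
```

Here accuracy is 0.8 while macro F1 is roughly 0.44, because class 0 is never predicted and contributes an F1 of zero to the average.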
Why is performance so poor?
● Patients are grouped by primary diagnosis
● A patient may have multiple heart diseases at the same time
Pure data
Training Data 10,698
Test Data 3,195
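The slides do not define "pure data". Given the preceding slide, one reading is that it keeps only visits whose heart-disease diagnoses all fall into a single class. The sketch below implements that reading. The visit records, the `ccs` field, and the function name `pure_label` are hypothetical.

```python
# Class labels from the merged table (7.2.5 is excluded).
HEART_CODE_TO_LABEL = {
    "7.2.2": 0, "7.2.3": 1, "7.2.4": 2,
    "7.2.8": 3, "7.2.9": 3,
    "7.2.11": 4,
    "7.2.1": 5, "7.2.6": 5, "7.2.7": 5, "7.2.10": 5,
}

def pure_label(ccs_codes):
    """Return the single class label of a visit, or None if it spans zero or several classes."""
    labels = {HEART_CODE_TO_LABEL[c] for c in ccs_codes if c in HEART_CODE_TO_LABEL}
    return next(iter(labels)) if len(labels) == 1 else None

# Hypothetical visits, each with the CCS codes of all its diagnoses.
visits = [
    {"id": "v1", "ccs": ["7.2.9"]},            # one class: kept
    {"id": "v2", "ccs": ["7.2.9", "7.2.11"]},  # classes 3 and 4: dropped
    {"id": "v3", "ccs": ["7.2.8", "7.2.9"]},   # both merge into class 3: kept
    {"id": "v4", "ccs": ["7.2.3", "7.2.4"]},   # classes 1 and 2: dropped
]

pure = {}
for visit in visits:
    label = pure_label(visit["ccs"])
    if label is not None:
        pure[visit["id"]] = label

print(pure)
```

Under this reading, a visit coded with both 7.2.8 and 7.2.9 still counts as pure, because the two codes were merged into one class.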
Pure data, imbalanced: accuracy
● 10-fold cross-validation: max_depth: 12; min_samples_leaf: 10
● Accuracy: 0.8948
● Total: 10698
Pure data, imbalanced: F1 score
● 10-fold cross-validation: max_depth: 18; min_samples_leaf: 10
● Accuracy: 0.8910
● Total: 10698
Pure data, balanced: undersampling
● 10-fold cross-validation: max_depth: 10; min_samples_leaf: 10
● Accuracy: 0.8013
● Total: 1899
Pure data, balanced: oversampling
● 10-fold cross-validation: max_depth: 14; min_samples_leaf: 10
● Accuracy: 0.7775
● Total: 8270
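The two balancing strategies can be illustrated with plain random resampling: undersampling draws every class down to a target size without replacement, and oversampling pads smaller classes by drawing with replacement. This is a sketch of the general technique only. The slides do not describe the resampling scheme actually used, and the helper `resample_to_size` and the toy labels are hypothetical.

```python
import random
from collections import Counter, defaultdict

def resample_to_size(labels, target, seed=0):
    """Return sample indices such that every class contributes exactly `target` items."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    chosen = []
    for indices in by_class.values():
        if len(indices) >= target:
            # Undersample: draw without replacement.
            chosen.extend(rng.sample(indices, target))
        else:
            # Oversample: keep every item, then pad with random duplicates.
            chosen.extend(indices + rng.choices(indices, k=target - len(indices)))
    return chosen

# Toy imbalanced labels: six of class 3, two of class 0, four of class 1.
labels = [3] * 6 + [0] * 2 + [1] * 4
class_sizes = Counter(labels)

under = resample_to_size(labels, target=min(class_sizes.values()))
over = resample_to_size(labels, target=max(class_sizes.values()))

print(Counter(labels[i] for i in under))
print(Counter(labels[i] for i in over))
```

When combined with cross-validation, resampling is normally applied to the training folds only, so that accuracy is still measured on data with the original class mix.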
Decision Tree framework
● Try algorithms other than the classic decision tree algorithm for choosing the best split node
Performance: Random Forest
● 10-fold cross-validation: max_depth: 14; min_samples_leaf: 10
● Accuracy: 0.8920
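A random forest with the reported depth and leaf-size settings can be evaluated under 10-fold cross-validation as sketched below. The synthetic data and `n_estimators=50` are assumptions; the slides give neither the number of trees nor any other forest settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the visit-by-feature matrix (not the study data).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=6, n_classes=3, random_state=0
)

forest = RandomForestClassifier(
    n_estimators=50,       # assumption: not stated in the slides
    max_depth=14,          # value reported on the slide
    min_samples_leaf=10,   # value reported on the slide
    random_state=0,
)

# One accuracy score per fold; the mean is the cross-validated accuracy.
scores = cross_val_score(forest, X, y, cv=10, scoring="accuracy")
mean_accuracy = scores.mean()
print(round(mean_accuracy, 4))
```

A forest averages many trees and so no longer yields a single sequence of questions, which matters if the goal is to mimic a step-by-step diagnostic process.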
Future work
● We cannot find good split points for certain categories: 0 and 1
● Future work:
  ○ Combine external medical knowledge with a pure machine learning algorithm
  ○ Compare the performance of our model with that of physicians
External knowledge
1. Improve classification performance
2. Make the decision tree more clinically meaningful
External Knowledge
C0000727 Abdomen, Acute
C0000731 Abdomen distended
C0000734 Abdominal mass
C0000737 Abdominal Pain
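The entries above pair an identifier with a concept name, in a form that looks like UMLS concept unique identifiers ("C" followed by seven digits). A small sketch of loading such lines into a lookup table follows. The slides do not say how this list feeds into the model, so the parsing function and its use are illustrative only.

```python
def parse_concept_line(line):
    """Split 'C0000737 Abdominal Pain' into ('C0000737', 'Abdominal Pain')."""
    identifier, _, name = line.strip().partition(" ")
    # Assumed identifier shape: the letter C followed by seven digits.
    if not (len(identifier) == 8 and identifier[0] == "C" and identifier[1:].isdigit()):
        raise ValueError(f"unexpected identifier: {identifier!r}")
    return identifier, name.strip()

lines = [
    "C0000727 Abdomen, Acute",
    "C0000731 Abdomen distended",
    "C0000734 Abdominal mass",
    "C0000737 Abdominal Pain",
]

concept_names = dict(parse_concept_line(line) for line in lines)
print(concept_names["C0000737"])
```

A table like this would let a split node in the tree be displayed by concept name rather than by identifier.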