predic tion of type -2 diabetes foot ulcer - a comparative ... · 1r. christy pushpaleela and 2r....
TRANSCRIPT
Prediction of Type-2 Diabetes Foot Ulcer - A Comparative
Study with Classification Algorithm 1R. Christy Pushpaleela and 2R. Padmajavalli
1Bharathiar University,
Coimbatore.
Department of Computer Science,
Women’s Christian College, Chennai.
2Bharathiar University,
Coimbatore.
Bhaktavatsalam Memorial College for Women,
Chennai.
Abstract This research paper focuses upon the study of foot ulcer-diabetic data and implement
various classification algorithms on different kinds of medical datasets of foot ulcer-Diabetic
to evaluate its comparative performance. Performance analysis records the most frequently
used algorithms on respective medical datasets and most efficient classification algorithm to
analyze the foot ulcer-Diabetic disease based on the comparative study between C4.5 and
SVM on high dimensional datasets. The scope of research work is aimed at predicting
diabetics for foot ulcer patients by comparing various classification algorithms.
Keywords:Classification, prediction, clustering, diabetics, KNN, decision tree, SVM, C4.5.
International Journal of Pure and Applied MathematicsVolume 117 No. 7 2017, 219-230ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version)url: http://www.ijpam.euSpecial Issue ijpam.eu
219
1. Data Mining
Data mining is the process of discovering actionable information from huge sets of data. Data
mining uses mathematical analysis to derive patterns and trends that exist in data. Patterns and
trends can be collected and defined as a data mining model. Diabetes is a group of metabolic
illness due to high blood sugar levels. It may also cause many complication like as heart
problem, kidney issues, foot ulcers. There are three types of diabetics issue namely.
Type 1 - Affects adult and children's- Insulin dependent diabetes.
Type 2 - Affects only Adult -Non Insulin dependent diabetes and Gestational Diabetes - This
will occur when pregnant women without a previous history of diabetes develop a high blood
glucose level.
Data mining involves pulling out of useful information from the huge volume of data. Data
mining technique has been applied in various fields like banking, insurance, medicine etc.
Data mining process followed the below process toget the final model for predication.
Problem Definition.
Preparation.
Data Exploring.
Model Building for predication.
Exploring & Validating models.
Deploying & Updating models.
Figure 1: Data Mining Process Flow
There are two forms of data analysis that can be used for extracting models.
Classification and Prediction.
2. Primary Objective
The Primary Objectives of the present work is as follows:
To compile the aggregated real time data set from reputed Hospitals
International Journal of Pure and Applied Mathematics Special Issue
220
To Pre-process the clustered real time patient data for foot ulcer-diabetic prediction
consuming both manual techniques and Preprocessing tools like Weka.
To apply the classification algorithm for pre-processed foot data to identify the optimum
Predictor.
To classify the efficient algorithm based on the performance analysis report employing
best accuracy rate.
Related work
Auria, Laura and Moro, R. A [6] introduced a statistical technique, SVM which provide a higher
accuracy of company classification into solvent and insolvent. SVM's are more accurate and
convenient even when input data is non-monotonous and non-linearly separable. SVM's do not
deliver parametric score but can offer support for recognizing different financial ratios.
Dr K S OzaMr V S Kumbhar [8] presented a comparison of different data mining techniques
like k-nearest neighbor (KNN), Bayesian network, Decision trees, support vector machines
based on their accuracy, evaluation time and rate of error.
NongyaoNai-aruna and RungruttikarnMoungmaia [5] introduced a web application by using a
use of disease classifiers and a real data set. Various algorithms were used to classify the risk of
diabetes mellitus. Decision Tree, Regression, Naive Baye and Random Forest are used to
classify the risk of diabetes. Based on the various random selection found out that Random
Forest algorithm is effective in increasing the accuracy value and predicting the diabetes.
VrushaliBalpande and RakhiWajgi [4]presented a research work for prediction of diabetes using
data mining technique. Data mining has been applied in various fields like medicine, marketing,
banking, etc. In medicine, predictive data mining is used to diagnose the disease at the earlier
stages itself and helps the physicians in treatment planning procedure. Existing Data mining
method which is efficient in predicting diseases is used for prediction of Diabetes.
Kawsar Ahmed, TasnubaJesmin [1] presented a research work for 20 classification algorithms
are compared by measuring accuracies, speed and robustness using WEKA tool to classify
diabetes patient data.
J.S.Raikwal and KanakSaxena [3] proposed methodology to predict the future disease of the
patients using data patterns. SVM and KNN algorithm is used to classify data -For example
medical patients data to find hidden patterns for targets, like predicting the future diseases of the
patients.
Vincenzo Lagani a, Franco Chiarugi [13] implemented a Data-mining analyses of the
DCCT/EDIC data allow the identification of true predictive models for diabetes-related
impediments.
3. Data Pre-Processing Steps
Step 1. Obtained real time medical data sets are pre-processed manually.
Step 2: Excel data is converted into .CVS format further to .Arff format for effective
processing.
International Journal of Pure and Applied Mathematics Special Issue
221
Step 3: Weka tool is launched alongside explorer window where in”Preprocess tab” is
selected.
Step4: Arff file format is opened to choose the attributes filed in the data set (e.g. number
of instances, attributes etc).
Step 4: Respective data set is entered and implemented by selecting Visualization button
[1][2][8] [9].
Real-time-Data from Leading Hospital
455 data of patient (355 diabetes patients & 100 non diabetes patients) is collected from leading
hospital. There are male and female patients whose age between 15 to 76 years old and sample
data set listed below.
Table 1: Sample Real Data for Classification
Patient Id Age SEX BMI BPH BPL SugarP SugarPo
STAR01 53 F 31.71 139 73 236 412
STAR02 60 M 28.32 123 77 172 260
STAR03 62 M 31.71 130 80 187 268
STAR04 54 F 28.32 154 84 162 353
STAR05 67 M 28.33 140 89 143 250
STAR06 40 M 28.44 128 67 130 189
STAR07 55 F 32.55 130 90 156 240
STAR08 62 F 31.71 108 62 283 345
STAR09 61 F 28.32 125 78 112 197
STAR10 56 M 31.71 130 80 128 192
STAR11 55 F 28.32 105 71 137 209
STAR12 68 F 28.32 120 70 171 224
STAR13 43 M 28.44 125 84 98 196
STAR14 35 F 32.55 110 70 168 269
STAR15 45 M 31.71 103 78 123 231
Figure 2: Weka Tools – Pre- Process Flow
International Journal of Pure and Applied Mathematics Special Issue
222
Figure 3: Weka Tools – Process -Visualization
4. Classification Procedures
There are several classifications methods propose by researchers. Some of the methods are
Support Vector Machine.
Decision tree.
Bayesian classification.
C4.5.
K-NN.
SVM Support Vector Machine
A support vector machine is a Classification method.
It is mostly used in classification and regression problems. SVM will support data mining, text
mining, and pattern recognition. SVM provides the user can easy analysis real data or artificial
data. SVM delivers good and optimal solutions [3][6][11][15].
J48(C4.5)
It is an algorithm used to generate a decision tree and is a successor of earlier ID3 Algorithm.
C4.5 algorithm is a greedy algorithm developed by Ross Quinlan, used for the induction of
decision trees. It can be used for classification of the data and so referred to as statistical
classifier [15][25].
Decision Tree
Decision Tree will build a predictive model which is mapped to a tree organization. Means form
of left and right child, root node compensations. Decision is a knowledge representation
structure consisting of nodes & branches organized in the form of Tree such that, every internal
non‐leaf node is labeled with values of the attributes [15].
Naïve Bayes
It is a Standard probabilistic Naïve Bayes classifier. Naive Bayes classifiers are the group of
simple probabilistic classifiers based on applying Bayes' theorem. Naive Bayes classifiers are
International Journal of Pure and Applied Mathematics Special Issue
223
highly scalable. Naive Bayes model is easy to build for very large data sets [17] [11].
K-NN
K-NN is a type of instance-based learning.KNN is a group of simple algorithm like as
Classification and Regression .The main advantages of KNN are below [5][22].
Easy implementation.
Robust.
Very low cost.
5. Disease Summary
Diabetic foot ulcer [33]
A diabetic foot ulcer is an open sore or wound that occurs in approximately 15 percent of
patients with diabetes. Commonly located on the bottom of the foot [3] [2].
Diabetes Foot Ulcer - Symptoms Diabetic ulcers are most commonly affected
Poor circulation.
High blood sugar (hyperglycemia).
Nerve damage.
Irritated or wounded feet.
Doctor will identify the scale range between 0 to 2 parameter
0: Means ulcer not present but foot at risk
1: Means Ulcer present but no infection
2: Means ulcer deep exposing joints and tendons
Risk Factors for Diabetic Foot Ulcers Patients [1] [5][7][8]. Poor quality shoes.
Poor hygiene (not washing regularly or thoroughly).
Improper trimming of toenails.
Alcohol consumption.
Eye disease from diabetes.
Heart disease.
Kidney disease.
Obesity.
Figure 4: Symptoms Ulcer Foot Ulcer [3] [2].
International Journal of Pure and Applied Mathematics Special Issue
224
Data Set (Diabetic foot ulcer- Data Set) Sno Attribute
1 Patient Id
2 Name
3 Age
4 SEX
5 BMI
6 WeightKG
7 HeightIN
8 BPH
9 BPL
10 SugarPre
Table 2: Data Set attribute
S.no Attribute
11 SugarPost
12 HBA1CP
13 Hemoglobin
14 UREA
15 Creatinie
16 GlycemicIndex
17 Single/multiple ulcer
18 Alcohol
19 Etiology
20 Culture test
6. Classification Matrix
The basic phenomenon used to classify the diabetic foot ulcer classification using classifier is its
performance and accuracy. The performance of a chosen classifier is validated based on error
rate and computation time [11][13].
Table 3: Confusion Matrix
Predictable Categorized as a Healthy(0) Categorized as a not Healthy(1)
Real Healthy (0) TP
FN
Really not Healthy (1) FP TN
Where
TP- True Positive &FN – False Negative
TN – true Negative&FP - False Positive
For measuring accuracy rate the following mathematical model is used [22][18][21][2].
Accuracy = TP+TN/ TP+FP+ TN +FN
International Journal of Pure and Applied Mathematics Special Issue
225
7. Assessment Result
Table 4: Performance Report for Classification
Algorithm Accuracy Rate
SVM 92.22
C4.5 85.68
Decision Tree 75.45
KNN 73.77
Figure 5: Performance Report for Classification
8. Conclusion and Future Work
Performance comparison of SVM and C4.5 Classifiers for foot ulcer-Diabetic disease are
recorded to obtain effective methodology for prediction. Preprocessing is carried out using Weka
for building optimum predictability model. The maximum classification accuracies of the SVM
and C4.5 classifiers were found to be 92.22% and 85.68% respectively. Based on the
comparative study, it is concluded that SVM Classifier method produces better accuracy ranges
than C4.5 and other classification algorithm for foot ulcer disease prediction in terms of
Classification accuracy.
Furthermore several available classifications techniques will also be considered to perform risk
factor prediction to arrive at concluding the best suitable algorithm for diabetic prediction in foot
ulcer.
References
[1] Ahmed K., Jesmin T., Comparative Analysis of Data Mining Classification Algorithms in Type-2 Diabetes Prediction Data Using WEKA Approach, International Journal of Science and Engineering 7 (2) (2014), 155-160.
[2] Sharma T.C., Jain M., WEKA approach for comparative study of classification algorithm. International Journal of Advanced Research in Computer and Communication Engineering 2 (4) (2013), 1925-1931.
International Journal of Pure and Applied Mathematics Special Issue
226
[3] Raikwal J.S., Saxena, K., Performance evaluation of SVM and k-nearest neighbor algorithm over medical data set, International Journal of Computer Applications 50 (14) (2012).
[4] Balpande V., Wajgi R., Review on Prediction of Diabetes using Data Mining Technique.
[5] Nai-arun N., Moungmai R., Comparison of classifiers for the risk of diabetes prediction, Procedia Computer Science 69 (2015), 132-142.
[6] Auria L., Moro R.A., Support vector machines (SVM) as a technique for solvency analysis, DIW Berlin Discussion, 2008.
[7] Peter S., An analytical study on early diagnosis and classification of diabetes mellitus, Bonfring international journal of data mining 4 (2), (2014).
[8] Oza K.S., Kumbhar V.S., WEKA Approach for Performance Evaluation of Various Data Mining Classification Algorithms.
[9] Singhal S., Jena M., A study on WEKA tool for data preprocessing, classification and clustering, International Journal of Innovative Technology and Exploring Engineering (IJITEE) 2 (6) (2013), 250-253.
[10] Dhakate P., Rajeswari K., Abin D., Analysis of Different Classifiers for Medical Dataset using Various Measures, International Journal of Computer Applications 111 (5) (2015).
[11] Reddy S.V.G., Reddy K.T., Kumari V.V., Varma, K.V., An SVM based approach to breast cancer classification using RBF and polynomial kernel functions with varying arguments, International Journal of Computer Science and Information Technologies 5 (4) (2014), 5901-5904.
[12] Eibe F., Data mining practical machine learning tools and techniques, 2nd, Sanfrancisco: Morgan Kaufmann series in data management systems (2005).
[13] Lagani V., Chiarugi F., Thomson S., Fursse J., Lakasing E., Jones R.W., Tsamardinos I., Development and validation of risk assessment models for diabetes-related complications based on the DCCT/EDIC data. Journal of Diabetes and its Complications 29 (4) (2015), 479-487.
[14] Rashedur M.R., Farhana A., Comparison of Various Classification Techniques Using Different Data Mining Tools For Diabetes Diagnosis, Journal of Software Engineering, 2013.
[15] Viswanathan K., Mayilvahanan K., Christy Pushpaleela R., Performance Analysis of various Data mining classification (C4.5 Vs SVM) Techniques on Diabetics in Heart Problem.
[16] Thair N.P., Survey of Classification Techniques in Data Mining.
[17] Thangadurai K., Nandhin N., Comparison of data mining algorithms for prediction and diagnosis of diabetes mellitus.
International Journal of Pure and Applied Mathematics Special Issue
227
[18] Rajeswari K., Vaithiyanathan V., Shailaja V.P., Feature Selection for Classification in Medical Data Mining, International journal of emerging trends and technology in computer science 2 (2) (2013).
[19] Parthiban G., Rajesh A., Srivatsa S.K., Diagnosis of Heart Disease for Diabetic Patients using Naive Bayes Method, International Journal of Computer Applications 24 (3) (2011).
[20] PayalDhakate, SuvarnaPatil, Rajeswari K., Vaithiyananthan V., Deepa Abin, Preprocessing and Classification in WEKA using different classifiers, Journal of Engineering Research and Applications 4 (8), (2014).
[21] Remco R.B., Eibe F., Mark A., HallGeoffrey H., Bernhard P., Peter R., Ian H.W., WEKA Experiences with a Java Open-Source Project, Journal of Machine Learning Research (2010).
[22] Diabetes mellitus –Wikipedia
[23] John G.H., Langley P., Estimating Continuous Dis-tributions in Bayesian Classifiers. Proceedings of the 11th Conference on Uncertainty in Artificial Intelligence, San Francisco (1995), 338-345.
[24] Quinlan J., C4.5: Programs for Machine Learning. Mo- rgan Kaufmann, San Mateo, 1993.
[25] Witten I.H., Frank E., Data Mining: Practical Ma- chine Learning Tools and Techniques. 2nd Edition, Morgan Kaufmann, San Francisco, 2005.
[26] Everitt S., Landau S., Leese M., Cluster Analysis. 5 ed., London: Wiley (2011), 76- 80.
[27] Durairaj M., Kalaiselvi G., Prediction of diabetes using soft computing techniques-A survey. International journal of scientific and technology research 4 (3) (2015) 190-192.
[28] Pushpalatha S., Pandya J. Comparison of Data Models for Hepatitis Diagnosis–Data Mining Technique. International Journal 4(1) (2014).
[29] Ren Diao, Fei Chao, Member, IEEE, Taoxin Peng, Neal Snooke, and Qiang Shen, Feature Selection Inspired Classifier Ensemble Reduction, IEEE Transactions on Cybernetics 44 (8) (2014).
[30] Aha D., UCI Machine Learning Repository: Center for Machine Learning Intelligent Systems.
[31] Parthiban G., Rajesh A., Srivatsa S.K., Diagnosis of Heart Disease for Diabetic Patients using Naive Bayes Method, International Journal of Computer Applications 24 (3) (2011).
[32] Everitt S., Landau S., Leese M., Cluster Analysis. 5 ed., London: Wiley (2011), 76- 80.
[33] Burges C.J., A tutorial on support vector machines for pattern recognition. Data mining and knowledge discovery 2 (2) (1998), 121-167.
International Journal of Pure and Applied Mathematics Special Issue
228
[34] Rajesh, M., and J. M. Gnanasekar. "Congestion control in heterogeneous WANET using FRCC." Journal of Chemical and Pharmaceutical Sciences ISSN 974 (2015): 2115.
[35] Rajesh, M., and J. M. Gnanasekar. "A systematic review of congestion control in ad hoc network." International Journal of Engineering Inventions 3.11 (2014): 52-56.
[36] Rajesh, M., and J. M. Gnanasekar. " Annoyed Realm Outlook Taxonomy Using Twin Transfer Learning." International Journal of Pure and Applied Mathematics 116.21 (2017) 547-558.
[37] Rajesh, M., and J. M. Gnanasekar. " Get-Up-And-Go Efficientmemetic Algorithm Based Amalgam Routing Protocol." International Journal of Pure and Applied Mathematics 116.21 (2017) 537-547.
[38] Rajesh, M., and J. M. Gnanasekar. " Congestion Control Scheme for Heterogeneous Wireless Ad Hoc Networks Using Self-Adjust Hybrid Model." International Journal of Pure and Applied Mathematics 116.21 (2017) 537-547.
International Journal of Pure and Applied Mathematics Special Issue
229
230