predic tion of type -2 diabetes foot ulcer - a comparative ... · 1r. christy pushpaleela and 2r....

Prediction of Type-2 Diabetes Foot Ulcer - A Comparative

Study with Classification Algorithm 1R. Christy Pushpaleela and 2R. Padmajavalli

1Bharathiar University,

Coimbatore.

Department of Computer Science,

Women’s Christian College, Chennai.

[email protected]

2Bharathiar University,

Coimbatore.

Bhaktavatsalam Memorial College for Women,

Chennai.

[email protected]

Abstract This research paper focuses upon the study of foot ulcer-diabetic data and implement

various classification algorithms on different kinds of medical datasets of foot ulcer-Diabetic

to evaluate its comparative performance. Performance analysis records the most frequently

used algorithms on respective medical datasets and most efficient classification algorithm to

analyze the foot ulcer-Diabetic disease based on the comparative study between C4.5 and

SVM on high dimensional datasets. The scope of research work is aimed at predicting

diabetics for foot ulcer patients by comparing various classification algorithms.

Keywords:Classification, prediction, clustering, diabetics, KNN, decision tree, SVM, C4.5.

International Journal of Pure and Applied MathematicsVolume 117 No. 7 2017, 219-230ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version)url: http://www.ijpam.euSpecial Issue ijpam.eu

219

1. Data Mining

Data mining is the process of discovering actionable information from huge sets of data. Data

mining uses mathematical analysis to derive patterns and trends that exist in data. Patterns and

trends can be collected and defined as a data mining model. Diabetes is a group of metabolic

illness due to high blood sugar levels. It may also cause many complication like as heart

problem, kidney issues, foot ulcers. There are three types of diabetics issue namely.

Type 1 - Affects adult and children's- Insulin dependent diabetes.

Type 2 - Affects only Adult -Non Insulin dependent diabetes and Gestational Diabetes - This

will occur when pregnant women without a previous history of diabetes develop a high blood

glucose level.

Data mining involves pulling out of useful information from the huge volume of data. Data

mining technique has been applied in various fields like banking, insurance, medicine etc.

Data mining process followed the below process toget the final model for predication.

Problem Definition.

Preparation.

Data Exploring.

Model Building for predication.

Exploring & Validating models.

Deploying & Updating models.

Figure 1: Data Mining Process Flow

There are two forms of data analysis that can be used for extracting models.

Classification and Prediction.

2. Primary Objective

The Primary Objectives of the present work is as follows:

To compile the aggregated real time data set from reputed Hospitals

International Journal of Pure and Applied Mathematics Special Issue

220

To Pre-process the clustered real time patient data for foot ulcer-diabetic prediction

consuming both manual techniques and Preprocessing tools like Weka.

To apply the classification algorithm for pre-processed foot data to identify the optimum

Predictor.

To classify the efficient algorithm based on the performance analysis report employing

best accuracy rate.

Related work

Auria, Laura and Moro, R. A [6] introduced a statistical technique, SVM which provide a higher

accuracy of company classification into solvent and insolvent. SVM's are more accurate and

convenient even when input data is non-monotonous and non-linearly separable. SVM's do not

deliver parametric score but can offer support for recognizing different financial ratios.

Dr K S OzaMr V S Kumbhar [8] presented a comparison of different data mining techniques

like k-nearest neighbor (KNN), Bayesian network, Decision trees, support vector machines

based on their accuracy, evaluation time and rate of error.

NongyaoNai-aruna and RungruttikarnMoungmaia [5] introduced a web application by using a

use of disease classifiers and a real data set. Various algorithms were used to classify the risk of

diabetes mellitus. Decision Tree, Regression, Naive Baye and Random Forest are used to

classify the risk of diabetes. Based on the various random selection found out that Random

Forest algorithm is effective in increasing the accuracy value and predicting the diabetes.

VrushaliBalpande and RakhiWajgi [4]presented a research work for prediction of diabetes using

data mining technique. Data mining has been applied in various fields like medicine, marketing,

banking, etc. In medicine, predictive data mining is used to diagnose the disease at the earlier

stages itself and helps the physicians in treatment planning procedure. Existing Data mining

method which is efficient in predicting diseases is used for prediction of Diabetes.

Kawsar Ahmed, TasnubaJesmin [1] presented a research work for 20 classification algorithms

are compared by measuring accuracies, speed and robustness using WEKA tool to classify

diabetes patient data.

J.S.Raikwal and KanakSaxena [3] proposed methodology to predict the future disease of the

patients using data patterns. SVM and KNN algorithm is used to classify data -For example

medical patients data to find hidden patterns for targets, like predicting the future diseases of the

patients.

Vincenzo Lagani a, Franco Chiarugi [13] implemented a Data-mining analyses of the

DCCT/EDIC data allow the identification of true predictive models for diabetes-related

impediments.

3. Data Pre-Processing Steps

Step 1. Obtained real time medical data sets are pre-processed manually.

Step 2: Excel data is converted into .CVS format further to .Arff format for effective

processing.


221

Step 3: Weka tool is launched alongside explorer window where in”Preprocess tab” is

selected.

Step4: Arff file format is opened to choose the attributes filed in the data set (e.g. number

of instances, attributes etc).

Step 4: Respective data set is entered and implemented by selecting Visualization button

[1][2][8] [9].

Real-time-Data from Leading Hospital

455 data of patient (355 diabetes patients & 100 non diabetes patients) is collected from leading

hospital. There are male and female patients whose age between 15 to 76 years old and sample

data set listed below.

Table 1: Sample Real Data for Classification

Patient Id Age SEX BMI BPH BPL SugarP SugarPo

STAR01 53 F 31.71 139 73 236 412

STAR02 60 M 28.32 123 77 172 260

STAR03 62 M 31.71 130 80 187 268

STAR04 54 F 28.32 154 84 162 353

STAR05 67 M 28.33 140 89 143 250

STAR06 40 M 28.44 128 67 130 189

STAR07 55 F 32.55 130 90 156 240

STAR08 62 F 31.71 108 62 283 345

STAR09 61 F 28.32 125 78 112 197

STAR10 56 M 31.71 130 80 128 192

STAR11 55 F 28.32 105 71 137 209

STAR12 68 F 28.32 120 70 171 224

STAR13 43 M 28.44 125 84 98 196

STAR14 35 F 32.55 110 70 168 269

STAR15 45 M 31.71 103 78 123 231

Figure 2: Weka Tools – Pre- Process Flow


222

Figure 3: Weka Tools – Process -Visualization

4. Classification Procedures

There are several classifications methods propose by researchers. Some of the methods are

Support Vector Machine.

Decision tree.

Bayesian classification.

C4.5.

K-NN.

SVM Support Vector Machine

A support vector machine is a Classification method.

It is mostly used in classification and regression problems. SVM will support data mining, text

mining, and pattern recognition. SVM provides the user can easy analysis real data or artificial

data. SVM delivers good and optimal solutions [3][6][11][15].

J48(C4.5)

It is an algorithm used to generate a decision tree and is a successor of earlier ID3 Algorithm.

C4.5 algorithm is a greedy algorithm developed by Ross Quinlan, used for the induction of

decision trees. It can be used for classification of the data and so referred to as statistical

classifier [15][25].

Decision Tree

Decision Tree will build a predictive model which is mapped to a tree organization. Means form

of left and right child, root node compensations. Decision is a knowledge representation

structure consisting of nodes & branches organized in the form of Tree such that, every internal

non‐leaf node is labeled with values of the attributes [15].

Naïve Bayes

It is a Standard probabilistic Naïve Bayes classifier. Naive Bayes classifiers are the group of

simple probabilistic classifiers based on applying Bayes' theorem. Naive Bayes classifiers are


223

highly scalable. Naive Bayes model is easy to build for very large data sets [17] [11].

K-NN

K-NN is a type of instance-based learning.KNN is a group of simple algorithm like as

Classification and Regression .The main advantages of KNN are below [5][22].

Easy implementation.

Robust.

Very low cost.

5. Disease Summary

Diabetic foot ulcer [33]

A diabetic foot ulcer is an open sore or wound that occurs in approximately 15 percent of

patients with diabetes. Commonly located on the bottom of the foot [3] [2].

Diabetes Foot Ulcer - Symptoms Diabetic ulcers are most commonly affected

Poor circulation.

High blood sugar (hyperglycemia).

Nerve damage.

Irritated or wounded feet.

Doctor will identify the scale range between 0 to 2 parameter

0: Means ulcer not present but foot at risk

1: Means Ulcer present but no infection

2: Means ulcer deep exposing joints and tendons

Risk Factors for Diabetic Foot Ulcers Patients [1] [5][7][8]. Poor quality shoes.

Poor hygiene (not washing regularly or thoroughly).

Improper trimming of toenails.

Alcohol consumption.

Eye disease from diabetes.

Heart disease.

Kidney disease.

Obesity.

Figure 4: Symptoms Ulcer Foot Ulcer [3] [2].


224

Data Set (Diabetic foot ulcer- Data Set) Sno Attribute

1 Patient Id

2 Name

3 Age

4 SEX

5 BMI

6 WeightKG

7 HeightIN

8 BPH

9 BPL

10 SugarPre

Table 2: Data Set attribute

S.no Attribute

11 SugarPost

12 HBA1CP

13 Hemoglobin

14 UREA

15 Creatinie

16 GlycemicIndex

17 Single/multiple ulcer

18 Alcohol

19 Etiology

20 Culture test

6. Classification Matrix

The basic phenomenon used to classify the diabetic foot ulcer classification using classifier is its

performance and accuracy. The performance of a chosen classifier is validated based on error

rate and computation time [11][13].

Table 3: Confusion Matrix

Predictable Categorized as a Healthy(0) Categorized as a not Healthy(1)

Real Healthy (0) TP

FN

Really not Healthy (1) FP TN

Where

TP- True Positive &FN – False Negative

TN – true Negative&FP - False Positive

For measuring accuracy rate the following mathematical model is used [22][18][21][2].

Accuracy = TP+TN/ TP+FP+ TN +FN


225

7. Assessment Result

Table 4: Performance Report for Classification

Algorithm Accuracy Rate

SVM 92.22

C4.5 85.68

Decision Tree 75.45

KNN 73.77

Figure 5: Performance Report for Classification

8. Conclusion and Future Work

Performance comparison of SVM and C4.5 Classifiers for foot ulcer-Diabetic disease are

recorded to obtain effective methodology for prediction. Preprocessing is carried out using Weka

for building optimum predictability model. The maximum classification accuracies of the SVM

and C4.5 classifiers were found to be 92.22% and 85.68% respectively. Based on the

comparative study, it is concluded that SVM Classifier method produces better accuracy ranges

than C4.5 and other classification algorithm for foot ulcer disease prediction in terms of

Classification accuracy.

Furthermore several available classifications techniques will also be considered to perform risk

factor prediction to arrive at concluding the best suitable algorithm for diabetic prediction in foot

ulcer.

References

[1] Ahmed K., Jesmin T., Comparative Analysis of Data Mining Classification Algorithms in Type-2 Diabetes Prediction Data Using WEKA Approach, International Journal of Science and Engineering 7 (2) (2014), 155-160.

[2] Sharma T.C., Jain M., WEKA approach for comparative study of classification algorithm. International Journal of Advanced Research in Computer and Communication Engineering 2 (4) (2013), 1925-1931.


226

[3] Raikwal J.S., Saxena, K., Performance evaluation of SVM and k-nearest neighbor algorithm over medical data set, International Journal of Computer Applications 50 (14) (2012).

[4] Balpande V., Wajgi R., Review on Prediction of Diabetes using Data Mining Technique.

[5] Nai-arun N., Moungmai R., Comparison of classifiers for the risk of diabetes prediction, Procedia Computer Science 69 (2015), 132-142.

[6] Auria L., Moro R.A., Support vector machines (SVM) as a technique for solvency analysis, DIW Berlin Discussion, 2008.

[7] Peter S., An analytical study on early diagnosis and classification of diabetes mellitus, Bonfring international journal of data mining 4 (2), (2014).

[8] Oza K.S., Kumbhar V.S., WEKA Approach for Performance Evaluation of Various Data Mining Classification Algorithms.

[9] Singhal S., Jena M., A study on WEKA tool for data preprocessing, classification and clustering, International Journal of Innovative Technology and Exploring Engineering (IJITEE) 2 (6) (2013), 250-253.

[10] Dhakate P., Rajeswari K., Abin D., Analysis of Different Classifiers for Medical Dataset using Various Measures, International Journal of Computer Applications 111 (5) (2015).

[11] Reddy S.V.G., Reddy K.T., Kumari V.V., Varma, K.V., An SVM based approach to breast cancer classification using RBF and polynomial kernel functions with varying arguments, International Journal of Computer Science and Information Technologies 5 (4) (2014), 5901-5904.

[12] Eibe F., Data mining practical machine learning tools and techniques, 2nd, Sanfrancisco: Morgan Kaufmann series in data management systems (2005).

[13] Lagani V., Chiarugi F., Thomson S., Fursse J., Lakasing E., Jones R.W., Tsamardinos I., Development and validation of risk assessment models for diabetes-related complications based on the DCCT/EDIC data. Journal of Diabetes and its Complications 29 (4) (2015), 479-487.

[14] Rashedur M.R., Farhana A., Comparison of Various Classification Techniques Using Different Data Mining Tools For Diabetes Diagnosis, Journal of Software Engineering, 2013.

[15] Viswanathan K., Mayilvahanan K., Christy Pushpaleela R., Performance Analysis of various Data mining classification (C4.5 Vs SVM) Techniques on Diabetics in Heart Problem.

[16] Thair N.P., Survey of Classification Techniques in Data Mining.

[17] Thangadurai K., Nandhin N., Comparison of data mining algorithms for prediction and diagnosis of diabetes mellitus.


227

[18] Rajeswari K., Vaithiyanathan V., Shailaja V.P., Feature Selection for Classification in Medical Data Mining, International journal of emerging trends and technology in computer science 2 (2) (2013).

[19] Parthiban G., Rajesh A., Srivatsa S.K., Diagnosis of Heart Disease for Diabetic Patients using Naive Bayes Method, International Journal of Computer Applications 24 (3) (2011).

[20] PayalDhakate, SuvarnaPatil, Rajeswari K., Vaithiyananthan V., Deepa Abin, Preprocessing and Classification in WEKA using different classifiers, Journal of Engineering Research and Applications 4 (8), (2014).

[21] Remco R.B., Eibe F., Mark A., HallGeoffrey H., Bernhard P., Peter R., Ian H.W., WEKA Experiences with a Java Open-Source Project, Journal of Machine Learning Research (2010).

[22] Diabetes mellitus –Wikipedia

[23] John G.H., Langley P., Estimating Continuous Dis-tributions in Bayesian Classifiers. Proceedings of the 11th Conference on Uncertainty in Artificial Intelligence, San Francisco (1995), 338-345.

[24] Quinlan J., C4.5: Programs for Machine Learning. Mo- rgan Kaufmann, San Mateo, 1993.

[25] Witten I.H., Frank E., Data Mining: Practical Ma- chine Learning Tools and Techniques. 2nd Edition, Morgan Kaufmann, San Francisco, 2005.

[26] Everitt S., Landau S., Leese M., Cluster Analysis. 5 ed., London: Wiley (2011), 76- 80.

[27] Durairaj M., Kalaiselvi G., Prediction of diabetes using soft computing techniques-A survey. International journal of scientific and technology research 4 (3) (2015) 190-192.

[28] Pushpalatha S., Pandya J. Comparison of Data Models for Hepatitis Diagnosis–Data Mining Technique. International Journal 4(1) (2014).

[29] Ren Diao, Fei Chao, Member, IEEE, Taoxin Peng, Neal Snooke, and Qiang Shen, Feature Selection Inspired Classifier Ensemble Reduction, IEEE Transactions on Cybernetics 44 (8) (2014).

[30] Aha D., UCI Machine Learning Repository: Center for Machine Learning Intelligent Systems.

[31] Parthiban G., Rajesh A., Srivatsa S.K., Diagnosis of Heart Disease for Diabetic Patients using Naive Bayes Method, International Journal of Computer Applications 24 (3) (2011).

[32] Everitt S., Landau S., Leese M., Cluster Analysis. 5 ed., London: Wiley (2011), 76- 80.

[33] Burges C.J., A tutorial on support vector machines for pattern recognition. Data mining and knowledge discovery 2 (2) (1998), 121-167.


228

[34] Rajesh, M., and J. M. Gnanasekar. "Congestion control in heterogeneous WANET using FRCC." Journal of Chemical and Pharmaceutical Sciences ISSN 974 (2015): 2115.

[35] Rajesh, M., and J. M. Gnanasekar. "A systematic review of congestion control in ad hoc network." International Journal of Engineering Inventions 3.11 (2014): 52-56.

[36] Rajesh, M., and J. M. Gnanasekar. " Annoyed Realm Outlook Taxonomy Using Twin Transfer Learning." International Journal of Pure and Applied Mathematics 116.21 (2017) 547-558.

[37] Rajesh, M., and J. M. Gnanasekar. " Get-Up-And-Go Efficientmemetic Algorithm Based Amalgam Routing Protocol." International Journal of Pure and Applied Mathematics 116.21 (2017) 537-547.

[38] Rajesh, M., and J. M. Gnanasekar. " Congestion Control Scheme for Heterogeneous Wireless Ad Hoc Networks Using Self-Adjust Hybrid Model." International Journal of Pure and Applied Mathematics 116.21 (2017) 537-547.


229

predic tion of type -2 diabetes foot ulcer - a comparative ... · 1r. christy pushpaleela and 2r....

Documents