
Student Performance Prediction Using C4.5 Decision Tree and CART Algorithm

Kamal Bunkar1, Prof. Sanjay Tanwani2

1Ph.D. Scholar, School of Computer Science and IT, DAVV, Indore 452001, India.

2Professor and Head, Department of Computer Science and IT, DAVV, Indore 452001, India.

Abstract

Data mining provides important techniques for diverse fields such as education. Work in the educational sector is growing rapidly because of the vast amount of student data that can be used to discover useful patterns in students' learning behavior. Student success in university courses is of great concern to higher education, and this success can be influenced by many factors. This paper is an attempt to apply data mining processes, particularly classification, to help improve the quality of the higher education system by analyzing the student data that influence performance in courses. The proposed recommendation model consists of two major components: first, student learning methodology analysis, and second, performance or achievement prediction. For student learning behavior analysis, the paper includes the results and the algorithm selection; it then presents the student performance prediction model. In this context, two popular data mining algorithms, the C4.5 and CART decision trees, are applied. These techniques accept student performance or achievement data for training and, using the student's current performance, predict future performance. The two algorithms are compared in terms of prediction accuracy, error rate, time, and memory usage. The results show that C4.5 decision tree-based performance prediction offers higher accuracy with lower memory and time consumption. Therefore, the C4.5 decision tree algorithm is adopted for future work.

Parishodh Journal

Volume IX, Issue II, February/2020

ISSN NO:2347-6648

Page No:1702

Keywords: Data Mining, Educational Data Mining, Clustering Algorithm, Classification Algorithm, Student Learning Pattern and Performance.

I. INTRODUCTION

Data mining can be used for application design and improvement [1]. Various techniques are applied to data to obtain results as required. There are two kinds of learning algorithms: supervised and unsupervised [2]. These algorithms are employed for various tasks such as prediction, classification, clustering, and association mining. In this work, data mining techniques are applied to educational-sector data, a field known as Educational Data Mining (EDM) [3]. The proposed work primarily focuses on empowering students' learning and performance, enhancing teachers' productivity, obtaining student learning patterns, and recommending relevant study or course material. The work is therefore divided into three modules. In the first module, a clustering algorithm is employed to obtain groups of weak, average, and efficient students. These groups help in finding learning behaviors or patterns. They also serve as feedback to educators or teachers for optimizing their methodology and offering the most suitable resources.

The second module offers a predictive technique for students' performance or achievement prediction. The predicted data is used to understand the future improvement and growth in a student's performance. Finally, a recommendation model is proposed that combines the modules to interpret the student's learning behavior and recommend the most suitable learning material. EDM systems do not merely apply data mining algorithms; they also explore the patterns and hidden information in raw academic data [4]. In this context, the four goals of the proposed work are established. The first is to understand the learning methodology of students and to improve the productivity of teachers and educators. Therefore, an unsupervised learning model for student learning behavior analysis is prepared, and the experimental results are also reported in this paper. Further, the paper aims to predict the performance or achievements of students so that suitable resources can be made available to support their future growth and improvement.


In this context, a supervised learning-based student performance prediction model is introduced in this paper. This model and the previously introduced model are used to design a recommender system that offers course materials to students. This paper first highlights the recently conducted experiments and, based on their results, selects a clustering algorithm. Then, a model using supervised learning techniques is proposed for predicting student performance. Both student performance and learning behavior are used to design the required recommendation model.

II. PROPOSED WORK

Student behavior analysis is also termed student learning pattern analysis. It helps to understand the learning ability of the students [5]. Using these patterns, we approximate how the different groups of students learn and which group of students is weak. In machine learning (ML) and data mining (DM), clustering algorithms are used for grouping [6]. Figure 1 shows the data model for student learning pattern analysis. A university student dataset is used as the experimental dataset to find groups of students who have similar performance or learning behavior (i.e., low, mid, and high).

Figure 1. Student Behavior Analysis

In the next step, the input dataset is preprocessed. Data preprocessing is the data mining step in which the data is optimized: it is a cleaning operation on the dataset [7] that removes data which may produce conflicts during decision making or influence the actual target values. The preprocessing is described in [first paper reference].


To compute students' learning behavior, most authors favor clustering algorithms. We therefore implemented three clustering algorithms, namely k-means clustering [8], fuzzy c-means clustering (FCM) [9], and kernel-based fuzzy c-means (KFCM) [10]. The modified FCM (KFCM) algorithm uses the Gaussian kernel function [11]. The system user selects one of the algorithms for performing the experiments. The preprocessed data is used with the clustering algorithm, and groups of students are created according to their pattern similarity. After training, the selected centroids are used to categorize data according to distance or membership values. The test samples are prepared by randomly selecting 30% of the data instances. The test data is clustered according to the selected centroids, and the performance of the algorithms is calculated from the categorized data. The algorithm of this process is also available in [reference self].
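The test-phase assignment described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the centroids have already been learned during training, uses a fuzzifier m = 2 and a Gaussian kernel K(x, v) = exp(−‖x − v‖²/σ²) with σ = 1, and the function names and centroid values are hypothetical. K-means assigns a sample to the nearest centroid; FCM and KFCM assign it to the cluster with the largest membership value, with KFCM replacing the squared Euclidean distance by the kernel-induced distance 2(1 − K(x, v)).

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def squared_euclidean(x: Vector, v: Vector) -> float:
    """Squared Euclidean distance between a sample x and a centroid v."""
    return sum((a - b) ** 2 for a, b in zip(x, v))


def gaussian_kernel_distance(x: Vector, v: Vector, sigma: float = 1.0) -> float:
    """Kernel-induced squared distance 2 * (1 - K(x, v)), assuming K = exp(-||x - v||^2 / sigma^2)."""
    k = math.exp(-squared_euclidean(x, v) / sigma ** 2)
    return 2.0 * (1.0 - k)


def fuzzy_memberships(sq_dists: List[float], m: float = 2.0) -> List[float]:
    """FCM-style membership of one sample in every cluster, computed from squared distances."""
    for i, d in enumerate(sq_dists):
        if d == 0.0:  # the sample coincides with a centroid: full membership there
            return [1.0 if j == i else 0.0 for j in range(len(sq_dists))]
    exponent = 1.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** exponent for d_j in sq_dists) for d_i in sq_dists]


def kmeans_assign(x: Vector, centroids: List[Vector]) -> int:
    """K-means: index of the nearest centroid."""
    dists = [squared_euclidean(x, c) for c in centroids]
    return dists.index(min(dists))


def fcm_assign(x: Vector, centroids: List[Vector], m: float = 2.0) -> int:
    """FCM: index of the cluster with the highest membership value."""
    u = fuzzy_memberships([squared_euclidean(x, c) for c in centroids], m)
    return u.index(max(u))


def kfcm_assign(x: Vector, centroids: List[Vector], m: float = 2.0, sigma: float = 1.0) -> int:
    """KFCM: as FCM, but memberships use the Gaussian-kernel distance."""
    u = fuzzy_memberships([gaussian_kernel_distance(x, c, sigma) for c in centroids], m)
    return u.index(max(u))


# Hypothetical trained centroids (normalized scores) for the low, medium and high groups
centroids = [[0.2, 0.3], [0.5, 0.5], [0.85, 0.9]]
groups = ["low", "medium", "high"]
sample = [0.8, 0.85]
print(groups[kmeans_assign(sample, centroids)])  # high
print(groups[fcm_assign(sample, centroids)])     # high
print(groups[kfcm_assign(sample, centroids)])    # high
```

Because the Gaussian-kernel distance grows monotonically with the Euclidean distance, all three methods agree on the hard label here; they differ in the soft membership values, which is where FCM and KFCM carry extra information.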

The student performance dataset is evaluated using the three clustering algorithms (i.e., k-means, FCM, and KFCM). These algorithms group students according to their performance into three main categories: low, medium, and high. These groups are helpful for preparing teaching strategies for students with different performance levels [12], which enhances teachers' productivity as well as student performance [13]. The performance of the three algorithms was measured in the experiments carried out.

Table 1. Performance Comparison of Clustering Algorithms

Parameter       K-Means   FCM     Improved FCM (KFCM)
Accuracy (%)    76.22     80.76   85.31
Time (ms)       163.42    283     347.85
Memory (KB)     15523     17835   18586

The mean accuracy of the clustering algorithms is reported in Figure 2 and Table 1. A number of experiments were carried out, and the mean accuracy was calculated from the captured results. According to the results, k-means clustering produces less accurate results than FCM and KFCM. The memory usage of the clustering algorithms is reported in Figure 2(a); according to the results, the KFCM algorithm consumes the most memory. The time consumption of all three algorithms for student learning pattern analysis is reported in Figure 4(c), where the Y axis shows the time consumed in milliseconds (ms). Therefore, in terms of time and memory, k-means is more efficient than FCM and improved FCM, but in terms of accuracy, improved FCM performs best.


We recently proposed three modules to be implemented, the first of which is a model for student learning pattern or behavior analysis. That technique identifies learning behavior using a clustering algorithm. The student performance data is used to evaluate the models' performance, and kernel-based FCM is found to be the most accurate. In this work, accuracy is the key parameter for algorithm selection; therefore, KFCM (kernel-based FCM) is used in further experiments.

In this work, two supervised learning algorithms, namely CART (Classification and Regression Tree) [14] and C4.5 [15], are used. Using the more efficient classifier, a student performance prediction system is proposed. It helps users (students/teachers) get feedback about a student through performance prediction.

III. STUDENT PERFORMANCE PREDICTION

The performance of a student is an indicator of a teacher's efforts, so a technique is required to track students' performance [16]. The literature offers a number of ML techniques for predicting students' performance. In this work, we prepared a data model that analyzes historical learning performance patterns and predicts the next performance from the current values; that is, a data mining model is presented. The student performance prediction model is shown in Figure 3. The student performance dataset for a higher education course, collected previously from online sources, is used here. The dataset contains different attributes that indicate the student's performance, and two class labels are available. This dataset is used with the proposed model to predict student performance.


The aim of data preprocessing is to clean noise and unwanted data, which is why preprocessing techniques are normally used. In this work the dataset is used in a vectorized format, so the previously reported preprocessing algorithm is used to ensure data completeness by removing null values. According to the algorithm, the dataset D contains rows and columns in a vector. During the evaluation of attributes, if any attribute of a data instance is missing or null, the row is removed; otherwise the data instance is added to the data vector. Supervised data mining algorithms require a set of pre-identified training samples. These samples are used to create a data model, which is then used to identify similar patterns. The dataset is subdivided into two parts, training and testing. 70% of the data, randomly selected, is used to train the classifiers. The remaining 30% is used with the trained data model as test data; it also contains the class labels, which are used to validate the predicted labels.
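The null-value filter and the 70/30 random split described above can be sketched as follows. This is an illustrative Python sketch, not the previously reported algorithm itself; the attribute layout, the sample rows and the fixed seed are hypothetical, and a missing value is assumed to appear as None or an empty string.

```python
import random
from typing import List, Optional, Sequence, Tuple

Row = Sequence[Optional[object]]


def remove_incomplete_rows(dataset: List[Row]) -> List[Row]:
    """Keep a data instance only if none of its attributes is missing (None) or empty."""
    clean = []
    for row in dataset:
        if any(value is None or value == "" for value in row):
            continue  # a missing or null attribute: drop the whole row
        clean.append(row)
    return clean


def train_test_split(rows: List[Row], train_fraction: float = 0.7,
                     seed: Optional[int] = None) -> Tuple[List[Row], List[Row]]:
    """Shuffle the rows randomly and cut them into training and test parts."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical rows: (attendance, internal marks, class label)
data = [
    ("high", 78, "pass"), ("low", None, "fail"), ("medium", 64, "pass"),
    ("low", 35, "fail"), ("", 55, "pass"), ("high", 88, "pass"),
    ("medium", 49, "fail"), ("high", 71, "pass"), ("low", 42, "fail"),
    ("medium", 67, "pass"), ("high", 90, "pass"), ("low", 30, "fail"),
]
clean = remove_incomplete_rows(data)
train, test = train_test_split(clean, train_fraction=0.7, seed=42)
print(len(data), len(clean), len(train), len(test))  # 12 10 7 3
```

Fixing the seed only makes the illustration reproducible; in practice the split is redrawn for each experiment, which is one reason accuracy varies from run to run.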

Figure 3. Student Performance Prediction Model

In this phase, two decision tree algorithms, CART and C4.5, are implemented with the help of the WEKA data mining tool and Java technology. The user can select an appropriate decision tree algorithm for conducting the experiments. The work includes predictive data modeling for student performance, and the basic details of both decision tree algorithms are explained here. Because the system needs to generate predictions of student performance, the C4.5 and CART algorithms are used. First, we discuss the C4.5 decision tree. It is an extension of the ID3 decision tree and uses the concept of information gain (IG) to create data partitions: the attribute with the highest IG is selected to make the decision. Using the partitioned sub-lists, the C4.5 algorithm then develops a complete decision tree. The algorithm considers the following basic constraints [14].

1. If all samples in the dataset belong to the same class, it simply creates a leaf node for that class.

2. If no attribute provides any IG, it creates a decision node higher up the tree using the expected value of the class.

3. If a previously unseen class is encountered, it creates a decision node using the target value.

To define IG, entropy must first be discussed. Suppose the decision tree has two categories, P (positive) and N (negative). For a set S containing these positive and negative targets, the entropy of S is:

$$\mathrm{Entropy}(S) = -P(\mathrm{pos})\log_2 P(\mathrm{pos}) - P(\mathrm{neg})\log_2 P(\mathrm{neg})$$

P(pos): proportion of positive examples in S

P(neg): proportion of negative examples in S
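As a quick check of the entropy formula, the following Python sketch evaluates Entropy(S) for a hypothetical set of 9 positive and 5 negative examples. It uses the usual convention that 0·log2 0 = 0, so a pure set has entropy 0 and an evenly split set has entropy 1.

```python
import math


def entropy_two_class(n_pos: int, n_neg: int) -> float:
    """Entropy(S) = -P(pos) log2 P(pos) - P(neg) log2 P(neg), taking 0 * log2(0) as 0."""
    total = n_pos + n_neg
    result = 0.0
    for count in (n_pos, n_neg):
        if count:  # an empty class contributes nothing
            p = count / total
            result -= p * math.log2(p)
    return result


print(round(entropy_two_class(9, 5), 3))  # 0.94
print(entropy_two_class(7, 7))            # 1.0
print(entropy_two_class(10, 0))           # 0.0
```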

As already discussed, to reduce the depth of a decision tree, the best possible attribute must be selected at each split. The best choice is therefore the attribute that produces the maximum drop in entropy. IG is this drop in entropy obtained by splitting on an attribute. The IG, Gain(E, A), of an attribute A is defined as:

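$$\mathrm{Gain}(E, A) = \mathrm{Entropy}(E) - \sum_{v=1}^{n} \frac{|E_v|}{|E|}\,\mathrm{Entropy}(E_v)$$

where E is the set of examples (S above), n is the number of distinct values of A, and E_v is the subset of E in which A takes its v-th value.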
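The gain formula can be checked with a small Python sketch. It generalizes the entropy to any number of classes (as needed for labels such as First, Second, Third and Fail). The attribute names and values are hypothetical: one attribute separates the two classes perfectly, so its gain equals the full entropy of 1 bit, while the other carries no information about the class.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Entropy of a list of class labels, for any number of classes."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(attribute_values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Gain(E, A) = Entropy(E) - sum over v of |E_v| / |E| * Entropy(E_v)."""
    subsets: Dict[Hashable, List[Hashable]] = {}
    for value, label in zip(attribute_values, labels):
        subsets.setdefault(value, []).append(label)  # E_v: labels where A takes value v
    weighted = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted


labels = ["pass", "pass", "fail", "fail"]
attendance = ["high", "high", "low", "low"]   # hypothetical: separates the classes perfectly
section = ["A", "B", "A", "B"]                # hypothetical: unrelated to the class
print(information_gain(attendance, labels))   # 1.0
print(information_gain(section, labels))      # 0.0
```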

IG is used to decide the positions of attributes and to construct trees in which every node holds the attribute with the maximum IG among the attributes not yet considered on the path from the root. The intention is:

1. To generate a small tree so that records are identified after only a few steps.

2. To attain the desired level of simplicity in the decision process.
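The greedy construction described above can be sketched as a recursive, ID3-style procedure in Python. This is an illustration under assumptions, not the WEKA implementation used in this work: rows are dictionaries of categorical attributes with hypothetical names and values, a node becomes a leaf when all its samples share a class, and the majority (expected) class is used when no attribute yields positive IG or no attributes remain. Refinements of C4.5 such as handling continuous attributes and pruning are omitted.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Union

Row = Dict[str, Hashable]
Tree = Union[Hashable, dict]


def entropy(labels: Sequence[Hashable]) -> float:
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows: List[Row], labels: List[Hashable], attr: str) -> float:
    """Entropy of the node minus the weighted entropy of the subsets produced by attr."""
    subsets: Dict[Hashable, List[Hashable]] = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(label)
    weighted = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted


def build_tree(rows: List[Row], labels: List[Hashable], attributes: List[str]) -> Tree:
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1:        # all samples share one class -> leaf
        return labels[0]
    if not attributes:               # no attribute left on this path -> majority leaf
        return majority
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    if information_gain(rows, labels, best) <= 1e-12:  # no attribute gives any IG
        return majority
    node = {"attribute": best, "branches": {}}
    remaining = [a for a in attributes if a != best]  # each attribute used once per path
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        node["branches"][value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], remaining)
    return node


# Hypothetical training data
rows = [
    {"assignments": "done", "attendance": "high"},
    {"assignments": "missing", "attendance": "high"},
    {"assignments": "done", "attendance": "low"},
    {"assignments": "missing", "attendance": "low"},
    {"assignments": "missing", "attendance": "low"},
    {"assignments": "done", "attendance": "low"},
]
labels = ["pass", "pass", "pass", "fail", "fail", "pass"]
tree = build_tree(rows, labels, ["attendance", "assignments"])
print(tree["attribute"])  # assignments (IG 0.459 versus 0.252 for attendance)
```

Here the root splits on the attribute with the larger IG, the "done" branch is already pure, and the "missing" branch is split again on the only remaining attribute, giving a two-level tree.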


CART Algorithm

CART [66] (Classification and Regression Trees) was introduced by Breiman and is based on Hunt's algorithm. It handles both categorical and continuous attributes to build a tree, handles missing values, and uses the Gini index (GI) as its selection measure. CART produces binary splits and uses cost-complexity pruning to remove unreliable branches from the tree to improve accuracy. The GI measures the degree of impurity and is defined as:

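$$\mathrm{Gini}(T) = 1 - \sum_{j=1}^{n} p_j^2$$

where p_j is the relative frequency of class j in table T and n is the number of classes.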

The GI of a table consisting of a single class is zero, because that class has probability 1 and 1 − 1² = 0. Like entropy, the GI reaches its maximum value when all classes in the table have equal probability. To work out the gain for an attribute A relative to S, the GI of S must first be calculated. Here S is a set of 120 examples: 70 “First”, 19 “Second”, 15 “Third”, and 16 “Fail”.

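$$\mathrm{Gini}(S) = 1 - \left(P_{\mathrm{first}}^2 + P_{\mathrm{second}}^2 + P_{\mathrm{third}}^2 + P_{\mathrm{fail}}^2\right) = 1 - \left[\left(\tfrac{70}{120}\right)^2 + \left(\tfrac{19}{120}\right)^2 + \left(\tfrac{15}{120}\right)^2 + \left(\tfrac{16}{120}\right)^2\right] = 1 - \tfrac{5742}{14400} = 0.60125$$

To determine the best attribute for a particular node, the gain of each candidate split is calculated as the GI of the node minus the weighted GI of its child nodes. Written in terms of the class frequencies f_i (i = 1, …, m) of a node, the same impurity measure takes the equivalent form:

$$I_G(f) = \sum_{i=1}^{m} f_i (1 - f_i) = 1 - \sum_{i=1}^{m} f_i^2$$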
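The worked example can be reproduced with a short Python sketch that evaluates both forms of the impurity measure from class counts. It recovers Gini(S) = 0.60125 for the 120 examples above and confirms that a single-class table has a GI of 0.

```python
from typing import Sequence


def gini_from_counts(counts: Sequence[int]) -> float:
    """Gini(T) = 1 - sum_j p_j^2, where p_j is the share of class j in T."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def gini_sum_form(counts: Sequence[int]) -> float:
    """Equivalent form I_G(f) = sum_i f_i * (1 - f_i)."""
    total = sum(counts)
    return sum((c / total) * (1 - c / total) for c in counts)


s_counts = [70, 19, 15, 16]  # First, Second, Third, Fail (120 examples)
print(round(gini_from_counts(s_counts), 5))  # 0.60125
print(round(gini_sum_form(s_counts), 5))     # 0.60125
print(gini_from_counts([120]))               # 0.0
```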

GI and gain are calculated for all candidate attributes at each node. As a result of the calculation, the attribute ParQua is used to expand the tree. That attribute is then removed from the samples in the resulting sub-nodes, and GI and gain are computed again to expand the tree using the attribute with the highest gain. The process is repeated until the impurity of a node equals zero. At that point, the node cannot be expanded any further, because all of its samples belong to the same class.
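Because CART produces binary splits, a categorical attribute is split into two groups of values rather than one branch per value. The following Python sketch, a simplified illustration rather than the WEKA implementation, tries every way of dividing the values of one attribute into two groups and keeps the division with the largest Gini gain (parent GI minus the size-weighted GI of the two children). The attribute values and class labels are hypothetical; they are not taken from the ParQua data.

```python
from collections import Counter
from itertools import combinations
from typing import Hashable, Optional, Sequence, Set, Tuple


def gini(labels: Sequence[Hashable]) -> float:
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def best_binary_split(values: Sequence[Hashable], labels: Sequence[Hashable]
                      ) -> Tuple[Optional[Set[Hashable]], float]:
    """Best 'value in subset?' split of one categorical attribute by Gini gain.

    Returns (None, -1.0) when the attribute has a single value and cannot be split.
    """
    distinct = sorted(set(values))
    parent = gini(labels)
    best_subset, best_gain = None, -1.0
    # A subset and its complement define the same split, so sizes up to half suffice.
    for size in range(1, len(distinct) // 2 + 1):
        for subset in combinations(distinct, size):
            left = [y for x, y in zip(values, labels) if x in subset]
            right = [y for x, y in zip(values, labels) if x not in subset]
            weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if parent - weighted > best_gain:
                best_subset, best_gain = set(subset), parent - weighted
    return best_subset, best_gain


# Hypothetical attribute values and class labels for eight students
values = ["pg", "pg", "ug", "ug", "school", "school", "none", "none"]
labels = ["First", "First", "First", "Second", "Fail", "Fail", "Fail", "Fail"]
subset, gain = best_binary_split(values, labels)
print(sorted(subset), gain)  # ['none', 'school'] 0.40625
```

Enumerating subsets is exponential in the number of distinct values, which is acceptable for attributes with a handful of categories such as the ones in this sketch.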


Decision tree: the decision tree algorithm takes the input training samples and produces a tree. The branches of this tree combine dataset attributes with their values: each node contains an attribute name, and each edge contains one of its values. The leaf nodes hold the decisions, which are predicted when the testing data instances are applied.

Prediction outcomes: the testing data, together with their class labels, are applied to the tree prepared in the previous phase. Using the available attributes and their values, the decision tree is traversed, and the prediction is taken from the leaf node reached. The predicted outcome is compared with the outcome recorded in the dataset to obtain the prediction accuracy.
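The traversal and comparison steps can be sketched in Python as follows. The tree, attribute names and test instances are hypothetical and hand-written for illustration; internal nodes are dictionaries holding an attribute name and one edge per value, and leaves are class labels. Counting how many predictions match the recorded labels gives the numerator of the accuracy formula.

```python
from typing import Dict, Union

Tree = Union[str, Dict]


def predict(tree: Tree, sample: Dict[str, str]) -> str:
    """Follow the edge matching the sample's attribute value until a leaf (class label) is reached."""
    node = tree
    while isinstance(node, dict):
        value = sample[node["attribute"]]
        node = node["branches"][value]  # raises KeyError for a value never seen in training
    return node


# A hypothetical tree: internal nodes hold an attribute name, edges hold its values,
# and leaves hold the predicted class.
tree = {
    "attribute": "assignments",
    "branches": {
        "done": "pass",
        "missing": {"attribute": "attendance",
                    "branches": {"high": "pass", "low": "fail"}},
    },
}

test_data = [
    ({"assignments": "done", "attendance": "low"}, "pass"),
    ({"assignments": "missing", "attendance": "high"}, "pass"),
    ({"assignments": "missing", "attendance": "low"}, "fail"),
    ({"assignments": "done", "attendance": "high"}, "fail"),  # the tree gets this one wrong
]

predicted = [predict(tree, sample) for sample, _ in test_data]
correct = sum(p == actual for p, (_, actual) in zip(predicted, test_data))
print(predicted)                      # ['pass', 'pass', 'fail', 'pass']
print(correct, "of", len(test_data))  # 3 of 4
```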

This model helps to predict the performance of students; the next section provides the results analysis of the two algorithms.

IV. RESULTS ANALYSIS

The correctness of an algorithm is estimated by its accuracy, which indicates how well the algorithm performs. Accuracy is calculated as follows:

$$\mathrm{Accuracy}\ (\%) = \frac{\text{total correctly identified samples}}{\text{total samples to identify}} \times 100$$

Table 2. Accuracy Comparison (%)

Experiment No.   C4.5    CART
1                84.32   84.89
2                85.86   84.32
3                86.23   86.74
4                88.54   87.36
5                90.23   89.36
6                93.56   92.15
7                95.36   93.63


Figure 4. Accuracy (%)

The accuracy of both decision tree algorithms is shown in Figure 4 and Table 2. The line graph plots accuracy, measured in percent (%), on the Y axis and the experiments performed on the X axis. According to the results, the accuracy of both algorithms varies across experiments, from about 84% to about 95%. C4.5 (blue line) achieves higher classification accuracy than CART in five of the seven experiments (experiments 2 and 4 to 7). Similarly, the error rate indicates the misclassification of samples. It is measured from the number of misclassified instances and the total number of instances to classify.

$$\mathrm{Error\ rate}\ (\%) = \frac{\text{total misclassified samples}}{\text{total samples to identify}} \times 100$$

or

$$\mathrm{Error\ rate}\ (\%) = 100 - \mathrm{Accuracy}\ (\%)$$
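The two formulas, and the fact that Table 3 follows from Table 2, can be checked with a short Python sketch. The 120-sample run is hypothetical; the per-experiment accuracies are copied from Table 2.

```python
def accuracy_pct(correct: int, total: int) -> float:
    """Accuracy (%) = correctly identified samples / total samples * 100."""
    return correct / total * 100


def error_rate_pct(misclassified: int, total: int) -> float:
    """Error rate (%) = misclassified samples / total samples * 100."""
    return misclassified / total * 100


# Hypothetical run: 107 of 120 test samples classified correctly
total, correct = 120, 107
print(round(accuracy_pct(correct, total), 2))            # 89.17
print(round(error_rate_pct(total - correct, total), 2))  # 10.83

# Per-experiment accuracies from Table 2; Error rate = 100 - Accuracy reproduces Table 3
table2 = {
    "C4.5": [84.32, 85.86, 86.23, 88.54, 90.23, 93.56, 95.36],
    "CART": [84.89, 84.32, 86.74, 87.36, 89.36, 92.15, 93.63],
}
table3 = {algo: [round(100 - a, 2) for a in accs] for algo, accs in table2.items()}
print(table3["C4.5"])  # [15.68, 14.14, 13.77, 11.46, 9.77, 6.44, 4.64]
print(table3["CART"])  # [15.11, 15.68, 13.26, 12.64, 10.64, 7.85, 6.37]
```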

Table 3. Error Rate (%)

Experiment No.   C4.5    CART
1                15.68   15.11
2                14.14   15.68
3                13.77   13.26
4                11.46   12.64
5                9.77    10.64
6                6.44    7.85
7                4.64    6.37


Figure 5. Error Rate

The error rate measures the failures of a classifier, so a lower error rate indicates a better classifier. Table 3 lists the error rates of both algorithms, and Figure 5 plots them as a line graph with the experiments on the X axis and the error rate (%) on the Y axis. According to the results, C4.5 reports a lower error rate than CART on our dataset in five of the seven experiments. C4.5 also consumes less memory than CART, because C4.5 removes ambiguity from the tree by pruning, so its tree is smaller than that of CART. This observation is shown in Figure 6 and Table 4: in six of the seven experiments, C4.5 (blue line) consumes less memory than CART. We therefore select C4.5 for further experiments.

Table 4. Memory Comparison (KB)

Experiment No.   C4.5    CART
1                13000   13628
2                13882   13931
3                14293   13829
4                13628   13909
5                13843   14294
6                13727   15391
7                14048   14726


Figure 6. Memory Usage

Time consumption is also an essential factor in algorithm selection. In this context, the time consumed by the algorithms for increasing amounts of data is measured. Table 5 and Figure 7 report the time consumption of both algorithms. According to the results, C4.5 requires less time than CART in all seven experiments. Therefore, the C4.5 algorithm is selected for further experiments.

Table 5. Time Consumption (ms)

Experiment No.   C4.5   CART
1                30     90
2                80     150
3                120    220
4                135    300
5                150    350
6                170    400
7                200    450


Figure 7. Time Consumption

V. CONCLUSION & FUTURE WORK

The primary aim of the proposed work is to identify an effective methodology that helps students obtain feedback and recommends suitable course material. In this setting, the proposed work is motivated to design an accurate and effective recommendation model that understands student learning behavior, their status, and the available course structure and its complexity. Using all of these factors, a complete recommendation framework is to be designed. It has two parts: first, student learning behavior, and second, student performance prediction. The results of the first part are reported here first, and then the student performance prediction model is presented in this paper. Currently, the student performance prediction model uses the C4.5 and CART algorithms for prediction; the experimental analysis is summarized in Table 6.

Table 6. Mean Performance of Classifiers

S. No.   Parameter               C4.5       CART
1        Accuracy (%)            89.15      88.35
2        Error rate (%)          10.84      11.65
3        Memory usage (KB)       13774.42   14244
4        Time consumption (ms)   126.42     280
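The means in Table 6 can be recomputed from the per-experiment values in Tables 2, 4 and 5 with the short Python sketch below. The recomputed values agree with Table 6 to within 0.01; Table 6 appears to truncate rather than round (for example, 89.157 is reported as 89.15).

```python
from statistics import mean

# Per-experiment results copied from Table 2 (accuracy, %), Table 4 (memory, KB) and Table 5 (time, ms)
results = {
    "C4.5": {"accuracy": [84.32, 85.86, 86.23, 88.54, 90.23, 93.56, 95.36],
             "memory": [13000, 13882, 14293, 13628, 13843, 13727, 14048],
             "time": [30, 80, 120, 135, 150, 170, 200]},
    "CART": {"accuracy": [84.89, 84.32, 86.74, 87.36, 89.36, 92.15, 93.63],
             "memory": [13628, 13931, 13829, 13909, 14294, 15391, 14726],
             "time": [90, 150, 220, 300, 350, 400, 450]},
}

summary = {}
for algo, cols in results.items():
    acc = mean(cols["accuracy"])
    summary[algo] = {"accuracy": acc, "error": 100 - acc,
                     "memory": mean(cols["memory"]), "time": mean(cols["time"])}
    s = summary[algo]
    print(f"{algo}: accuracy={s['accuracy']:.3f} error={s['error']:.3f} "
          f"memory={s['memory']:.2f} time={s['time']:.2f}")
# C4.5: accuracy=89.157 error=10.843 memory=13774.43 time=126.43
# CART: accuracy=88.350 error=11.650 memory=14244.00 time=280.00
```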

Table 6 gives the mean performance of both classifiers. According to the obtained results, the performance of the C4.5 algorithm is found to be acceptable for the experimental dataset. In the near future, the proposed work will be extended to implement the course material recommendation system.

REFERENCES

1. B. M. Ramageri, “Data Mining Techniques and Applications”, Indian Journal of Computer Science and Engineering, Vol. 1, No. 4, pp. 301-305.

2. L. Wang, C. A. Alexander, “Machine Learning in Big Data”, International Journal of Mathematical, Engineering and Management Sciences, Vol. 1, No. 2, pp. 52–61, 2016.

3. P. M. Kumari, S. K. A. Nabi, P. Priyanka, “Educational Data Mining and its role in Educational Field”, International Journal of Computer Science and Information Technologies, Vol. 5(2), pp. 2458-2461, 2014.

4. Ashish Dutt, Maizatul Akmar Ismail, Tutut Herawan, “A Systematic Review on Educational Data Mining”, IEEE, Vol. 5, ISSN 2169-3536, 2017.

5. John Dunlosky, Katherine A. Rawson, Elizabeth J. Marsh, Mitchell J. Nathan, Daniel T. Willingham, “Improving Students’ Learning With Effective Learning Techniques: Promising Directions From Cognitive and Educational Psychology”, Psychological Science in the Public Interest, 14(1), pp. 4–58, 2013.

6. Hassan Khosravi, Kendra M. L. Cooper, “Using Learning Analytics to Investigate Patterns of Performance and Engagement in Large Classes”, SIGCSE ’17, March 8–11, 2017, Seattle, WA, USA, ACM, ISBN 978-1-4503-4698-6/17/03.

7. S. B. Kotsiantis, D. Kanellopoulos, P. E. Pintelas, “Data Preprocessing for Supervised Learning”, International Journal of Computer Science, Vol. 1, No. 1, ISSN 1306-4428, 2006.

8. Ashish Dutt, Saeed Aghabozrgi, Maizatul Akmal Binti Ismail, Hamidreza Mahroeian, “Clustering Algorithms Applied in Educational Data Mining”, International Journal of Information and Electronics Engineering, Vol. 5, No. 2, March 2015.

9. M. Durairaj, C. Vijitha, “Clustering Algorithms Applied in Educational Data Mining”, International Journal of Information and Electronics Engineering, Vol. 5, No. 2, March 2015.


10. Wenke Zang, Zehua Wang, Dong Jiang, Xiyu Liu, “A Kernel-Based Intuitionistic Fuzzy C-Means Clustering Using Improved Multi-Objective Immune Algorithm”, DOI: 10.1109/ACCESS.2019.2924957.

11. Zhe Zhang, Xiyu Liu, Lin Wang, “Spectral Clustering Algorithm Based on Improved Gaussian Kernel Function and Beetle Antennae Search with Damping Factor”, Hindawi Computational Intelligence and Neuroscience, Vol. 2020, Article ID 1648573, 9 pages.

12. V. L. Miguéis, Ana Freitas, Paulo J. V. Garcia, André Silva, “Early segmentation of students according to their academic performance: A predictive modelling approach”, Decision Support Systems, Vol. 115, pp. 36–51, 2018.

13. Arto Hellas, Petri Ihantola, Andrew Petersen, Vangel V. Ajanovski, Mirela Gutica, Timo Hynninen, Antti Knutas, Juho Leinonen, Chris Messom, Soohyun Nam Liao, “Predicting Academic Performance: A Systematic Literature Review”, ITiCSE ’18 Companion, July 2–4, 2018, Larnaca, Cyprus, ACM, ISBN 978-1-4503-6223-8/18/07.

14. Alireza Salimi, Roohollah Shirani Faradonbeh, Masoud Monjezi, Christian Moormann, “TBM performance estimation using a classification and regression tree (CART) technique”, Springer-Verlag Berlin Heidelberg, 2016 (received 8 June 2016, accepted 31 October 2016).

15. Lili Dwi Yulianto, Agung Triayudi, Ira Diana Sholihati, “Implementation Educational Data Mining For Analysis of Student Performance Prediction with Comparison of K-Nearest Neighbor Data Mining Method and Decision Tree C4.5”, Jurnal Mantik, Vol. 4, No. 1, pp. 441-451, May 2020.

16. Kassymova K. Gulzhaina, Kosherbayeva N. Aigerim, Sangilbayev S. Ospan, Schachl J. Hans, Nigel B. C. Cox R., “Stress management techniques for students”, Advances in Social Science, Education and Humanities Research, Vol. 198, 2018.
