presentation exclusive raharja ubl attahiriyah
TRANSCRIPT
• START
• Introduction
• Problem
• Methods
• Discussion
• Evaluation & Validation
• Conclusion
• References
• END
COMPARISON C4.5, NEURAL NETWORK AND NAÏVE BAYES ALGORITHM FOR TIMELY
PREDICTION OF GRADUATION
Presented in International Conference Paper Computer Science and Information Technology
(CSIT-2013) JUNE 2013
By:
Asep Saefulloh Himawan
Arisantoso Moedjiono Nazori AZ
Monday, 17 April 2023
INTRODUCTION
Timely graduation is currently predicted only from the GPA (Grade Point Average) and the IMK (Cumulative Quality Index) of the previous semester.
Prediction is similar to classification and estimation, except that prediction is used to forecast specific values that will occur in the future.
Meanwhile, Raharja holds the AO (Attendance Online) and SIS (Student Information Services) datasets, which are not fully utilized. So far, the university's forecasters have presumed that the on-time graduation rate can be predicted simply by looking at the previous GPA and IMK.
Given these problems, we conducted this study to apply classification data mining to the AO and SIS datasets, which are already stored in the DMQ database, in order to predict timely graduation.
To predict on-time graduation, this study compares three data mining classification algorithms:
1. C4.5
2. Naive Bayes
3. Neural Network
The cleaned data from DMQ is processed with the Weka tool. The data mining classification models are tested using cross validation, a confusion matrix, and the ROC (Receiver Operating Characteristic) curve.
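As an illustration of the cross validation step, here is a minimal sketch in plain Python of how 891 records could be divided into folds. The number of folds (10), the fixed seed, and the function names are assumptions made for the example; this is a simple non-stratified split, not Weka's own implementation.

```python
import random

def k_fold_indices(n_records, k=10, seed=0):
    """Split record indices 0..n_records-1 into k shuffled, near-equal folds."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th index starting at i, so sizes differ by at most 1.
    return [indices[i::k] for i in range(k)]

def train_test_splits(n_records, k=10, seed=0):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    folds = k_fold_indices(n_records, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# 891 is the number of records taken from DMQ; k=10 is an assumed fold count.
splits = list(train_test_splits(891, k=10))
```

Every record is used for testing exactly once and for training in the remaining folds, so the reported accuracy is an average over models that never saw their own test records.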
PROBLEM
The problem formulation is:
1. Can the C4.5, Naive Bayes, and Neural Network algorithms be applied to predict timely graduation?
2. Which algorithm is best at predicting timely graduation?
3. Can the chosen algorithm present the data mining classification results as a timely graduation prediction?
RESEARCH METHODS
The study was designed using the CRISP-DM (Cross Industry Standard Process for Data Mining) model, which has 6 stages [7]. The research uses Weka (Waikato Environment for Knowledge Analysis) version 3.6.4, an open-source (GPL) data mining tool that runs on the Java engine.
Business/Research Understanding Phase: The data is secondary data obtained from the DMQ database stored on a server Higher Education Prog.
Data Understanding Phase: The DMQ database holds 5842 records. Processing uses 7 attributes (variables) for predicting timely graduation: NIM, Student Name, Study of Education, Department, GPA, IMK, and Prediction. Of these 7 attributes, 2 are predictors (GPA and IMK) and 1 is the goal attribute (graduating on time).
Data Preparation Phase: A query against the DMQ database returned 891 records to be processed by Weka.
Modeling Phase: This study uses three algorithms: C4.5, Naive Bayes, and Neural Network.
Evaluation Phase: Evaluation and validation are performed using the confusion matrix and the ROC (Receiver Operating Characteristic) curve.
Deployment Phase: At this stage, the rules of the model that predicts on-time graduation most accurately are applied and can then be used to evaluate new data.
DISCUSSION
This study aims to compare the accuracy of the data mining models produced by the C4.5, Naive Bayes, and Neural Network algorithms in predicting timely graduation.
C4.5/J48 Algorithm
The steps of the C4.5 algorithm, using 891 training records, are:
a. Prepare the training data
b. Calculate the entropy
c. Calculate the gain for each attribute and select the attribute with the highest gain. For example, the gain is calculated for the GPA attribute.
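Steps b and c can be sketched in Python as follows. The class counts (729 on time, 162 not on time) come from the training data; the perfect split used to demonstrate the gain is hypothetical, as are the function names. Note that C4.5 proper normalizes the gain into a gain ratio, while this sketch follows the plain entropy and gain calculation named in the steps.

```python
from math import log2

def entropy(counts):
    """Shannon entropy (in bits) of a class distribution given as counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, partitions):
    """Parent entropy minus the size-weighted entropy of the partitions."""
    total = sum(parent_counts)
    remainder = sum(sum(p) / total * entropy(p) for p in partitions)
    return entropy(parent_counts) - remainder

# Class counts of the training data: 729 on time, 162 not on time.
dataset_entropy = entropy([729, 162])   # about 0.684 bits

# Hypothetical split, for illustration only: an attribute test that
# separates the two classes completely recovers the full entropy as gain.
perfect_gain = information_gain([729, 162], [[729, 0], [0, 162]])
```

The attribute with the largest gain becomes the next node of the tree, and the same calculation is repeated on each branch.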
From the entropy and gain values obtained in Table 1, we then determine the next node, node 1.1, and calculate the entropy and gain of each attribute under GPA.
Figure 2. J48 Decision Tree Classifier
From the decision tree in Figure 2, the following rules were discovered:
a. IF GPA >= 3.7 THEN Graduating on time
b. IF GPA >= 2.7 THEN Graduating on time
c. IF GPA >= 2.0 THEN Graduating on time
d. IF GPA <= 1.99 THEN Graduation is not timely
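Read together, rules a to d reduce to a single threshold on GPA. A minimal sketch, assuming GPA is recorded with two decimals so that no value falls between 1.99 and 2.0 (the function name is hypothetical):

```python
def predict_graduation(gpa):
    """Apply rules a-d: only the GPA >= 2.0 boundary changes the outcome."""
    if gpa >= 2.0:
        return "Graduating on time"        # rules a, b and c
    return "Graduation is not timely"      # rule d (GPA <= 1.99)

examples = {gpa: predict_graduation(gpa) for gpa in (3.7, 2.7, 2.0, 1.99)}
```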
Naive Bayes Algorithm
The Naive Bayes method uses the same 891 training records as the C4.5 method.
The training data contains 891 records, with 729 cases of graduating on time and 162 cases of not graduating on time. The prior probability is determined using the formula:
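A minimal sketch of the prior calculation, assuming the usual relative-frequency estimate (class count divided by total records). The class labels, function names, and the per-attribute likelihoods in the usage lines are hypothetical values chosen only to show how the priors are combined.

```python
def class_priors(class_counts):
    """Prior probability of each class: its count over the total count."""
    total = sum(class_counts.values())
    return {label: count / total for label, count in class_counts.items()}

def posteriors(priors, likelihoods):
    """Naive Bayes: posterior is proportional to the prior times the
    product of the per-attribute likelihoods, then normalized to sum to 1."""
    scores = {}
    for label, prior in priors.items():
        score = prior
        for p in likelihoods[label]:
            score *= p
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}

# Counts from the training data: 729 on time, 162 not on time.
priors = class_priors({"on_time": 729, "not_on_time": 162})
# priors["on_time"] = 729/891 (about 0.818); priors["not_on_time"] = 162/891 (about 0.182)

# Hypothetical likelihoods for one record's GPA and IMK values.
result = posteriors(priors, {"on_time": [0.9, 0.8], "not_on_time": [0.1, 0.2]})
```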
Neural Network Algorithm
The neural network uses the back propagation algorithm in 6 (six) steps, starting by initializing the weights to values between -0.1 and 1.0 for the input layer, the hidden layer, and the bias or threshold. The network was generated from the training data using Weka's MultilayerPerceptron.
Figure 3. The resulting MLP neural net
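The weight initialization and one back propagation update can be sketched in plain Python. This is an illustrative single-hidden-layer network with sigmoid units and squared error, not Weka's MultilayerPerceptron; the layer size, learning rate, seed, and the sample record are assumptions. Only the initial weight range (-0.1 to 1.0) is taken from the text.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_network(n_inputs, n_hidden, low=-0.1, high=1.0, seed=0):
    """Draw every weight and bias uniformly from [low, high]."""
    rng = random.Random(seed)
    draw = lambda: rng.uniform(low, high)
    return {
        "w_hidden": [[draw() for _ in range(n_inputs)] for _ in range(n_hidden)],
        "b_hidden": [draw() for _ in range(n_hidden)],
        "w_out": [draw() for _ in range(n_hidden)],
        "b_out": draw(),
    }

def forward(net, x):
    """Return the hidden activations and the single output activation."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(net["w_hidden"], net["b_hidden"])]
    out = sigmoid(sum(w * h for w, h in zip(net["w_out"], hidden)) + net["b_out"])
    return hidden, out

def train_step(net, x, target, lr=0.5):
    """One back propagation update for squared error on a single record."""
    hidden, out = forward(net, x)
    delta_out = (out - target) * out * (1.0 - out)
    # Hidden deltas use the output weights as they were before the update.
    delta_hidden = [delta_out * w * h * (1.0 - h)
                    for w, h in zip(net["w_out"], hidden)]
    for j, h in enumerate(hidden):
        net["w_out"][j] -= lr * delta_out * h
    net["b_out"] -= lr * delta_out
    for j, d in enumerate(delta_hidden):
        for i, xi in enumerate(x):
            net["w_hidden"][j][i] -= lr * d * xi
        net["b_hidden"][j] -= lr * d
    return 0.5 * (out - target) ** 2   # error measured before the update

net = init_network(n_inputs=2, n_hidden=3)
record = [0.85, 0.80]                  # hypothetical GPA and IMK, scaled to [0, 1]
losses = [train_step(net, record, target=1.0) for _ in range(200)]
```

Repeating the update drives the error on the record down, which is the behaviour the training phase relies on.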
EVALUATION AND VALIDATION
Comparing the test results of the three algorithms, as shown in Table 3, the highest accuracy was obtained by the Neural Network and C4.5 algorithms, followed by Naive Bayes with the lowest. The measurements used are precision, recall, and accuracy.
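The three measurements follow directly from a confusion matrix. The sketch below uses a hypothetical matrix over the 891 records with a single misclassified record; which cell that error falls in is an assumption, as are the function and variable names. One error in 891 records gives an accuracy of 890/891, or about 99.8878%.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall for the positive (on time) class."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }

# Hypothetical confusion matrix: 729 on-time records all predicted correctly,
# and 1 of the 162 not-on-time records predicted as on time.
metrics = classification_metrics(tp=729, fp=1, fn=0, tn=161)
accuracy_percent = metrics["accuracy"] * 100
```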
ROC Curve
In each test, Weka directly reports the ROC (Receiver Operating Characteristic) values.
Figure 4. AUC plot for the C4.5 algorithm with class LTW
The Area Under the Curve (AUC) value is 1 for the graduated-on-time class with the C4.5 algorithm. For the Neural Network, the Area Under the ROC Curve (AUC) value is 1 for the not-graduated-on-time class. The Area Under Curve (AUC) is calculated using the formula below.
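One common way to compute the AUC is the rank-based (Mann-Whitney) formulation: the fraction of positive and negative pairs in which the positive record receives the higher score. The sketch below uses that formulation as an assumption; the formula on the slide may be written differently, and the scores shown are hypothetical.

```python
def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical scores: every positive outranks every negative, so AUC is 1.
perfect = auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
# One positive ranked below one negative: 3 of 4 pairs are correct.
partial = auc([0.9, 0.3, 0.8, 0.1], [1, 1, 0, 0])
```

An AUC of 1, as reported for C4.5 and the Neural Network, corresponds to the first case, where the classes are separated completely.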
COMPARATIVE ANALYSIS
Of the three models, the highest accuracy, precision, sensitivity, recall, and AUC values were obtained by the C4.5 and Neural Network models, with equal results, and Naive Bayes came last, as shown in Table 5.
For data mining classification, AUC values can be divided into several groups:
a. 0.90-1.00 = very good classification
b. 0.80-0.90 = good classification
c. 0.70-0.80 = fair classification
d. 0.60-0.70 = poor classification
e. 0.50-0.60 = failed classification
It can be concluded that the C4.5, Naive Bayes, and Neural Network methods are all classified as very good, as their Area Under Curve (AUC) values lie between 0.90 and 1.00.
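The grouping can be expressed as a small lookup. Because the listed ranges share their end points, the sketch assumes each lower bound is inclusive; the function name and the short English group labels are also assumptions.

```python
def auc_group(auc):
    """Map an AUC value to its group, treating each lower bound as inclusive."""
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUC must lie between 0 and 1")
    if auc >= 0.90:
        return "very good"
    if auc >= 0.80:
        return "good"
    if auc >= 0.70:
        return "fair"
    if auc >= 0.60:
        return "poor"
    if auc >= 0.50:
        return "failed"
    return "below the listed groups"

groups = [auc_group(v) for v in (1.0, 0.85, 0.72, 0.65, 0.55)]
```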
Figure 5. The Timely Graduation Prediction Classification Application with the Java Engine
CONCLUSION
1. The C4.5, Naive Bayes, and Neural Network algorithms can all be used to predict timely graduation.
2. The best algorithms are those with the highest accuracy in the classification model, namely C4.5 and Neural Network, with an accuracy of 100%, while Naive Bayes reaches 99.8878%. All three algorithms are classified as very good, with AUC (Area Under the Curve) values between 0.90 and 1.00, so they can be used for predictive applications.
3. The selected algorithm displays NIM, Student Name, GPA, IMK, and the timely graduation prediction as the result of data mining classification using the Java engine.
REFERENCES
THANK YOU FOR YOUR ATTENTION