www.ijsret.org
International Journal of Scientific Research Engineering & Technology (IJSRET), ISSN 2278 – 0882
Volume 4, Issue 10, October 2015
A Utility Mining Approach for Building a Knowledge-based
Recommender for Educational Decision Support
Deborah Evelyn. S¹
¹Department of Computer Science and Engineering, University College of Engineering, Kanchipuram
ABSTRACT
With the boom in the number of streamlined career options available today, there is a need for strategic guidance to students enrolled in a course of broader study. Helping them discover their domain of expertise will therefore result in a hassle-free path towards a successful career. This is the key notion of this paper: to help students connect the dots. For this we can employ data mining technologies, which provide a collection of methodologies to discover vital patterns and relationships among data within large data sets. The database can be based on the curriculum and the evaluation process. The corresponding patterns would then show students' expertise. Through the development of a knowledge-based recommender, this information can be conveyed to the student community. This system has proved to be synergistic in educational decision support. Alongside this concept, a psychological proposition called the First Letter Hypothesis has been put forth through research on the databases.
Keywords – Association rules, Confidence, Knowledge
base, Recommendation, Support, Utility mining.
I. INTRODUCTION
1.1. SIGNIFICANCE OF UTILITY MINING
The vastness and accessibility of data has motivated the formulation of various strategies to unravel meaningful knowledge hidden in huge databases through data mining. Of the numerous mining techniques available, frequent itemset pattern mining and utility mining have gained much significance. Some of the key factors for this development are the nature of the databases (the transition from static types to transactional and incremental types of databases) and the nature of the attributes and the entities contained in them.
More recently, there has been a noted drift of the application domains that employ data mining from the frequent itemset mining approach toward the utility mining approach, because the former implicitly considers the utilities of the item sets to be equal and represents their occurrences with binary values. Secondly, in frequent itemset mining, the values of item sets only increase with frequency. These limitations resulted in the development of a better strategy, i.e. the utility mining technique.
The utility mining approach has been formulated to identify item sets of high utility (e.g. profit margin value, user preferences, etc.) and also to allow the users to set the utility threshold of all item sets in a database. It is an improved version of the frequent itemset pattern mining strategy and is the most state-of-the-art approach that can be adopted.
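To make the contrast with frequency counting concrete, the following is a minimal sketch of utility computation, not the method used in this project. The transactions, the per-unit utilities and the function names are all hypothetical, and the enumeration is brute force, which is only practical for toy data.

```python
from itertools import combinations

# Hypothetical toy data: each transaction maps an item to the quantity recorded.
transactions = [
    {"A": 1, "B": 2},
    {"A": 2, "C": 1},
    {"A": 1, "B": 1, "C": 3},
]
# Hypothetical per-unit utility (e.g. profit margin) of each item.
unit_utility = {"A": 5, "B": 2, "C": 1}


def itemset_utility(itemset, transactions, unit_utility):
    """Sum the utility of `itemset` over the transactions that contain all of it."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * unit_utility[item] for item in itemset)
    return total


def high_utility_itemsets(transactions, unit_utility, min_utility):
    """Brute-force enumeration of item sets whose utility meets the threshold."""
    items = sorted(unit_utility)
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u = itemset_utility(itemset, transactions, unit_utility)
            if u >= min_utility:
                result[itemset] = u
    return result


high = high_utility_itemsets(transactions, unit_utility, min_utility=15)
```

In this toy data, item B and the item set {A, B} both occur in two transactions, yet only {A, B} reaches the utility threshold. This is the kind of distinction that a binary, frequency-only view cannot express.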
1.2. RECOMMENDER SYSTEMS
Recommender systems are a subclass of information filtering systems that seek to predict the 'rating' or preference that a user would give to an item.
There are four types of recommender systems, as given below.
Content-based: An approach that focuses on the content, i.e. the type of file or format of information mined from the past activities of users.
Collaborative: An approach that works with a predictive model trained on the logs of the past activities of users.
Hybrid: A combination of the above types.
Knowledge-based: This type of recommender system depends on a knowledge base built from the association rules mined in the process of data mining.
The last type of recommender system discussed above is best suited for this project, as the database that was mined was a static database. The other types of recommender systems are designed to work with transactional and incremental databases.
1.3. ROLE OF DATA MINING IN THE
RECOMMENDER SYSTEM
The association rules generated during the data mining process were used to formulate the knowledge base of the recommender system. Associations among the attributes of the database in hand were generated using the Apriori and PredictiveApriori algorithms. The knowledge base so obtained provided meaningful insight into the students' approach to their respective curriculum and also proved the First Letter Hypothesis to be true.
1.4. OTHER RELATED KEYWORDS AND
DEFINITIONS
1.4.1. DATA MINING
Data mining is the process of discovering interesting knowledge from large amounts of data stored in databases, data warehouses or other information repositories. Based on this view, the architecture of a typical system has the following major components.
1.4.2. ASSOCIATION RULES, SUPPORT AND
CONFIDENCE
Association rules are used to show the relationships between data items. Mining association rules allows finding rules of the form X -> Y, where X and Y are item sets drawn from some data set U.
Support and confidence are common measures of the quality of an association rule. Support for the association rule X -> Y is the percentage of transactions in the database that contain X ∪ Y. Confidence for the association rule X -> Y is the ratio of the number of transactions that contain X ∪ Y to the number of transactions that contain X.
Figure 1.1: Association Rules
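The support and confidence definitions above can be sketched in a few lines. The records below are hypothetical and only mimic the attribute=value style of the questionnaire data; the function names are likewise illustrative.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Count of transactions with X and Y divided by count with X, for X -> Y."""
    x = frozenset(antecedent)
    xy = x | frozenset(consequent)
    count_x = sum(1 for t in transactions if x <= t)
    count_xy = sum(1 for t in transactions if xy <= t)
    return count_xy / count_x if count_x else 0.0


# Hypothetical records: attribute=value pairs for one subject per student.
transactions = [
    frozenset({"Understanding=Profound", "Marks=Good", "FirstAttempt=P"}),
    frozenset({"Understanding=Profound", "Marks=Good", "FirstAttempt=P"}),
    frozenset({"Understanding=Sound", "Marks=Average", "FirstAttempt=P"}),
    frozenset({"Understanding=Profound", "Marks=Average", "FirstAttempt=PF"}),
]

rule_support = support({"Understanding=Profound", "Marks=Good"}, transactions)
rule_confidence = confidence(
    {"Understanding=Profound"}, {"Marks=Good"}, transactions
)
```

Here the rule Understanding=Profound -> Marks=Good holds in two of four records (support 0.5) and in two of the three records that match its antecedent (confidence 2/3).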
1.4.3. FIRST LETTER HYPOTHESIS
This hypothesis states that individuals whose names begin with the last ten letters of the alphabet are better achievers and competitors than those whose names begin with the first ten letters of the alphabet.
II. THE ARCHITECTURE OF THE
EDUCATIONAL RECOMMENDER
The following diagram shows a simple schematic of the proposed architecture of this project. The project is divided into three layers for ease of implementation. The application layer deals purely with the creation and preprocessing of the databases. The data mining layer consists of the set of activities aimed at the efficient extraction of association rules (the meaningful patterns of this project) and the formulation of the knowledge base from the most realistic association rules mined during this process. The recommendation layer, described in section 2.3, presents the knowledge base to the user as a recommender service. Special focus has been given to the formulation of the knowledge base; its details are explained in the remainder of this chapter, and the association rules generated through the mining process are discussed in chapter 4.
Figure 2.1: Educational Recommender Architecture
2.1. APPLICATION LAYER
The database containing the student information is the application database for this project. Since no such database existed, it was created through questionnaires. The following steps were involved in this process.
2.1.1. DATA COLLECTION AND
PREPROCESSING
The databases were created with the data submitted by the students of UCEK through a carefully completed questionnaire. It consisted of the following sections, to be completed for every mainstream subject in the Computer Science and Engineering curriculum at the undergraduate level.
Understanding (rating range: 1-3)
Marks Scored (rating range: 1-3): 1 = E and below, 2 = C or D, 3 = B and above
Confidence (rating range: 1-3)
First Attempt (P, PF, F): P = cleared the finals, PF = cleared the finals but not in the first attempt, F = still a backlog
The details furnished were then preprocessed by smoothing and transformation for efficient and compatible association mining. The following transformations were carried out in order to make the database compatible with the mining tool.
The range 1-3 for understanding was transformed into Nominal (1), Sound (2) and Profound (3). The range 1-3 for marks scored was transformed into Low (1), Average (2) and Good (3). The range 1-3 for confidence rating was transformed into Doubtful (1), Secure (2) and Confident (3). No transformation was required on the first attempt column. The unsupervised transformation function NumericToNominal was applied to numeric attributes.
Missing values were found for elective subjects; these were smoothed by applying the minimum threshold value of the entity in the respective column. This smoothing was necessary as the data mining tool used cannot handle null values.
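The transformations described above can be sketched as follows. This is an illustrative reconstruction, not the tool-based procedure used in the project. The label maps come from the text, while the column names, the sample rows and the reading of "minimum threshold value" as the smallest observed value in the column are assumptions.

```python
# Label maps taken from the transformations described in the text.
LABELS = {
    "understanding": {1: "Nominal", 2: "Sound", 3: "Profound"},
    "marks": {1: "Low", 2: "Average", 3: "Good"},
    "confidence": {1: "Doubtful", 2: "Secure", 3: "Confident"},
}


def fill_missing_with_minimum(rows, column):
    """Replace None in `column` with the smallest observed value.

    Assumption: this is one reading of 'minimum threshold value';
    the exact rule used in the project is not stated.
    """
    observed = [r[column] for r in rows if r[column] is not None]
    floor = min(observed)
    return [{**r, column: floor if r[column] is None else r[column]} for r in rows]


def to_nominal(rows):
    """Map each 1-3 rating to its label; other columns pass through unchanged."""
    return [
        {k: LABELS[k][v] if k in LABELS else v for k, v in r.items()}
        for r in rows
    ]


def preprocess(rows):
    for column in LABELS:
        rows = fill_missing_with_minimum(rows, column)
    return to_nominal(rows)


# Hypothetical questionnaire rows for one subject.
rows = [
    {"understanding": 3, "marks": 3, "confidence": 2, "first_attempt": "P"},
    {"understanding": None, "marks": 2, "confidence": None, "first_attempt": "PF"},
    {"understanding": 2, "marks": 1, "confidence": 3, "first_attempt": "F"},
]
clean = preprocess(rows)
```

After this step every rating column is nominal and free of nulls, which matches the two constraints the text attributes to the mining tool.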
2.1.2. PROCEDURE FOR THE CONSTRUCTION
OF THE APPLICATION LAYER
1. Collect the required data by administering survey questionnaires, or by employing other survey methods, to the population of interest (in this project, the student community).
2. Apply data cleaning strategies for efficiency during the mining process.
3. Apply the required data transformation strategies for compatibility of attribute types.
4. Create the database in a format compatible with the data mining tool that is to be used.
2.2. DATA MINING LAYER
This step involved the application of association algorithms for the formulation of association rules. Hence, the next phase of the project was to mine the preprocessed databases. Attempts to cluster the database through the tool failed due to the large number of attributes considered and the excessive size of the database as a whole. The subjects were therefore clustered manually into the following categories.
Table 2.1: Categories and Subjects
SL. NO | Category | Subjects
1 | Application Programming | Fundamentals Of Computing And Programming; Object Oriented Programming; Java Programming Paradigms
2 | Critical Programming | Fundamentals Of Computing And Programming; Data Structures; Design Analysis And Algorithms
3 | Hardware Logic | Electric Circuits And Electron Devices; Digital Principles Of System Design; Microprocessors And Microcomputers; Computer Organization And Architecture; Advanced Computer Architecture
4 | System Theory | Operating System; System Software
5 | Machine Learning | Artificial Intelligence; Theory Of Computations; Principles Of Compiler Design
6 | Network Study | Computer Networks; Web Technology
7 | Software Engineering | Software Engineering; Object Oriented Analysis And Design
8 | Database Techniques | Database Management Systems; Advanced Database Technology
Each cluster was mined independently using both the Apriori and PredictiveApriori algorithms, and the inferences from these procedures were compared to determine which of the two was the more realistic approach; these inferences are discussed in chapter 4.
2.2.1. ASSOCIATION RULE MINING
ALGORITHMS
The Apriori algorithm is a frequent itemset mining strategy which learns as it operates over a transactional database. An item that is frequently encountered during the mining process has a greater support value. Thus it is an algorithm that highlights the general trend in a particular dataset.
The pseudo code for the algorithm is stated for a transaction database T and a support threshold of ε. Usual set theoretic notation is employed, though note that T is a multiset. Ck is the candidate set for level k. At each step, the algorithm is assumed to generate the candidate sets from the large item sets of the preceding level, heeding the downward closure lemma. count[c] accesses a field of the data structure that represents candidate set c, which is initially assumed to be zero. Many details are omitted; usually the most important part of the implementation is the data structure used for storing the candidate sets and counting their frequencies.
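The level-wise procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Weka implementation used in this project: the threshold is taken as an absolute count (`min_count`), candidate sets are held in plain dictionaries, and the function name is hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: count} for every itemset in >= min_count transactions."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    large = {c: n for c, n in counts.items() if n >= min_count}
    frequent = dict(large)

    k = 2
    while large:
        prev = set(large)
        # Candidate generation: join (k-1)-itemsets, then prune any candidate
        # that has an infrequent (k-1)-subset (downward closure).
        candidates = set()
        for a in prev:
            for b in prev:
                union = a | b
                if len(union) == k and all(
                    frozenset(s) in prev for s in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # One scan of the database counts the surviving candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        large = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(large)
        k += 1
    return frequent
```

For example, on the five transactions {a,b,c}, {a,b}, {a,c}, {b,c}, {a,b,c} with `min_count=3`, every single item and every pair is frequent, while {a,b,c} is generated as a candidate but rejected because it occurs only twice.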
Apriori, while historically significant, suffers from a number of inefficiencies or trade-offs, which have spawned other algorithms. Candidate generation generates large numbers of subsets (the algorithm attempts to load up the candidate set with as many as possible before each scan). Bottom-up subset exploration (essentially a breadth-first traversal of the subset lattice) finds any maximal subset S only after all 2^|S| − 1 of its proper subsets.
The PredictiveApriori algorithm overcomes these inefficiencies by considering a threshold that varies along the process and by continually predicting the course of the following steps or associations.
Thus the knowledge base of the recommender system can be formed from the most realistic association rules formulated by both algorithms. This knowledge base will direct the service that the recommender provides to the user through the front end. The following schematic shows the architecture of the recommender.
2.2.2. PROCEDURE FOR THE CONSTRUCTION
OF THE KNOWLEDGE BASE
1. Load the database into the data mining environment.
2. Apply the necessary filters and transforms for attribute type compatibility. This step is necessary because neither the Apriori nor the PredictiveApriori algorithm can handle varying data types.
3. Compare the association rules generated.
4. Evaluate them based on accuracy rate and support count.
5. Select realistic associations to build the knowledge base.
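Steps 3 to 5 above could be approximated as a filtering pass over the rules produced by both algorithms. The sketch below is only one possible reading. The `Rule` type, the attribute names, the sample rules and both cut-off values are hypothetical, and the paper does not state that selection was automated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedent: frozenset
    consequent: frozenset
    support_count: int  # records matching both antecedent and consequent
    accuracy: float     # confidence or predicted accuracy reported by the miner
    algorithm: str


def build_knowledge_base(rules, min_support_count, min_accuracy):
    """Keep rules passing both cut-offs; for duplicates keep the most accurate."""
    best = {}
    for r in rules:
        if r.support_count < min_support_count or r.accuracy < min_accuracy:
            continue
        key = (r.antecedent, r.consequent)
        if key not in best or r.accuracy > best[key].accuracy:
            best[key] = r
    return sorted(best.values(), key=lambda r: (-r.accuracy, -r.support_count))


# Hypothetical rules, as if exported from the two algorithms.
X = frozenset({"FOCP_Understanding=Profound"})
Y = frozenset({"OOP_FirstAttempt=P"})
rules = [
    Rule(X, Y, 60, 0.95, "Apriori"),
    Rule(X, Y, 60, 0.99, "PredictiveApriori"),
    Rule(frozenset({"OOP_Confidence=Doubtful"}),
         frozenset({"Java_FirstAttempt=F"}), 5, 0.97, "Apriori"),
    Rule(frozenset({"OOP_Marks=Good"}),
         frozenset({"Java_Marks=Good"}), 80, 0.70, "Apriori"),
]
kb = build_knowledge_base(rules, min_support_count=20, min_accuracy=0.9)
```

With these illustrative cut-offs only one rule survives: the low-support rule and the low-accuracy rule are dropped, and the duplicate is resolved in favour of the more accurate version.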
2.3. RECOMMENDATION LAYER
The knowledge base programmed into the recommender system's source code is the cornerstone of the recommender service; it serves as the recommender algorithm. In this project, the knowledge base formulated in the previous stage is programmed into the recommender. In a knowledge-based recommender, interfacing application software is not required.
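A knowledge base that is programmed directly into the source code can be as simple as a list of condition and recommendation pairs. The sketch below illustrates the idea in Python for brevity (the project's front end was written in Java). The rules, attribute names and the `recommend` function are hypothetical and are not the rules mined in this project.

```python
# Hypothetical knowledge base: (conditions that must all hold, recommendation).
KNOWLEDGE_BASE = [
    ({"FOCP_Understanding": "Profound", "OOP_Marks": "Good"},
     "Application Programming"),
    ({"DBMS_Understanding": "Profound", "DBMS_FirstAttempt": "P"},
     "Database Techniques"),
    ({"CN_Confidence": "Confident"},
     "Network Study"),
]


def recommend(profile, knowledge_base=KNOWLEDGE_BASE):
    """Return the categories whose rule conditions are all met by `profile`."""
    matches = []
    for conditions, category in knowledge_base:
        if all(profile.get(attr) == value for attr, value in conditions.items()):
            if category not in matches:
                matches.append(category)
    return matches


# Hypothetical student profile built from questionnaire answers.
profile = {
    "FOCP_Understanding": "Profound",
    "OOP_Marks": "Good",
    "CN_Confidence": "Secure",
}
suggested = recommend(profile)
```

Because the rules live in the program itself, no separate inference engine or database connection is needed at run time, which is consistent with the remark above about interfacing software.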
III. IMPLEMENTATION DETAILS
This chapter provides a brief look into how the architecture was implemented using the various components mentioned in chapter 2.
3.1. DATABASES FOR RESEARCH
The databases were created as per the procedure given in chapter 2. Initially, the data from the flat files were fed into MS Excel and saved in the comma-separated values (CSV) format. The various anomalies were corrected as described in chapter 2. Each category had a corresponding database. All the databases had the same number of instances (200), and each database had a varying number of attributes corresponding to the subjects dealt with under it.
3.2. ASSOCIATION RULE MINING WITH WEKA
The association rules were generated using the Apriori and PredictiveApriori algorithms. Each run was executed with 10 cycles of cross validation and was set to generate the 10 best association rules. The comparison of the two algorithms based on performance evaluation is discussed in the following chapter.
Figure 3.3: All attributes of application programming
after preprocessing
3.3. RECOMMENDATION USER INTERFACE
The front end was developed in the Java programming language using NetBeans IDE 6.9.1. Swing components and their associated event handling mechanisms were implemented. Each category of core stream subjects is introduced and explained in practical terms in a separate frame. Each category frame displays the subjects under it, its applications, and the scope or job titles that it involves.
3.4. FLOW DIAGRAM OF IMPLEMENTATION
METHODOLOGY
Figure 3.4: Flow diagram of Implementation
Methodology
Thus the various implementation strategies are
explained. The results are discussed in the following
chapter.
IV. RESULTS AND DISCUSSIONS
4.1. ASSOCIATION RULE MINING
The association rule mining processes yielded two types of inferences: the application-specific inferences regarding the patterns and rules that were generated from the application databases, and the domain-specific inferences, i.e. technical inferences on the comparison between the Apriori and PredictiveApriori association rule mining algorithms. The following association rules were generated using the Apriori and PredictiveApriori algorithms.
Sl. No | Association Rules | Algorithm Used
1.1. | Application Programming | Apriori
1.2. | Application Programming | PredictiveApriori
2.1. | Critical Programming | Apriori
2.2. | Critical Programming | PredictiveApriori
3.1. | Hardware Logic | Apriori
3.2. | Hardware Logic | PredictiveApriori
4.1. | System Theory | Apriori
4.2. | System Theory | PredictiveApriori
5.1. | Machine Learning | Apriori
5.2. | Machine Learning | PredictiveApriori
6.1. | Network Study | Apriori
6.2. | Network Study | PredictiveApriori
7.1. | Software Engineering | Apriori
7.2. | Software Engineering | PredictiveApriori
8.1. | Database Techniques | Apriori
8.2. | Database Techniques | PredictiveApriori
Table 4.1. Tabulation of association rules
4.2. COMPARISON BASED ON SUPPORT AND
CONFIDENCE
The following table shows the support and confidence values of the 10 best association rules generated by the two algorithms discussed. All the results of the Apriori algorithm show that the support value of the associations increases gradually with time. This clearly shows that the Apriori algorithm is a frequent itemset mining strategy. The relationships are established among attributes with varying support counts. Conversely, in the PredictiveApriori strategy, relationships are built among attributes with the same support count first. The following graph shows the difference in support count along execution in the two algorithms.
Figure 4.1: Comparison of Support Count
In the following graph, the confidence or accuracy of the same association rules is compared. This graph shows that the PredictiveApriori algorithm has a higher accuracy level than the Apriori algorithm. The fall in the accuracy rate of the PredictiveApriori algorithm is relatively small in comparison with the other; that is, the slope of its accuracy curve is smaller. Higher accuracy correlates with higher utility. The following graph shows the accuracy plot of the two algorithms discussed.
Figure 4.2: Comparison of Confidence
4.3. RESULTANT GRAPHS OF FIRST LETTER
HYPOTHESIS
The following graph shows the plot of performance in the finals against the first letters of candidates' names.
Figure 4.3: First Letter Hypothesis Plot
From the graph it is clear that the statement of the First Letter Hypothesis is true. The following chapter holds the inference of the same.
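A comparison of this kind could be tabulated as sketched below. This is only an illustration of the grouping step: the function name is hypothetical, the performance measure (share of first-attempt passes) is an assumption, and no result is implied by the code itself.

```python
import string

FIRST_TEN = set(string.ascii_uppercase[:10])   # A to J
LAST_TEN = set(string.ascii_uppercase[-10:])   # Q to Z


def first_attempt_pass_rate_by_group(records):
    """records: iterable of (name, first_attempt), first_attempt in {P, PF, F}.

    Returns the share of 'P' outcomes for each letter group that has data.
    """
    totals = {"first_ten": [0, 0], "last_ten": [0, 0]}  # [passes, count]
    for name, outcome in records:
        letter = name.strip()[:1].upper()
        if letter in FIRST_TEN:
            group = "first_ten"
        elif letter in LAST_TEN:
            group = "last_ten"
        else:
            continue  # K to P fall outside both groups as the hypothesis is stated
        totals[group][1] += 1
        if outcome == "P":
            totals[group][0] += 1
    return {g: p / n for g, (p, n) in totals.items() if n}
```

Names beginning with K to P are excluded because the hypothesis, as defined in section 1.4.3, compares only the first ten and the last ten letters of the alphabet.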
V. CONCLUSION AND FUTURE WORK
5.1. CONCLUSION
Thus the implementation of a recommender system was completed successfully, and the comparison drawn between the two algorithms indicates that PredictiveApriori performs better based on the predictive accuracy and the various statistical measures that were considered. The following inferences were observed in this educational research. Candidates who have a good understanding of the basics do well in the successive levels. A secure level of understanding and confidence produces a greater possibility of success in the finals. Basic course titles are very easy to succeed in at the finals. A good understanding may not always lead to an equivalent level of success.
The two-dimensional graphs that were generated as a graphical result of the association rule mining process showed clearly that the psychological relationship between performance or competitiveness and the first letter of a candidate's name, as defined by the First Letter Hypothesis, was true. This is because individuals with names beginning with the first ten letters of the alphabet are always first in line when sorted, and hence do not have the urge to fight in order to move ahead; the converse holds true for individuals with names beginning with the last ten letters of the alphabet.