
www.ijsret.org

International Journal of Scientific Research Engineering & Technology (IJSRET), ISSN 2278 – 0882

Volume 4, Issue 10, October 2015

A Utility Mining Approach for Building a Knowledge-based Recommender for Educational Decision Support

Deborah Evelyn. S¹

¹Department of Computer Science and Engineering, University College of Engineering, Kanchipuram

ABSTRACT

With the boom in the number of streamlined career options available today, students enrolled in a broad course of study need strategic guidance. Helping them discover their domain of expertise smooths the path toward a successful career. This is the key notion of this paper: to help students connect the dots. For this we can employ data mining technologies, which provide a collection of methodologies for discovering vital patterns and relationships in large data sets. The database can be based on the curriculum and the evaluation process, so that the patterns mined from it reflect students' expertise. A knowledge-based recommender then conveys this information to the student community. The system has proved to be synergistic in educational decision support. Alongside this concept, a psychological proposition called the First Letter Hypothesis has been put forth through research on the databases.

Keywords: Association rules, Confidence, Knowledge base, Recommendation, Support, Utility mining.

I. INTRODUCTION

1.1. SIGNIFICANCE OF UTILITY MINING

The vastness and accessibility of data have motivated the formulation of various strategies to unravel meaningful knowledge hidden in huge databases through data mining. Of the numerous mining techniques available, frequent itemset pattern mining and utility mining have gained the most significance. Key factors in this development are the nature of the databases (the transition from static to transactional and incremental databases) and the nature of the attributes and entities they contain.

More recently, application domains that employ data mining have drifted from frequent itemset mining toward utility mining. Frequent itemset mining implicitly treats the utilities of all item sets as equal and represents their occurrences with binary values. In addition, the value it assigns to an item set increases only with frequency. These limitations led to the development of a better strategy, the utility mining technique.

The utility mining approach identifies item sets of high utility (e.g. profit margin, user preference) and lets users set a utility threshold for the item sets in a database. It is an improved version of the frequent itemset pattern mining strategy and is the most state-of-the-art approach that can be adopted.
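The difference between frequency and utility can be illustrated with a minimal Python sketch. It is not taken from the paper: the transactions, the item names and the per-unit profit table are hypothetical, and utility is assumed to be quantity times unit profit, summed over the transactions that contain the whole item set.

```python
from itertools import combinations

# Hypothetical toy data: each transaction maps item -> purchased quantity.
transactions = [
    {"pen": 5, "laptop": 1},
    {"pen": 3},
    {"pen": 4, "notebook": 2},
    {"laptop": 1, "notebook": 1},
]
# Hypothetical external utility (e.g. profit per unit) of each item.
unit_profit = {"pen": 1, "notebook": 3, "laptop": 200}


def frequency(itemset, db):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in db if all(i in t for i in itemset))


def utility(itemset, db, profit):
    """Total utility of `itemset`: quantity * unit profit, summed over
    the transactions that contain the whole itemset."""
    return sum(
        sum(t[i] * profit[i] for i in itemset)
        for t in db
        if all(i in t for i in itemset)
    )


def high_utility_itemsets(db, profit, min_utility):
    """Brute-force enumeration of all itemsets; fine for a toy example only."""
    items = sorted({i for t in db for i in t})
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u = utility(itemset, db, profit)
            if u >= min_utility:
                result[itemset] = u
    return result
```

In this toy database the pen is the most frequent item, yet every item set that clears a utility threshold of 100 contains the laptop. This is the distinction that frequency-only mining cannot express.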

1.2. RECOMMENDER SYSTEMS

Recommender systems are a subclass of information filtering systems that seek to predict the 'rating' or preference that a user would give to an item. There are four types of recommender systems, as given below.

Content-based: An approach that focuses on the content, i.e. the type of file or format of the information mined from the past activities of users.

Collaborative: An approach that works with a predictive model trained on the logs of the past activities of users.

Hybrid: A combination of the above types.

Knowledge-based: A recommender system that depends on a knowledge base built from the association rules mined in the data mining process.

The knowledge-based type is the best fit for this project because the database that was mined is static. The other types of recommender systems are designed to work with transactional and incremental databases.

1.3. ROLE OF DATA MINING IN THE RECOMMENDER SYSTEM

The association rules generated during the data mining process were used to formulate the knowledge base of the recommender system. Associations among the attributes of the database in hand were generated using the Apriori and PredictiveApriori algorithms. The knowledge base so obtained provided meaningful insight into the students' approach to their respective curriculum and also proved the First Letter Hypothesis to be true.





1.4. OTHER RELATED KEYWORDS AND DEFINITIONS

1.4.1. DATA MINING

Data mining is the process of discovering interesting knowledge from large amounts of data stored in databases, data warehouses or other information repositories. Based on this view, the architecture of a typical system has the following major components.

1.4.2. ASSOCIATION RULES, SUPPORT AND CONFIDENCE

Association rules show the relationships between data items. Mining association rules finds rules of the form X -> Y, where X and Y are item sets drawn from some data set U.

Support and confidence are the common measures of the quality of an association rule. The support of the rule X -> Y is the percentage of transactions in the database that contain X ∪ Y. The confidence of the rule X -> Y is the ratio of the number of transactions that contain X ∪ Y to the number of transactions that contain X.

Figure 1.1: Association Rules
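The two measures defined in this section can be computed directly from their definitions. The following Python sketch is illustrative only; the function names and the attribute=value encoding of items are assumptions, not part of the paper's tooling.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """For the rule X -> Y: transactions containing X and Y, divided by
    transactions containing X. Returns 0.0 if X never occurs."""
    x = set(antecedent)
    xy = x | set(consequent)
    n_x = sum(1 for t in transactions if x <= set(t))
    n_xy = sum(1 for t in transactions if xy <= set(t))
    return n_xy / n_x if n_x else 0.0


# Hypothetical student records encoded as sets of attribute=value items.
records = [
    {"Understanding=Profound", "Marks=Good"},
    {"Understanding=Profound", "Marks=Good"},
    {"Understanding=Profound", "Marks=Average"},
    {"Understanding=Nominal", "Marks=Low"},
]
```

For these hypothetical records, the rule Understanding=Profound -> Marks=Good is contained in two of the four records (support 0.5) and holds in two of the three records that contain its antecedent (confidence 2/3).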

1.4.3. FIRST LETTER HYPOTHESIS

This hypothesis states that individuals whose names begin with the last ten letters of the alphabet are better achievers and competitors than those whose names begin with the first ten letters of the alphabet.

II. THE ARCHITECTURE OF THE EDUCATIONAL RECOMMENDER

The following diagram shows a simple schematic of the proposed architecture of this project. The project is divided into three layers for ease of implementation. The application layer deals purely with the creation and preprocessing of the databases. The data mining layer consists of the activities aimed at efficient extraction of association rules (the meaningful patterns of this project) and the formulation of the knowledge base from the most realistic association rules mined during this process. Special focus has been given to the formulation of the knowledge base; its details are explained in the rest of this chapter, and the association rules generated through the mining process are discussed in chapter 4.

Figure 2.1: Educational Recommender Architecture

2.1. APPLICATION LAYER

The database containing the student information is the application database for this project. Since no such database existed, it was created through questionnaires. The following steps were involved in this process.

2.1.1. DATA COLLECTION AND PREPROCESSING

The databases were created from the data submitted by the students of UCEK through a carefully completed questionnaire. For every mainstream subject in the undergraduate Computer Science and Engineering curriculum, the questionnaire contained the following sections.

Understanding (rating range: 1-3)

Marks Scored (rating range: 1-3) (1: E and below), (2: C, D), (3: B and above)

Confidence (rating range: 1-3)

First Attempt (P, PF, F) (P: cleared the finals, PF: cleared the finals but not in the first attempt, F: still a backlog)

The details furnished were then preprocessed by smoothing and transformation for efficient and compatible association mining. The following transformations were carried out to make the database compatible with the mining tool.

The range 1-3 for understanding was transformed into Nominal (1), Sound (2) and Profound (3). The range 1-3 for marks scored was transformed into Low (1), Average (2) and Good (3). The range 1-3 for confidence rating was transformed into Doubtful (1), Secure (2) and Confident (3). No transformation was required on the first attempt column. The unsupervised transformation function NumericToNominal was applied to numeric attributes.

Missing values were found for elective subjects. These were smoothed by applying the minimum threshold value of the entity in the respective column. This smoothing was necessary because the data mining tool used cannot handle null values.
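The smoothing and transformation steps described above can be sketched in Python as follows. This is an illustration, not the procedure actually run in the mining tool: the column names are hypothetical, and the "minimum threshold value" is assumed here to mean the smallest value observed in the column.

```python
# Label maps taken from the transformations described in the text.
LABELS = {
    "understanding": {1: "Nominal", 2: "Sound", 3: "Profound"},
    "marks": {1: "Low", 2: "Average", 3: "Good"},
    "confidence": {1: "Doubtful", 2: "Secure", 3: "Confident"},
}


def fill_missing_with_minimum(rows, column):
    """Replace None in `column` with the smallest value observed in it
    (assumed reading of 'minimum threshold value')."""
    observed = [r[column] for r in rows if r[column] is not None]
    if not observed:
        raise ValueError(f"column {column!r} has no observed values")
    minimum = min(observed)
    return [
        {**r, column: minimum if r[column] is None else r[column]} for r in rows
    ]


def preprocess(rows):
    """Smooth missing ratings, then map the 1-3 ratings to nominal labels.
    Columns not listed in LABELS (e.g. first attempt) pass through unchanged."""
    for column in LABELS:
        rows = fill_missing_with_minimum(rows, column)
    return [
        {
            **r,
            **{column: mapping[r[column]] for column, mapping in LABELS.items()},
        }
        for r in rows
    ]
```

Each input row is a dictionary with one entry per questionnaire section for a subject. The first attempt value is left untouched, matching the statement that no transformation was required on that column.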





2.1.2. PROCEDURE FOR THE CONSTRUCTION OF THE APPLICATION LAYER

1. Collect the required data by administering survey questionnaires, or by employing other survey methods, to the population of interest (in this project, the student community).

2. Apply data cleaning strategies for efficiency during the mining process.

3. Apply the required data transformation strategies for compatibility of attribute types.

4. Create the database in a format compatible with the data mining tool that is to be used.

2.2. DATA MINING LAYER

This step involved the application of association algorithms for the formulation of association rules. Hence, the next phase of the project was to mine the preprocessed databases. Attempts to cluster the database through the tool failed because of the large number of attributes considered and the overall size of the database. The subjects were therefore clustered manually into the following categories.

Table 2.1: Categories and Subjects

SL. NO  Categories and Subjects

1       Application Programming
          Fundamentals of Computing and Programming
          Object Oriented Programming
          Java Programming Paradigms

2       Critical Programming
          Fundamentals of Computing and Programming
          Data Structures
          Design Analysis and Algorithms

3       Hardware Logic
          Electric Circuits and Electron Devices
          Digital Principles of System Design
          Microprocessors and Microcomputers
          Computer Organization and Architecture
          Advanced Computer Architecture

4       System Theory
          Operating System
          System Software

5       Machine Learning
          Artificial Intelligence
          Theory of Computations
          Principles of Compiler Design

6       Network Study
          Computer Networks
          Web Technology

7       Software Engineering
          Software Engineering
          Object Oriented Analysis and Design

8       Database Techniques
          Database Management Systems
          Advanced Database Technology

Each cluster was mined independently using both the Apriori and PredictiveApriori algorithms. The inferences of these procedures were compared for analysis and to find out which of the two was the more realistic approach; these inferences are discussed in chapter 6.

2.2.1. ASSOCIATION RULE MINING ALGORITHMS

The Apriori algorithm is a frequent itemset mining strategy which learns as it operates over a transactional database. An item that is frequently encountered during the mining process has a greater support value. Thus it is an algorithm that highlights the general trend in a particular dataset.

The pseudo code for the algorithm is stated for a transaction database T and a support threshold of ε. Usual set theoretic notation is employed, though note that T is a multiset. Ck is the candidate set for level k. At each step, the algorithm is assumed to generate the candidate sets from the large item sets of the preceding level, heeding the downward closure lemma. Count[c] accesses a field of the data structure that represents candidate set c, which is initially assumed to be zero. Many details are omitted; usually the most important part of the implementation is the data structure used for storing the candidate sets and counting their frequencies.
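The level-wise procedure described above can be sketched in Python as follows. This is a minimal illustration rather than the WEKA implementation used in this project: the function name is hypothetical, the support threshold is expressed as an absolute count (`min_count`), and plain dictionaries stand in for the candidate data structure.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frozenset(itemset): count} for every itemset that occurs
    in at least `min_count` transactions (level-wise search)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    large = {c: n for c, n in counts.items() if n >= min_count}
    frequent = dict(large)

    k = 2
    while large:
        # Join step: unions of two large (k-1)-itemsets that have size k.
        prev = list(large)
        candidates = {
            a | b for a, b in combinations(prev, 2) if len(a | b) == k
        }
        # Prune step (downward closure): every (k-1)-subset must be large.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in large for s in combinations(c, k - 1))
        }
        # Count step: one scan of the database per level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        large = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(large)
        k += 1
    return frequent
```

The prune step is where the downward closure lemma is applied: a candidate is counted only if all of its subsets one item smaller were themselves large at the preceding level.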

Apriori, while historically significant, suffers from a number of inefficiencies or trade-offs, which have spawned other algorithms. Candidate generation produces large numbers of subsets (the algorithm attempts to load the candidate set with as many as possible before each scan). Bottom-up subset exploration (essentially a breadth-first traversal of the subset lattice) finds any maximal subset S only after all of its proper subsets.





The PredictiveApriori algorithm overcomes this by considering a threshold that varies along the process and by continually predicting the course of the following steps or associations.

Thus the knowledge base of the recommender system can be formed from the most realistic association rules formulated by both algorithms. This knowledge base directs the recommender service provided to the user through the front end. The following schematic shows the architecture of the recommender.

2.2.2. PROCEDURE FOR THE CONSTRUCTION OF THE KNOWLEDGE BASE

1. Load the database into the data mining environment.

2. Apply the necessary filters and transforms for attribute type compatibility. This step is necessary because neither the Apriori nor the PredictiveApriori algorithm can handle varying data types.

3. Compare the association rules generated.

4. Evaluate them based on accuracy rate and support count.

5. Select realistic associations to build the knowledge base.
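Steps 3 to 5 of this procedure can be sketched in Python as follows. The paper does not define "realistic" numerically, so the two thresholds below are assumptions, as are the `Rule` record and the function name; the sketch only shows one plausible way of comparing and selecting rules from the two algorithms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedent: frozenset  # e.g. {"Understanding=Profound"}
    consequent: frozenset  # e.g. {"FirstAttempt=P"}
    support_count: int     # transactions matching antecedent and consequent
    accuracy: float        # confidence / predictive accuracy in [0, 1]
    algorithm: str         # "Apriori" or "PredictiveApriori"


def build_knowledge_base(rules, min_support_count, min_accuracy):
    """Keep rules that pass both thresholds; if the two algorithms emit the
    same rule, keep the copy with the higher accuracy."""
    best = {}
    for rule in rules:
        if rule.support_count < min_support_count or rule.accuracy < min_accuracy:
            continue
        key = (rule.antecedent, rule.consequent)
        if key not in best or rule.accuracy > best[key].accuracy:
            best[key] = rule
    # Most accurate rules first, ties broken by support count.
    return sorted(best.values(), key=lambda r: (-r.accuracy, -r.support_count))
```

Keying the rules on their antecedent and consequent makes the comparison between the two algorithms explicit: a rule found by both is stored once, with the better of the two accuracy estimates.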

2.3. RECOMMENDATION LAYER

The knowledge base programmed into the recommender system's source code is the cornerstone of the recommender service: it serves as the recommender algorithm. In this project the knowledge base formulated in the previous stage is programmed into it. A knowledge-based recommender does not require interfacing application software.
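A knowledge base that is programmed into the source code can be as simple as a list of rules matched against a student's profile. The Python sketch below illustrates the idea only: the two rules are hypothetical and are not the associations mined in this project, and the project's own front end was written in Java rather than Python.

```python
# Hypothetical knowledge base: (antecedent attribute values, recommended category).
KNOWLEDGE_BASE = [
    ({"Data Structures: Understanding": "Profound"}, "Critical Programming"),
    ({"Computer Networks: Confidence": "Confident",
      "Web Technology: Marks": "Good"}, "Network Study"),
]


def recommend(profile, knowledge_base=KNOWLEDGE_BASE):
    """Return the categories of every rule whose antecedent is fully
    satisfied by the student's `profile`, in knowledge-base order."""
    recommendations = []
    for antecedent, category in knowledge_base:
        matches = all(profile.get(k) == v for k, v in antecedent.items())
        if matches and category not in recommendations:
            recommendations.append(category)
    return recommendations
```

Because the rules live in the program itself, no model has to be trained or queried at run time, which is consistent with the static database used here.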

III. IMPLEMENTATION DETAILS

This chapter provides a brief look into how the architecture was implemented using the various components mentioned in chapter 4.

3.1. DATABASES FOR RESEARCH

The databases were created as per the procedure found in chapter 3. Initially the data from the flat files were fed into MS Excel and saved in comma-separated format. The various anomalies were corrected as mentioned in chapter 3. Each category had a corresponding database. All the databases had the same number of instances (200), and each database had a varying number of attributes corresponding to the subjects dealt with under it.

3.2. ASSOCIATION RULE MINING WITH WEKA

The association rules were generated using the Apriori and PredictiveApriori algorithms. Each run was executed with 10 cycles of cross validation and was set to generate the 10 best association rules. The comparison based on the performance evaluation of the two algorithms is discussed in the following chapter.

Figure 3.3: All attributes of application programming after preprocessing

3.3. RECOMMENDATION USER INTERFACE

The front end was developed in the Java programming language using the NetBeans IDE 6.9.1. Swing components and their associated event handling mechanisms were implemented. Each category of core-stream subjects is introduced and explained in a separate frame. Each category frame displays the subjects under it, its applications, and the scope or job titles that it involves.

3.4. FLOW DIAGRAM OF IMPLEMENTATION METHODOLOGY

Figure 3.4: Flow diagram of Implementation Methodology

Thus the various implementation strategies are explained. The results are discussed in the following chapter.





IV. RESULTS AND DISCUSSIONS

4.1. ASSOCIATION RULE MINING

The association rule mining processes yielded two types of inferences: the application-specific inferences regarding the patterns and rules generated from the application databases, and the domain-specific inferences, i.e. technical inferences on the comparison between the Apriori and PredictiveApriori association rule mining algorithms. The following association rules were generated using the Apriori and PredictiveApriori algorithms.

Sl. No  Association Rules         Algorithm Used

1.1.    Application Programming   Apriori
1.2.    Application Programming   PredictiveApriori
2.1.    Critical Programming      Apriori
2.2.    Critical Programming      PredictiveApriori
3.1.    Hardware Logic            Apriori
3.2.    Hardware Logic            PredictiveApriori
4.1.    System Theory             Apriori
4.2.    System Theory             PredictiveApriori
5.1.    Machine Learning          Apriori
5.2.    Machine Learning          PredictiveApriori
6.1.    Network Study             Apriori
6.2.    Network Study             PredictiveApriori
7.1.    Software Engineering      Apriori
7.2.    Software Engineering      PredictiveApriori
8.1.    Database Techniques       Apriori
8.2.    Database Techniques       PredictiveApriori

Table 4.1. Tabulation of association rules

4.2. COMPARISON BASED ON SUPPORT AND CONFIDENCE

The following table shows the support and confidence values of the 10 best association rules generated by the two algorithms discussed. All the results of the Apriori algorithm show that the support value of the associations increases gradually with time. This clearly shows that Apriori is a frequent itemset mining strategy: relationships are established among attributes with varying support counts. Conversely, in the PredictiveApriori strategy, relationships are built first among attributes with the same support count. The following graph shows the difference in support count along execution in the two algorithms.

Figure 4.1: Comparison of Support Count

In the following graph, the confidence or accuracy of the same association rules is compared. This graph shows that the PredictiveApriori algorithm has a higher accuracy level than the Apriori algorithm. The fall in the accuracy rate of PredictiveApriori is very small in comparison with that of Apriori; that is, the slope of its accuracy curve is smaller. Higher accuracy correlates to higher utility. The following graph shows the accuracy plot of the two algorithms discussed.

Figure 4.2: Comparison of Confidence





4.3. RESULTANT GRAPHS OF FIRST LETTER HYPOTHESIS

The following graph shows the plot of performance in the finals against the first letters of candidates' names.

Figure 4.3: First Letter Hypothesis Plot

From the graph it is clear that the statement of the First Letter Hypothesis is true. The following chapter holds the inference of the same.

V. CONCLUSION AND FUTURE WORK

5.1. CONCLUSION

Thus the implementation of a recommender system was completed successfully, and the comparison drawn between the two algorithms indicates that PredictiveApriori performs better based on predictive accuracy and the various statistical measures that were considered. The following inferences were observed in this educational research. Candidates who have a good understanding of the basics do well in the successive levels. A secure level of understanding and confidence produces a greater possibility of success in the finals. Basic course titles are very easy to clear in the finals. A good understanding may not always lead to an equivalent level of success.

The two-dimensional graphs generated as a graphical result of the association rule mining process showed clearly that the psychological relationship between performance or competitiveness and the first letter of a candidate's name, as defined by the First Letter Hypothesis, was true. This is because individuals with names beginning with the first ten letters of the alphabet are always first in line when sorted, and hence do not have the urge to fight in order to move ahead; the converse holds true for individuals whose names begin with the last ten letters of the alphabet.

REFERENCES

Journal Papers:

[1] Sunita B Aher, Lobo L. M. R. J. (2012), "Data Preparation Strategy in E-Learning System using Association Rule Mining Algorithm", International Journal of Computer Applications, Volume 41, pages 35-40.

[2] Sunita B Aher, Lobo L. M. R. J. (2012), "A Comparative Study for Selecting the Best Unsupervised Learning Algorithm in E-learning Systems", International Journal of Computer Applications, Volume 41, pages 27-34.

[3] Sunita B Aher, Lobo L. M. R. J. (2011), "Data Mining in Educational System in WEKA", International Journal of Computer Applications, International Conference on Emerging Technology Trends.

[4] Sunita B Aher, Lobo L. M. R. J. (2011), "A Framework for Recommendation of Courses in E-learning System", International Journal of Computer Applications, Volume 35, pages 21-28.

[5] Sunita B Aher, Lobo L. M. R. J. (2012), "A Comparative Study of Association Rule Algorithms for Course Recommender System in E-learning", International Journal of Computer Applications, Volume 39, pages 48-52.

[6] Mukesh Sharma, Jyothi Choudhary, Gunjan Sharma (2013), "Evaluating the performance of apriori and predictive apriori algorithm to find new association rules based on the statistical measures of datasets", International Journal of Engineering Research and Technology, Volume 6.

[7] Shwetha, Kanwal Garg (2013), "Mining Efficient Association Rules Through Apriori Algorithm Using Attributes and Comparative Analysis of Various Association Rule Algorithms", International Journal of Advanced Research in Computer Science and Software Engineering, Volume 3.

Web Source:

[8] Wikipedia, Apriori algorithm, https://en.wikipedia.org/wiki/Apriori_algorithm
