sowmya review
-
7/29/2019 Sowmya Review
1/24
A New Two-Phase Sampling Algorithm for
Discovering Association Rules
-
Data mining techniques are widely used in many applications. Data mining extracts novel and
useful knowledge from large repositories of data and has become an effective means of analysis and
decision-making in corporations. Sharing data for mining brings many advantages for research
and business collaboration, and data mining is an increasingly important tool for transforming
data into information. The volume of electronically accessible data in warehouses and on the internet is
growing ever faster, so the scalability of mining is a major concern. Classical mining algorithms require one or
more computationally intensive passes over the entire database and can take hours or even days to execute,
and the problem will become worse in the future. To avoid this problem, using a sample of the data as a
synopsis is a popular technique that scales well as the data grow. In data mining, association rule mining
is a popular and well-researched method for discovering relations between variables in a large database.
The resulting information can be used as the basis for decisions about marketing activities such as
market basket analysis and product placement.
This project uses the Apriori, SRS (Simple Random Sampling) and FAST (Finding Associations from
Sampled Transactions) algorithms to generate and discover association rules in a large database.
By applying Apriori, Simple Random Sampling and FAST to the same large database, the user can
determine which algorithm is best at finding the strong and weak rules of the dataset. The user can
compare the running time and accuracy of the algorithms in order to find the most efficient way of
discovering the association rules.
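The support and confidence measures behind the strong and weak rules can be sketched as follows. This is a minimal illustration, not the project's code: the function names, the thresholds and the convention that a rule is "strong" when it meets both a minimum support and a minimum confidence are assumptions made for the example.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support(itemset: FrozenSet[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str],
               transactions: List[Transaction]) -> float:
    """support(antecedent and consequent together) / support(antecedent)."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    return support(antecedent | consequent, transactions) / base


def classify_rule(antecedent: FrozenSet[str], consequent: FrozenSet[str],
                  transactions: List[Transaction],
                  min_support: float = 0.5,
                  min_confidence: float = 0.7) -> str:
    """Assumed convention: 'strong' if both thresholds are met, else 'weak'."""
    s = support(antecedent | consequent, transactions)
    c = confidence(antecedent, consequent, transactions)
    return "strong" if s >= min_support and c >= min_confidence else "weak"


# Hypothetical market-basket data, used only for illustration.
transactions = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "milk", "butter"},
    {"bread", "milk"},
    {"butter"},
)]

print(classify_rule(frozenset({"bread"}), frozenset({"milk"}), transactions))
print(classify_rule(frozenset({"butter"}), frozenset({"milk"}), transactions))
```

With this data the rule bread -> milk meets both thresholds, while butter -> milk does not.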
-
HARDWARE CONFIGURATION:
Processor : Pentium IV
Processor Speed : 1.7 GHz
Memory (RAM) : 256 MB
Hard Disk : 10 GB
Floppy Drive : 3.5-inch 1.44 MB Drive
Monitor : Samsung Color Monitor
Keyboard : 104 keys Intel Keyboard
Mouse : Intel Optical Mouse
SOFTWARE CONFIGURATION:
Operating System : Windows XP
Front End Tool : Microsoft Visual Basic .Net 2008
Back End Tool : Microsoft SQL Server 2000
-
EXISTING SYSTEM:
The study of the existing system has revealed its limitations and so has paved the way for the
proposed system. Finding relationships between variables in a large database is not an easy problem.
LIMITATION OF EXISTING SYSTEM:
Limited amount of memory
Need complete list of database
Data may be scattered and poorly accessible
Requires many database scans
Expensive
Lossy compressed synopsis (sketch) of data
Scalability of mining algorithm is a major concern
-
PROPOSED SYSTEM:
The proposed system is based on the recognized need to improve the existing system and aims to
overcome its drawbacks. An important aspect of the new system is that it should be easy to
incorporate change: the user should be able to make changes without difficulty at any time. The
proposed system discovers association rules using the Apriori, Simple Random Sampling, FAST and
EASE algorithms. It is developed using Visual Basic .NET as the front end and MS SQL Server as the
back end.
FEATURES OF PROPOSED SYSTEM:
Uses the large itemset property
Saves memory space
Easily implemented
Reduced costs
Reduced field time
Increased accuracy
Provides security
Excellent user friendliness
Simple
Errors can be easily measured
-
Modules Description :
This project is based on the FAST, EASE, Apriori and Simple Random Sampling algorithms for
discovering association rules in a large database.
Apriori Algorithm :
The Apriori algorithm is a classic algorithm for learning association rules. It is designed to
operate on databases containing transactions.
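The level-wise search that Apriori performs over a transaction database can be sketched as below. This is a minimal, unoptimized illustration under assumptions: the function name `apriori`, the in-memory list of transactions and the sample data are hypothetical and are not taken from the project's Visual Basic .NET implementation.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def apriori(transactions: List[FrozenSet[str]],
            min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every frequent itemset together with its support."""
    n = len(transactions)
    if n == 0:
        return {}

    # Level 1: count single items in one pass.
    counts: Dict[FrozenSet[str], int] = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c / n for s, c in counts.items() if c / n >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        previous = list(current)
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in previous for b in previous if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current
                             for sub in combinations(c, k - 1))}
        # Count step: one pass over the database per level.
        current = {}
        for c in candidates:
            s = sum(1 for t in transactions if c <= t) / n
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical data, for illustration only.
baskets = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "milk", "butter"},
    {"bread", "milk"},
    {"butter"},
)]
for itemset, s in sorted(apriori(baskets, 0.5).items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), s)
```

Each level requires a full pass over the transactions, which is the cost that the sampling-based approaches aim to reduce.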
Simple Random Sampling :
Simple Random Sampling is considered separately: it selects transactions at random from the
database and checks their support and confidence in order to find the best rules. Simple Random
Sampling can make sampling a viable means of attaining both high performance and acceptably
accurate results.
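Drawing a simple random sample and estimating an item's support from it might look like the sketch below. The helper names, the fixed seed and the synthetic database are assumptions chosen for the example; the sample is drawn without replacement using Python's standard `random` module.

```python
import random
from typing import FrozenSet, List, Optional


def simple_random_sample(transactions: List[FrozenSet[str]],
                         sample_size: int,
                         seed: Optional[int] = None) -> List[FrozenSet[str]]:
    """Draw `sample_size` transactions uniformly at random, without replacement."""
    rng = random.Random(seed)
    size = min(sample_size, len(transactions))
    return rng.sample(transactions, size)


def item_support(item: str, transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain `item`."""
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if item in t) / len(transactions)


# Synthetic database (hypothetical): "a" appears in every second transaction.
database = [frozenset({"a", "b"}) if i % 2 == 0 else frozenset({"b"})
            for i in range(1000)]
sample = simple_random_sample(database, 100, seed=42)

print("exact support of a:    ", item_support("a", database))
print("estimated support of a:", item_support("a", sample))
```

The estimate is computed from 100 transactions instead of 1000, at the price of some sampling error.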
-
FAST Algorithm :
FAST (Finding Associations from Sampled Transactions) is a refined sampling-based mining algorithm that is
distinguished from prior algorithms by its novel two-phase approach to sample collection. In Phase I, a large
sample is collected to quickly and accurately estimate the support of each item in the database. In Phase II, a
small final sample is obtained by excluding outlier transactions in such a manner that the support of each item in
the final sample is as close as possible to the estimated support of the item in the entire database. Numerical
experiments indicate that, for any fixed computing budget, FAST identifies more frequent itemsets and fewer
false itemsets than other sampling-based algorithms. FAST can identify most frequent itemsets in a database at an
overall cost that is much lower than that of classical algorithms.
In this project, A New Two-Phase Sampling Algorithm for Discovering Association Rules, the user can
compare the running time and the variation in results between the algorithms. First, the Apriori algorithm
is applied to the large dataset to compute support and confidence and so find the strong and weak rules.
Next, a random sample of the dataset is drawn and the strong and weak rules are found from the support and
confidence in that sample.
Finally, the FAST (Finding Associations from Sampled Transactions) algorithm is applied to the large dataset to
find the strong and weak rules based on support and confidence. By applying the three algorithms, the user can
measure the running time and accuracy of each and identify the best algorithm from the time difference.
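The Phase II trimming step can be illustrated with a greedy sketch: starting from the Phase I sample, repeatedly drop the transaction whose removal brings the item supports of the remaining sample closest to the Phase I estimates. The L1 distance between support vectors, the function names and the data are assumptions made for this example; it is not presented as the exact procedure or distance function of the FAST authors.

```python
from collections import Counter
from typing import Dict, FrozenSet, List

Transaction = FrozenSet[str]


def item_supports(transactions: List[Transaction]) -> Dict[str, float]:
    """Support of every item that occurs in `transactions`."""
    n = len(transactions)
    counts = Counter(item for t in transactions for item in t)
    return {item: c / n for item, c in counts.items()}


def l1_distance(sample: List[Transaction], target: Dict[str, float]) -> float:
    """Assumed distance: sum of absolute differences in item support."""
    current = item_supports(sample)
    return sum(abs(current.get(item, 0.0) - s) for item, s in target.items())


def fast_trim(initial_sample: List[Transaction], final_size: int) -> List[Transaction]:
    """Greedily remove 'outlier' transactions until `final_size` remain."""
    if not 1 <= final_size <= len(initial_sample):
        raise ValueError("final_size must be between 1 and the sample size")
    target = item_supports(initial_sample)  # Phase I estimates
    sample = list(initial_sample)
    counts = Counter(item for t in sample for item in t)
    while len(sample) > final_size:
        new_size = len(sample) - 1
        best_index, best_distance = 0, float("inf")
        for index, t in enumerate(sample):
            # Distance to the target if transaction t were removed.
            d = sum(abs((counts[item] - (item in t)) / new_size - s)
                    for item, s in target.items())
            if d < best_distance:
                best_index, best_distance = index, d
        counts.subtract(sample.pop(best_index))
    return sample


# Hypothetical Phase I sample, for illustration only.
phase1 = [frozenset(t) for t in (
    {"a"}, {"a", "b"}, {"b"}, {"a", "b", "c"}, {"c"}, {"a"},
)]
final = fast_trim(phase1, 4)
print(len(final), l1_distance(final, item_supports(phase1)))
```

This greedy loop rescans the sample for every removal, so it is meant to show the idea rather than to be efficient on large samples.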
-
EASE Algorithm :
EASE (Epsilon Approximation: Sampling Enabled) is a data-reduction method that is
especially designed for categorical count data. The algorithm is an outgrowth of earlier
work by Chen et al. on the FAST data-reduction method. Both EASE and FAST start with a
relatively large simple random sample of transactions and deterministically trim the
sample to create a final subsample whose "distance" from the complete database is as
small as possible. For reasons of computational efficiency, both algorithms treat the
subsample as "close" to the original database if the high-level aggregates of the
subsample, normalized by the total number of data points, are "close" to the normalized
aggregates in the database. These normalized aggregates typically correspond to
1-itemset or 2-itemset supports in the association-rule setting or, in the setting of a
contingency table, to relative marginal or cell frequencies.
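The notion of a subsample being "close" to the database can be made concrete with a small check on 1-itemset supports. The sketch below assumes that closeness is measured by the largest absolute difference in item support and that a subsample qualifies when this difference is at most a tolerance `epsilon`; the function names and data are hypothetical, and the code only tests closeness, it does not implement the EASE trimming procedure itself.

```python
from collections import Counter
from typing import Dict, FrozenSet, List

Transaction = FrozenSet[str]


def item_supports(transactions: List[Transaction]) -> Dict[str, float]:
    """Normalized 1-itemset aggregates: item count divided by number of transactions."""
    n = len(transactions)
    if n == 0:
        return {}
    counts = Counter(item for t in transactions for item in t)
    return {item: c / n for item, c in counts.items()}


def max_discrepancy(subsample: List[Transaction],
                    reference: List[Transaction]) -> float:
    """Largest absolute difference in item support between the two collections."""
    sub = item_supports(subsample)
    ref = item_supports(reference)
    items = set(sub) | set(ref)
    return max((abs(sub.get(i, 0.0) - ref.get(i, 0.0)) for i in items), default=0.0)


def is_epsilon_approximation(subsample: List[Transaction],
                             reference: List[Transaction],
                             epsilon: float) -> bool:
    """Assumed criterion: every item support differs by at most `epsilon`."""
    return max_discrepancy(subsample, reference) <= epsilon


# Hypothetical data, for illustration only.
reference = [frozenset(t) for t in ({"a", "b"}, {"a"}, {"b"}, {"a", "b"})]
subsample = [frozenset(t) for t in ({"a", "b"}, {"a"})]

print(max_discrepancy(subsample, reference))
print(is_epsilon_approximation(subsample, reference, 0.25))
```

The same check extends to 2-itemset supports by counting item pairs instead of single items.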
-
Apply EASE Algorithm
Highlight with Blue and Red Color
-
TABLE NAME : Dataset_master | Primary Key : DS_ID
COLUMN NAME     DATATYPE    DESCRIPTION
DS_ID           Numeric     Dataset Identification
DS_TRANS        Text        Dataset Transaction data
-
TABLE NAME : Result_analysis | Primary Key : TRAN_NO
COLUMN NAME     DATATYPE    DESCRIPTION
TRAN_NO         Numeric     Transaction Number
TYPE            Text        Transaction Type
SNO             Numeric     Serial Number
STARTED_TIME    Datetime    Started Time
ELAPSED_TIME    Datetime    Elapsed Time
RULES           Text        Rules
-
APRIORI
-
FINDING RULES
-
SIMPLE RANDOM SAMPLE
-
FINDING RULES
-
FAST TESTING
-
FINDING RULES
-
APPLY EASE ALGORITHM
-
RESULT ANALYSIS