sowmya review

Upload: anandgsoft3603

Post on 14-Apr-2018


  • 7/29/2019 Sowmya Review

    1/24

    A New Two-Phase Sampling Algorithm for Discovering Association Rules


    Data mining techniques have been widely used in various applications. Data mining extracts novel and useful knowledge from large repositories of data and has become an effective means of analysis and decision making in corporations. Sharing data for data mining can bring many advantages for research and business collaboration, and data mining is becoming an increasingly important tool for transforming data into information.

    The volume of electronically accessible data in warehouses and on the Internet is growing rapidly, so the scalability of mining is a major concern. Classical mining algorithms require one or more computationally intensive passes over the entire database, which can take hours or even days, and the problem will only get worse as data grows. Using a sample of the data as a synopsis is a popular technique that avoids this problem and scales well as the data grows.

    In data mining, association rule mining is a popular and well-researched method for discovering relations between variables in a large database. The resulting information can be used as the basis for decisions about marketing activities such as market basket analysis and product placement.

    This project uses the Apriori, SRS (Simple Random Sampling) and FAST (Finding Associations from Sampled Transactions) algorithms to generate association rules and to discover them in a large database. By applying all three algorithms to the same large database, the user can determine which algorithm best identifies the strong and weak rules of the dataset, comparing the running time and accuracy of each to find the most efficient way of discovering association rules.
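As a small illustration of the support and confidence measures the algorithms rely on, the Python sketch below scores a single rule on a toy basket dataset. It assumes the usual definitions (support of an itemset = fraction of transactions containing it; confidence of A -> C = support(A ∪ C) / support(A)) and treats a rule as "strong" when it meets both a minimum support and a minimum confidence, and "weak" otherwise. The threshold values, the basket data and the helper names `support`, `confidence` and `classify_rule` are hypothetical and not taken from the project.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """conf(A -> C) = support(A ∪ C) / support(A); 0.0 if A never occurs."""
    sup_a = support(antecedent, transactions)
    if sup_a == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), transactions) / sup_a


def classify_rule(antecedent, consequent, transactions, min_sup=0.5, min_conf=0.7):
    """Return (support, confidence, label); the thresholds are hypothetical defaults."""
    sup = support(set(antecedent) | set(consequent), transactions)
    conf = confidence(antecedent, consequent, transactions)
    label = "strong" if sup >= min_sup and conf >= min_conf else "weak"
    return sup, conf, label


# Toy market-basket data (hypothetical).
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]
print(classify_rule({"bread"}, {"milk"}, baskets))    # support 0.5, confidence ~0.67 -> weak
print(classify_rule({"butter"}, {"bread"}, baskets))  # support 0.5, confidence 1.0 -> strong
```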


    HARDWARE CONFIGURATION:

    Processor : Pentium IV

    Processor Speed : 1.7 GHz

    Memory (RAM) : 256 MB

    Hard Disk : 10 GB

    Floppy Drive : 3.5-inch 1.44 MB Drive

    Monitor : Samsung Color Monitor

    Keyboard : 104 keys Intel Keyboard

    Mouse : Intel Optical Mouse

    SOFTWARE CONFIGURATION:

    Operating System : Windows XP

    Front End Tool : Microsoft Visual Basic .NET 2008

    Back End Tool : Microsoft SQL Server 2000


    EXISTING SYSTEM:

    A study of the existing system has highlighted its limitations and so paved the way for the proposed system. Finding relationships between variables in a large database is not easy.

    LIMITATION OF EXISTING SYSTEM:

    Limited amount of memory

    Requires access to the complete database

    Data may be scattered and poorly accessible

    Requires many database scans

    Expensive

    Lossy compressed synopsis (sketch) of data

    Scalability of mining algorithms is a major concern


    PROPOSED SYSTEM:

    The proposed system is based on recognizing that the existing system needs to be improved, and it aims to overcome the existing system's drawbacks. An important requirement of the new system is that it should be easy to change: the user should be able to make changes at any time without difficulty. The proposed system discovers association rules using the Apriori, Simple Random Sampling, FAST and EASE algorithms. It is developed using Visual Basic .NET as the front end and Microsoft SQL Server as the back end.

    FEATURES OF PROPOSED SYSTEM:

    Uses the large itemset property

    Saves memory space

    Easily implemented

    Reduced costs

    Reduced field time

    Increased accuracy

    Provides security

    Excellent user-friendliness

    Simple

    Errors can be easily measured


    Modules Description :

    This project uses the FAST, EASE, Apriori and Simple Random Sampling algorithms to discover association rules in a large database.

    Apriori Algorithm :

    The Apriori algorithm is a classic algorithm for learning association rules. It is designed to operate on databases containing transactions.
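A minimal Python sketch of the classic level-wise Apriori procedure follows, assuming transactions are given as sets of item names and the minimum support is a fraction. It uses the Apriori property (every subset of a frequent itemset must itself be frequent) to prune candidates before counting them. The function name `apriori` and the sample data are illustrative; this is not the project's Visual Basic .NET implementation.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent(candidates):
        # One full pass over the database counts every candidate of this level.
        counts = dict.fromkeys(candidates, 0)
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: k / n for c, k in counts.items() if k / n >= min_support}

    level = frequent({frozenset([item]) for t in transactions for item in t})
    result = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join: combine frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        level = frequent(candidates)
        result.update(level)
        k += 1
    return result


data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
frequent_sets = apriori(data, 0.6)
for itemset, sup in sorted(frequent_sets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

Each level costs one full pass over the transactions, which is the cost the sampling-based modules aim to reduce.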

    Simple Random Sampling :

    Simple Random Sampling is considered separately: it draws and displays a random sample of the database and checks the support and confidence within that sample to find the best rules. Simple random sampling can make sampling a viable means of attaining both high performance and acceptably accurate results.
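A sketch of the sampling step, assuming a uniform draw without replacement using Python's `random.Random.sample`: the sample's 1-itemset supports serve as estimates of the full-database supports, and rule mining (for example with Apriori) can then run on the much smaller sample. The function names, the optional `seed` parameter and the example database are hypothetical.

```python
import random
from collections import Counter


def simple_random_sample(transactions, sample_size, seed=None):
    """Draw `sample_size` transactions uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(list(transactions), sample_size)


def item_supports(transactions):
    """Relative support of every single item (1-itemset)."""
    counts = Counter(item for t in transactions for item in set(t))
    return {item: c / len(transactions) for item, c in counts.items()}


# Hypothetical database of 100 transactions.
database = [{"a", "b"}, {"a"}, {"b", "c"}, {"a", "c"}] * 25
sample = simple_random_sample(database, 20, seed=42)
print("database:", item_supports(database))
print("sample:  ", item_supports(sample))
```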


    FAST Algorithm :

    FAST (Finding Associations from Sampled Transactions) is a refined sampling-based mining algorithm that is distinguished from prior algorithms by its novel two-phase approach to sample collection. In Phase I, a large sample is collected to quickly and accurately estimate the support of each item in the database. In Phase II, a small final sample is obtained by excluding "outlier" transactions in such a manner that the support of each item in the final sample is as close as possible to the estimated support of that item in the entire database. Indeed, numerical experiments reported for FAST indicate that, for any fixed computing budget, FAST identifies more frequent itemsets and fewer false itemsets than other sampling-based algorithms. FAST can identify most of the frequent itemsets in a database at an overall cost that is much lower than that of classical algorithms.

    In this project (A New Two-Phase Sampling Algorithm for Discovering Association Rules), the user can compare the running times of the algorithms and the variation in their results. On a large dataset, the Apriori algorithm is applied first to compute support and confidence and so identify the strong and weak rules. Next, the dataset is randomly sampled and displayed, and the strong and weak rules are found from the support and confidence of the sample. Finally, the FAST algorithm is applied to the large dataset to find the strong and weak rules from the support and confidence of the dataset. By applying the three algorithms, the user can measure the time taken and the accuracy of each, and can identify the best algorithm by comparing the time differences.
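The two phases of FAST can be sketched in Python as follows. This is a simplified greedy trimming variant under stated assumptions: Phase I estimates item supports from a large simple random sample, and Phase II removes one transaction at a time, choosing the one whose removal keeps the sum of absolute differences between the remaining sample's item supports and the Phase I estimates smallest. The function names, sample sizes, example data and the choice of this L1-style distance are assumptions; the published algorithm may trim differently or use other distance measures.

```python
import random
from collections import Counter


def l1_distance(counts, size, estimates):
    """Sum over items of |support in current sample - Phase I estimate|."""
    return sum(abs(counts.get(item, 0) / size - p) for item, p in estimates.items())


def fast_trim(transactions, phase1_size, final_size, seed=None):
    if not 0 < final_size <= phase1_size <= len(transactions):
        raise ValueError("need 0 < final_size <= phase1_size <= len(transactions)")
    rng = random.Random(seed)

    # Phase I: a large simple random sample estimates each item's support.
    sample = [frozenset(t) for t in rng.sample(list(transactions), phase1_size)]
    counts = Counter(item for t in sample for item in t)
    estimates = {item: c / phase1_size for item, c in counts.items()}

    # Phase II: repeatedly drop the "outlier" transaction whose removal keeps the
    # remaining sample's item supports closest to the Phase I estimates.
    while len(sample) > final_size:
        size_after = len(sample) - 1
        best_idx, best_dist = 0, float("inf")
        for idx, t in enumerate(sample):
            counts.subtract(t)                 # tentatively remove t
            dist = l1_distance(counts, size_after, estimates)
            counts.update(t)                   # put t back
            if dist < best_dist:
                best_idx, best_dist = idx, dist
        counts.subtract(sample.pop(best_idx))  # remove the chosen transaction
    return sample


# Hypothetical database of 200 transactions.
database = [{"a", "b"}, {"a"}, {"b", "c"}, {"a", "c"}, {"d"}] * 40
final = fast_trim(database, phase1_size=100, final_size=20, seed=0)
print(len(final), "transactions kept")
```

The trimmed sample returned by `fast_trim` would then be mined (for example with Apriori) in place of the full database.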


    EASE Algorithm :

    EASE (Epsilon Approximation: Sampling Enabled) is a novel data-reduction method that is especially designed for categorical count data. The algorithm is an outgrowth of earlier work by Chen et al. on the FAST data-reduction method. Both EASE and FAST start with a relatively large simple random sample of transactions and deterministically trim the sample to create a final subsample whose "distance" from the complete database is as small as possible. For reasons of computational efficiency, both algorithms consider a subsample "close" to the original database if the high-level aggregates of the subsample, normalized by the total number of data points, are "close" to the normalized aggregates in the database. These normalized aggregates typically correspond to 1-itemset or 2-itemset supports in the association-rule setting or, in the setting of a contingency table, to relative marginal or cell frequencies.
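The notion of "closeness" described above can be made concrete with a short Python sketch. It computes normalized 1-itemset and 2-itemset supports for a subsample and for the full database and reports the largest absolute gap between them; a subsample whose gap is at most ε would count as an ε-approximation under this criterion. This illustrates only the closeness measure, not the EASE trimming procedure itself, and the helper names, the example data and the maximum-gap distance are assumptions.

```python
from collections import Counter
from itertools import combinations


def normalized_aggregates(transactions, max_k=2):
    """Supports of all itemsets of size 1..max_k, divided by the number of transactions."""
    counts = Counter()
    for t in transactions:
        for k in range(1, max_k + 1):
            counts.update(frozenset(c) for c in combinations(set(t), k))
    return {s: c / len(transactions) for s, c in counts.items()}


def max_discrepancy(subsample, database, max_k=2):
    """Largest absolute gap between subsample and database supports."""
    sub = normalized_aggregates(subsample, max_k)
    full = normalized_aggregates(database, max_k)
    return max(abs(sub.get(s, 0.0) - p) for s, p in full.items())


def is_epsilon_approximation(subsample, database, epsilon, max_k=2):
    """True if every itemset support up to size max_k is within epsilon."""
    return max_discrepancy(subsample, database, max_k) <= epsilon


database = [{"a", "b"}, {"a"}, {"b"}, {"a", "b"}]
subsample = [{"a", "b"}, {"a"}]
print(max_discrepancy(subsample, database))  # 0.25
```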


    Apply EASE Algorithm

    Highlight with Blue and Red Color


    TABLE NAME : Dataset_master | Primary Key : DS_ID

    COLUMN NAME   DATATYPE   DESCRIPTION
    DS_ID         Numeric    Dataset Identification
    DS_TRANS      Text       Dataset Transaction Data


    TABLE NAME : Result_analysis | Primary Key : TRAN_NO

    COLUMN NAME    DATATYPE   DESCRIPTION
    TRAN_NO        Numeric    Transaction Number
    TYPE           Text       Transaction Type
    SNO            Numeric    Serial Number
    STARTED_TIME   Datetime   Started Time
    ELAPSED_TIME   Datetime   Elapsed Time
    RULES          Text       Rules
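The two tables above can be tried out locally with Python's built-in `sqlite3` module, used here only as a stand-in for the project's Microsoft SQL Server 2000 back end. The sketch also shows one hypothetical way a run of an algorithm might be timed and logged into Result_analysis. Storing ELAPSED_TIME as a number of seconds (rather than a date-time value), the sample rows and the helper `record_run` are assumptions.

```python
import sqlite3
import time
from datetime import datetime

SCHEMA = """
CREATE TABLE Dataset_master (
    DS_ID    NUMERIC PRIMARY KEY,   -- dataset identification
    DS_TRANS TEXT                   -- dataset transaction data
);
CREATE TABLE Result_analysis (
    TRAN_NO      NUMERIC PRIMARY KEY,  -- transaction number
    TYPE         TEXT,                 -- transaction type (here: algorithm name)
    SNO          NUMERIC,              -- serial number
    STARTED_TIME DATETIME,             -- when the run started
    ELAPSED_TIME DATETIME,             -- assumption: elapsed seconds as a number
    RULES        TEXT                  -- rules found, joined into one string
);
"""


def record_run(conn, tran_no, algorithm, sno, run):
    """Time `run()` (which returns a list of rule strings) and log it; hypothetical helper."""
    started = datetime.now()
    t0 = time.perf_counter()
    rules = run()
    elapsed = time.perf_counter() - t0
    conn.execute(
        "INSERT INTO Result_analysis VALUES (?, ?, ?, ?, ?, ?)",
        (tran_no, algorithm, sno, started.isoformat(sep=" "), elapsed, "; ".join(rules)),
    )
    conn.commit()
    return elapsed


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO Dataset_master VALUES (?, ?)", (1, "bread,milk"))
record_run(conn, 1, "APRIORI", 1, lambda: ["{bread} -> {milk}"])
print(conn.execute("SELECT TYPE, RULES FROM Result_analysis").fetchall())
```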


    APRIORI


    FINDING RULES


    SIMPLE RANDOM SAMPLE


    FINDING RULES


    FAST TESTING


    FINDING RULES


    APPLY EASE ALGORITHM


    RESULT ANALYSIS
