new link based approach for categorical data clustering

Upload: chiranth-bo

Post on 03-Apr-2018


  • 7/28/2019 New link based approach for categorical data clustering

    NEW LINK BASED APPROACH

    FOR CATEGORICAL DATA CLUSTERING

    By,

    CHIRANTH B O

    4th Sem M.Tech

    KNOWLEDGE AND DATA ENGINEERING

    July 22, 2013 2

    Presentation Outline

    Introduction to Clustering

    Abstract

    Existing System

    Proposed System

    Experimental Design

    Experimental Results

    Conclusion

    Clustering: Introduction

    Clustering: grouping similar kinds of data.

    Data clustering concerns how to group a set of objects based on the

    similarity of their attributes.

    Main methods:

    Partitioning: K-Means

    Hierarchical: BIRCH, ROCK

    Density-based: DBSCAN

    A good clustering method will produce high quality clusters with

    high intra-class similarity

    low inter-class similarity
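
    The two quality criteria above can be made concrete with a small sketch. It assumes a simple matching similarity (the fraction of attributes on which two categorical records agree), which is an illustrative choice and not something the slides specify; the function names and toy records are hypothetical.

```python
from itertools import combinations, product


def matching_similarity(a, b):
    """Fraction of attributes on which two categorical records agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def mean_intra_similarity(cluster):
    """Average similarity over all pairs of records inside one cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(matching_similarity(a, b) for a, b in pairs) / len(pairs)


def mean_inter_similarity(cluster_a, cluster_b):
    """Average similarity over all pairs of records drawn from two clusters."""
    pairs = list(product(cluster_a, cluster_b))
    return sum(matching_similarity(a, b) for a, b in pairs) / len(pairs)


# Toy categorical records: (color, size, shape). Hypothetical data.
cluster_1 = [("red", "small", "round"), ("red", "small", "oval")]
cluster_2 = [("blue", "large", "square"), ("blue", "large", "round")]

intra_1 = mean_intra_similarity(cluster_1)
inter_12 = mean_inter_similarity(cluster_1, cluster_2)
print(f"intra-class similarity of cluster 1: {intra_1:.3f}")
print(f"inter-class similarity of clusters 1 and 2: {inter_12:.3f}")
```

    A good clustering, in the sense of this slide, is one where the first number is high and the second is low.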

    ABSTRACT

    Existing categorical data clustering methods generate

    results based on incomplete information.

    This problem degrades the quality of the clustering result.

    This paper presents a new link-based approach for

    categorical data clustering that improves results by

    discovering unknown entries through the similarity between clusters.

    Existing Methods

    K-means cannot cluster categorical data directly.

    SQUEEZER and CACTUS generate the final clustering

    using incomplete information.

    Many data entries are left unknown.

    Proposed Methods

    The link-based approach improves the

    matrix by discovering the unknown

    entries.

    An efficient link-based algorithm is used to find the similarity between clusters.

    Introduction to NLCD

    Designed for very large data sets:

    Time and memory are limited

    Only one scan of data is necessary

    Does not need the whole data set in advance

    Two key modules:

    Scan the database to build a binary matrix.

    Build the refined matrix using the Weighted Triple Quality algorithm.
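
    The first module can be sketched as follows, assuming the binary matrix has one row per data point and one column per cluster taken from several base clusterings, with a 1 marking membership. The slides do not spell out this layout, so the function name, the input format (one label list per base clustering) and the toy labels are assumptions.

```python
def binary_association_matrix(ensemble):
    """Build a data-point-by-cluster binary matrix.

    ensemble: list of label lists, one list per base clustering.
    Returns (matrix, cluster_ids), where matrix[i][j] is 1 if data
    point i belongs to the cluster identified by cluster_ids[j].
    """
    # One column per (clustering index, cluster label) pair.
    cluster_ids = []
    for m, labels in enumerate(ensemble):
        for label in sorted(set(labels)):
            cluster_ids.append((m, label))
    column = {cid: j for j, cid in enumerate(cluster_ids)}

    n_points = len(ensemble[0])
    matrix = [[0] * len(cluster_ids) for _ in range(n_points)]
    for m, labels in enumerate(ensemble):
        for i, label in enumerate(labels):
            matrix[i][column[(m, label)]] = 1
    return matrix, cluster_ids


# Two hypothetical base clusterings of five data points.
ensemble = [
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
]
bm, cluster_ids = binary_association_matrix(ensemble)
for row in bm:
    print(row)
```

    Every 0 in this matrix is an "unknown" entry in the sense of the earlier slides: the point is not a member of that cluster, but nothing is recorded about how close it is to it. The refined matrix is meant to fill those entries with similarity values.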

    Basic process

    [Diagram: Dataset X -> Clustering 1, Clustering 2, ..., Clustering M -> Consensus Function]

    [Figure: Binary Matrix, Pairwise-Similarity Matrix, Clustering]
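
    One common way to get from a binary matrix to a pairwise-similarity matrix is the co-association approach: the similarity of two data points is the fraction of base clusterings that place them in the same cluster. The slide does not say which derivation it uses, so the sketch below is an assumption, and the function name and example matrix are hypothetical.

```python
def pairwise_similarity(binary_matrix, num_clusterings):
    """Co-association similarity between every pair of data points.

    binary_matrix[i][j] is 1 if data point i belongs to cluster j.
    Two points share a cluster wherever both rows hold a 1, so the
    dot product of two rows counts the clusterings that group them.
    """
    n = len(binary_matrix)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            shared = sum(a * b for a, b in zip(binary_matrix[i], binary_matrix[j]))
            sim[i][j] = shared / num_clusterings
    return sim


# Hypothetical binary matrix: 5 data points, 2 base clusterings with
# 2 clusters each (columns 0-1 and columns 2-3).
binary_matrix = [
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
    [0, 1, 0, 1],
]
similarity = pairwise_similarity(binary_matrix, num_clusterings=2)
for row in similarity:
    print(row)
```

    The resulting matrix can then be handed to any similarity-based clustering algorithm to produce the final partition.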

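    Weighted Triple Quality

    ALGORITHM: WTQ($G$, $C_x$, $C_y$)

    $G = (V, W)$, a weighted graph, where $C_x, C_y \in V$;

    $N_k \subset V$, a set of adjacent neighbors of $C_k \in V$;

    $W_k = \sum_{C_t \in N_k} w_{tk}$;

    $WTQ_{xy}$, the WTQ measure of $C_x$ and $C_y$;

    $WTQ_{xy} \leftarrow 0$

    For each $c \in N_x$

        If $c \in N_y$

            $WTQ_{xy} \leftarrow WTQ_{xy} + \frac{1}{W_c}$

    Return $WTQ_{xy}$

    Following that, the similarity between clusters $C_x$ and $C_y$ can be estimated by

    $Sim(C_x, C_y) = \frac{WTQ_{xy}}{WTQ_{max}} \times DC$,

    where $WTQ_{max}$ is the largest WTQ value over all cluster pairs and $DC \in (0, 1]$ is a constant decay factor.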
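
    The WTQ procedure can be sketched in Python as below. The sketch assumes that each shared neighbor contributes the reciprocal of its total edge weight, and that the final similarity is the WTQ value divided by the largest WTQ over all cluster pairs and scaled by a decay factor. The graph representation (a dict of neighbor-to-weight dicts), the parameter name `dc`, its default of 0.9 and the toy graph are all assumptions of this sketch.

```python
from itertools import combinations


def wtq(weights, cx, cy):
    """Weighted Triple Quality of clusters cx and cy.

    weights maps each cluster to a dict {neighbor: edge weight}.
    Each neighbor shared by cx and cy adds 1 / (its total edge weight),
    so neighbors with few, light links count more than heavily linked ones.
    """
    total = 0.0
    for c in weights[cx]:
        if c in weights[cy]:
            w_c = sum(weights[c].values())
            total += 1.0 / w_c
    return total


def wtq_similarity(weights, cx, cy, dc=0.9):
    """WTQ normalised by its maximum over all cluster pairs, scaled by dc."""
    wtq_max = max(wtq(weights, p, q) for p, q in combinations(weights, 2))
    if wtq_max == 0:
        return 0.0
    return wtq(weights, cx, cy) / wtq_max * dc


# Hypothetical symmetric cluster network with four clusters.
weights = {
    "A": {"C": 0.5, "D": 0.25},
    "B": {"C": 0.5, "D": 0.25},
    "C": {"A": 0.5, "B": 0.5},
    "D": {"A": 0.25, "B": 0.25},
}
print(wtq(weights, "A", "B"))
print(wtq_similarity(weights, "A", "B"))
print(wtq_similarity(weights, "C", "D"))
```

    Clusters A and B are not linked directly, yet they receive a high similarity because they share both neighbors. This is how link analysis supplies values for entries that direct membership leaves unknown.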

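    Overlapping Members

    $w_{xy} \in W$, where $C_x, C_y \in V$

    Cluster Network

    $w_{xy} = \frac{|L_x \cap L_y|}{|L_x \cup L_y|}$,

    where $L_z$ is the set of data points belonging to cluster $C_z$.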
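
    A cluster network of this kind can be built with a few lines of Python, assuming the edge weight between two clusters is the overlap of their member sets (shared members divided by all members of either cluster). The function names and the example clusters are hypothetical.

```python
from itertools import combinations


def overlap_weight(members_x, members_y):
    """Shared members divided by all members of either cluster."""
    union = members_x | members_y
    if not union:
        return 0.0
    return len(members_x & members_y) / len(union)


def build_cluster_network(cluster_members):
    """Return {cluster: {neighbor: weight}} for every overlapping cluster pair."""
    network = {name: {} for name in cluster_members}
    for cx, cy in combinations(cluster_members, 2):
        w = overlap_weight(cluster_members[cx], cluster_members[cy])
        if w > 0:
            # Edges are undirected, so store the weight in both directions.
            network[cx][cy] = w
            network[cy][cx] = w
    return network


# Hypothetical clusters from two base clusterings of data points 0..4.
cluster_members = {
    "C1a": {0, 1},
    "C1b": {2, 3, 4},
    "C2a": {0, 1, 2},
    "C2b": {3, 4},
}
network = build_cluster_network(cluster_members)
for name, neighbors in network.items():
    print(name, neighbors)
```

    Clusters from the same base clustering never overlap, so edges only connect clusters that come from different clusterings.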

    Experimental Results

    Input parameters:

    Memory (M): 5% of data set

    Disk space (R): 20% of M

    Initial threshold (T): 0.0

    Page size (P): 1024 bytes

    Experimental Results

    KMEANS clustering

    No   Time   D      # Scan   DS   Time   D      # Scan
    1    43.9   2.09   289      1o   33.8   1.97   197
    2    13.2   4.43   51       2o   12.7   4.20   29
    3    32.9   3.66   187      3o   36.0   4.35   241

    NLCD clustering

    No   Time   D      # Scan   DS   Time   D      # Scan
    1    11.5   1.87   2        1o   13.6   1.87   2
    2    10.7   1.99   2        2o   12.1   1.99   2
    3    11.4   3.95   2        3o   12.2   3.99   2

    Conclusions

    A new link-based clustering method that stores the

    clustering features in a matrix.

    Given a limited amount of main memory, NLCD

    can minimize the time required for I/O.

    The problem of constructing the refined matrix is

    efficiently resolved by using the similarity among

    categorical clusters.

    Future Work

    The first direction for future work is an

    extensive study of the behavior of other

    link-based similarity measures within this problem context.

    The second is to apply the new

    method to specific domains,

    including tourism and medical data sets.

    Q&A

    Thank you for your patience
