clu string

Upload: yousef

Post on 06-Apr-2018


TRANSCRIPT

  • 8/3/2019 Clu String

    1/18

    1

    Data Mining:

    Clustering

  • 2/18

    Clustering

    Unsupervised learning, or clustering, builds models from data without predefined classes.

    The goal is to place records into groups where the records in a group are highly similar to each other and dissimilar to records in other groups.

    The k-means algorithm is a simple yet effective clustering technique.

  • 3/18

    Clustering Example

  • 4/18

    K-means example, step 1

    Pick 3 initial cluster centers (randomly).

    [Figure: points on X and Y axes with the cluster centers k1, k2, k3]

  • 5/18

    K-means example, step 2

    Assign each point to the closest cluster center.

    [Figure: points on X and Y axes with the cluster centers k1, k2, k3]

  • 6/18

    K-means example, step 3

    Move each cluster center to the mean of each cluster.

    [Figure: points on X and Y axes; each of the centers k1, k2, k3 is labeled at two positions]

  • 7/18

    K-means example, step 4

    Reassign the points that are now closest to a different new cluster center.

    Q: Which points are reassigned?

    [Figure: points on X and Y axes with the cluster centers k1, k2, k3]

  • 8/18

    K-means example, step 4

    A: three points (shown with animation in the original slides).

    [Figure: points on X and Y axes with the cluster centers k1, k2, k3]

  • 9/18

    K-means example, step 4b

    Re-compute the cluster means.

    [Figure: points on X and Y axes with the cluster centers k1, k2, k3]

  • 10/18

    K-means example, step 5

    Move the cluster centers to the cluster means.

    [Figure: points on X and Y axes with the cluster centers k1, k2, k3]
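The assign-then-move cycle shown in steps 2 to 5 can be sketched in Python. This is a minimal illustration, not the slides' own code: the function names `closest_center` and `kmeans_step` are hypothetical, Euclidean distance is assumed as the closeness measure, and an empty cluster is assumed to keep its old center.

```python
import math

def closest_center(point, centers):
    """Return the index of the center nearest to point (Euclidean distance assumed)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

def kmeans_step(points, centers):
    """One pass: assign every point to its closest center, then move each center to its cluster mean."""
    assignments = [closest_center(p, centers) for p in points]
    new_centers = []
    for i, old_center in enumerate(centers):
        members = [p for p, a in zip(points, assignments) if a == i]
        if not members:
            # Assumption: a cluster that attracted no points keeps its old center.
            new_centers.append(old_center)
            continue
        dims = len(members[0])
        new_centers.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(dims))
        )
    return assignments, new_centers

# Toy data, invented for illustration only.
toy_points = [(1, 1), (1, 2), (8, 8), (9, 8)]
toy_centers = [(0, 0), (10, 10)]
print(kmeans_step(toy_points, toy_centers))
```

Repeating `kmeans_step` until the assignments stop changing gives the full procedure walked through in the figures.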

  • 11/18

    Finding the Distance Between Two Points

    Point   X   Y
    a       2   7
    b       4   5
    c       6   3
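As a quick check on the table above, the pairwise distances can be computed directly. The sketch assumes the Euclidean distance, which is consistent with the values in the worked example later in the deck; the helper name `euclidean` is hypothetical.

```python
import math

# Points a, b, c from the table.
points = {"a": (2, 7), "b": (4, 5), "c": (6, 3)}

def euclidean(p, q):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

for first, second in [("a", "b"), ("b", "c"), ("a", "c")]:
    d = euclidean(points[first], points[second])
    print(f"Distance({first}, {second}) = {d:.2f}")
# a-b and b-c are both sqrt(8), about 2.83; a-c is sqrt(32), about 5.66.
```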

  • 12/18

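    Example Data for Clustering

    RID   Age   Years of service
    1     30    5
    2     50    25
    3     50    15
    4     25    5
    5     30    10
    6     55    25

    $\text{Distance}(r_j, r_k) = \sqrt{(\text{Age}_j - \text{Age}_k)^2 + (\text{Service}_j - \text{Service}_k)^2}$

    Here Service denotes years of service. This is the Euclidean distance, which matches the distance values computed on the following slides.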

  • 13/18

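    Steps of the k-means algorithm:

    K = 2, meaning that the number of clusters is 2: clusters C1 and C2. Let RID 3 and RID 6 be the centers for clusters C1 and C2, respectively.

    $\text{Distance}(r_1, r_3) = \sqrt{(30 - 50)^2 + (5 - 15)^2} = \sqrt{400 + 100} = \sqrt{500} \approx 22.4$

    $\text{Distance}(r_1, r_6) = \sqrt{(30 - 55)^2 + (5 - 25)^2} = \sqrt{1025} \approx 32.0$

    So r1 is placed in cluster C1, since it is closer to the center of C1.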

  • 14/18

    Similarly:

    Distance(r2, r3) = 10, Distance(r2, r6) = 5. Therefore r2 is added to C2.

    Distance(r4, r3) = 26.9, Distance(r4, r6) = 36.1. Therefore r4 is added to C1.

    Distance(r5, r3) = 20.6, Distance(r5, r6) = 29.2. Therefore r5 is added to C1.

    Finally:

    C1: r1, r3, r4, r5
    C2: r2, r6
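The first-iteration assignment can be reproduced with a short script. This is a sketch under the assumption of Euclidean distance; the records are the (Age, Years of service) pairs from the example data, and the variable names are illustrative.

```python
import math

# RID -> (Age, Years of service), taken from the example data slide.
records = {1: (30, 5), 2: (50, 25), 3: (50, 15), 4: (25, 5), 5: (30, 10), 6: (55, 25)}

# Initial centers: RID 3 for C1 and RID 6 for C2.
centers = {"C1": records[3], "C2": records[6]}

clusters = {name: [] for name in centers}
for rid, rec in records.items():
    # Pick the cluster whose center is nearest to this record.
    nearest = min(centers, key=lambda name: math.dist(rec, centers[name]))
    clusters[nearest].append(rid)
    distances = ", ".join(f"{name}: {math.dist(rec, c):.1f}" for name, c in centers.items())
    print(f"r{rid} -> {nearest}  ({distances})")

print(clusters)  # {'C1': [1, 3, 4, 5], 'C2': [2, 6]}
```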

  • 15/18

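    Find the new means (centers) for the two clusters.

    $C_i = \frac{1}{n_i} \sum_{r_j \in C_i} r_j$, where $n_i$ is the number of records in cluster $C_i$

    $C_1 = \left( \frac{30 + 50 + 25 + 30}{4}, \frac{5 + 15 + 5 + 10}{4} \right)$

    $C_1 = (33.75, 8.75)$

    Similarly, $C_2 = (52.5, 25)$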
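The same means can be checked numerically. A minimal sketch, assuming the first-iteration clusters C1 = {r1, r3, r4, r5} and C2 = {r2, r6}; the helper name `centroid` is hypothetical.

```python
# RID -> (Age, Years of service), taken from the example data slide.
records = {1: (30, 5), 2: (50, 25), 3: (50, 15), 4: (25, 5), 5: (30, 10), 6: (55, 25)}

# Cluster membership after the first iteration.
clusters = {"C1": [1, 3, 4, 5], "C2": [2, 6]}

def centroid(rids):
    """Component-wise mean of the records with the given RIDs."""
    n = len(rids)
    return tuple(sum(records[r][d] for r in rids) / n for d in range(2))

new_centers = {name: centroid(rids) for name, rids in clusters.items()}
print(new_centers)  # {'C1': (33.75, 8.75), 'C2': (52.5, 25.0)}
```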

  • 16/18

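    2nd iteration of the k-means algorithm

    - Find Distance(r1, Cen 1), Distance(r2, Cen 1), ...
    - Find Distance(r1, Cen 2), Distance(r2, Cen 2), ...

    Example, with Cen 1 = (33.75, 8.75), the mean of r1, r3, r4 and r5:

    $\text{Distance}(r_1, \text{Cen}_1) = \sqrt{(30 - 33.75)^2 + (5 - 8.75)^2} = \sqrt{28.125} \approx 5.3$

    We obtain: C1 = {r1, r4, r5} and C2 = {r2, r3, r6}.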
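The second-iteration distances can be tabulated as follows. The sketch assumes Euclidean distance and the centers Cen 1 = (33.75, 8.75) and Cen 2 = (52.5, 25) from the first iteration; it shows r3 moving from C1 to C2.

```python
import math

# RID -> (Age, Years of service), taken from the example data slide.
records = {1: (30, 5), 2: (50, 25), 3: (50, 15), 4: (25, 5), 5: (30, 10), 6: (55, 25)}

# Centers computed at the end of the first iteration.
centers = {"C1": (33.75, 8.75), "C2": (52.5, 25.0)}

clusters = {name: [] for name in centers}
for rid, rec in records.items():
    d1 = math.dist(rec, centers["C1"])
    d2 = math.dist(rec, centers["C2"])
    nearest = "C1" if d1 <= d2 else "C2"
    clusters[nearest].append(rid)
    print(f"r{rid}: Distance to Cen 1 = {d1:.2f}, to Cen 2 = {d2:.2f} -> {nearest}")

print(clusters)  # {'C1': [1, 4, 5], 'C2': [2, 3, 6]}
```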

  • 17/18

    3rd iteration of the k-means algorithm

    Find the new centers. Find the distances. You will find that the clusters did not change.

    C1 still has r1, r4, r5 and C2 still has r2, r3, r6.

    So this iteration is the end of the k-means clustering algorithm.
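Putting the iterations together, a loop that stops when the clusters no longer change might look like the sketch below. It assumes Euclidean distance, the initial centers RID 3 and RID 6, and that no cluster ever becomes empty (true for this data); the function names are hypothetical.

```python
import math

# RID -> (Age, Years of service), taken from the example data slide.
records = {1: (30, 5), 2: (50, 25), 3: (50, 15), 4: (25, 5), 5: (30, 10), 6: (55, 25)}

def assign(centers):
    """Group RIDs by the index of their nearest center."""
    clusters = [[] for _ in centers]
    for rid, rec in records.items():
        idx = min(range(len(centers)), key=lambda i: math.dist(rec, centers[i]))
        clusters[idx].append(rid)
    return clusters

def mean(rids):
    """Component-wise mean; assumes rids is non-empty."""
    return tuple(sum(records[r][d] for r in rids) / len(rids) for d in range(2))

centers = [records[3], records[6]]  # initial centers for C1 and C2
previous = None
iteration = 0
while True:
    iteration += 1
    clusters = assign(centers)
    if clusters == previous:
        break  # no record changed cluster, so the algorithm has converged
    previous = clusters
    centers = [mean(c) for c in clusters]

print(iteration, clusters)  # 3 [[1, 4, 5], [2, 3, 6]]
```

The loop ends in the third iteration, matching the slide: the assignment found there is identical to the one from the second iteration.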

  • 18/18

    Conclusions

    Clustering is unsupervised learning

    K-means algorithm

    Mainly for numeric data.