cluster analysis

  Business research Cluster analysis

Uploaded by: popat-vishal

Posted on 07-Oct-2015



  • CLUSTER ANALYSIS: Introduction

    Cluster analysis is the name given to a bewildering assortment of techniques designed to perform classification by assigning observations to groups so that each group is more or less homogeneous and distinct from the others. Given the multivariate nature of the data, the researcher is faced with the problem of identifying natural groupings of the objects. Cluster analysis deals with assigning objects to groups so that similarity within groups and difference among groups are maximized. It is a pre-classificatory method: groups of objects are formed on the basis of profile resemblance in the data matrix itself. Many of these procedures are relatively simple but are usually not supported by an extensive body of statistical reasoning. Different procedures are available, and they can generate different solutions for the same data set.

  • Meaning and definition: Cluster analysis is a class of statistical techniques that can be applied to data that exhibit natural groupings. It classifies a set of observations into two or more mutually exclusive groups based on combinations of interval-scaled variables.

  • Methods of cluster analysis: Once the measure of similarity (the similarity coefficient) has been decided, the researcher may draw on a variety of clustering programmes, which can be grouped into the following three categories:

    (1) Dimensionalising methods

    (2) Nonhierarchical methods: (a) sequential threshold, (b) parallel threshold, (c) partitioning method

    (3) Hierarchical methods: (a) single linkage or minimum distance, (b) complete linkage, (c) average linkage, (d) centroid method, (e) median method, (f) Ward's method
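Every method listed above starts from a chosen measure of similarity or distance between objects. As a minimal sketch, assuming numeric interval-scaled data stored as tuples (the function names are hypothetical, not from any package), the Python below builds a Euclidean distance matrix and computes a correlation-based similarity between two observation profiles, two common choices:

```python
import math

def distance_matrix(points):
    """Euclidean distance between every pair of observations (0 on the diagonal)."""
    return [[math.dist(p, q) for q in points] for p in points]

def profile_correlation(x, y):
    """Pearson correlation between two observation profiles.
    Used as a similarity: 1 means the profiles have the same shape,
    whatever their overall level."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

For example, the profiles (1, 2, 3) and (11, 12, 13) are far apart in Euclidean distance but have a profile correlation of 1, so the choice of measure can change which objects end up grouped together.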


  • (1) Dimensionalising methods: These approaches use principal components or other factor-analysis methods to find a dimensional representation of the points from inter-object association measures. Clusters are then developed by grouping the objects on their component scores.
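A minimal sketch of the dimensionalising approach, assuming the observations are plain Python tuples of numbers: the leading principal component is found by power iteration on the covariance matrix, each observation is scored on it, and the sorted scores are split at their largest gap. Using only the first component and the largest-gap rule are simplifying assumptions for illustration; in practice several components and any clustering rule could be applied to the scores.

```python
from statistics import mean

def first_component_scores(data, iters=200):
    """Score every observation on the first principal component, found by
    power iteration on the sample covariance matrix of the variables."""
    n, p = len(data), len(data[0])
    means = [mean(col) for col in zip(*data)]
    X = [[row[k] - means[k] for k in range(p)] for row in data]  # centre each variable
    cov = [[sum(X[r][a] * X[r][b] for r in range(n)) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0] * p  # starting vector; must not be orthogonal to the leading eigenvector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return [sum(row[k] * v[k] for k in range(p)) for row in X]

def split_at_largest_gap(scores):
    """Group observations by cutting the sorted scores at their widest gap
    (a simple illustrative rule that always yields two clusters)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    gaps = [scores[order[k + 1]] - scores[order[k]] for k in range(len(order) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return sorted(order[:cut]), sorted(order[cut:])
```

With data such as `[(1, 1), (1.2, 0.9), (0.9, 1.1), (5, 5), (5.1, 4.8), (4.9, 5.2)]`, the first three and last three observations fall on opposite sides of the widest gap in their component scores.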

  • (2) Nonhierarchical methods: These methods, which work from the proximity matrix, fall into three categories:

    (a) sequential threshold, which develops clusters one at a time, successively determining cluster centers;

    (b) parallel threshold, which develops several clusters simultaneously; and

    (c) partitioning methods, in which clusters are formed by optimizing some overall criterion measure for a given number of clusters.
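Two of the nonhierarchical categories above can be sketched briefly in Python. This is a minimal illustration under stated assumptions, not a reference implementation: in `sequential_threshold` the next cluster center is simply the first unassigned point, and `k_means` stands in as one common partitioning method (the source does not name a specific algorithm), seeded naively with the first k points. Parallel threshold is not shown. Both function names are hypothetical.

```python
import math

def sequential_threshold(points, threshold):
    """Build clusters one at a time: take the first unassigned point as the
    next cluster center and put every unassigned point within `threshold`
    of it into that cluster."""
    unassigned = list(range(len(points)))
    clusters = []
    while unassigned:
        center = points[unassigned[0]]
        members = [i for i in unassigned if math.dist(points[i], center) <= threshold]
        clusters.append(members)
        unassigned = [i for i in unassigned if i not in members]
    return clusters

def k_means(points, k, iters=100):
    """Partitioning for a given k: alternately assign each point to its nearest
    center and move each center to the mean of its points. Each round never
    increases the total within-cluster sum of squares; stop when centers settle."""
    centers = [tuple(points[i]) for i in range(k)]  # naive seeding: first k points
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's old center
        if new_centers == centers:
            break
        centers = new_centers
    return labels
```

The threshold and k must be supplied by the researcher; different choices produce different solutions for the same data set.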

  • (3) Hierarchical methods: In these procedures, a hierarchy or tree-like structure is constructed, starting with each point as its own cluster. At the next level, the two closest points are placed in a cluster. At the following level, either a third point joins the first two or a second two-point cluster is formed, depending on the criterion function used for assignment. Eventually, all points are grouped into one large cluster.

    (a) Single linkage or minimum distance: This rule finds the two points with the shortest Euclidean distance and places them in the first cluster. Then the point with the shortest distance to any member of that cluster joins it, provided that distance is smaller than the distance between the two closest unclustered points; otherwise, the two closest unclustered points are placed in a new cluster.

    (b) Complete linkage: This starts in the same way as single linkage, but the criterion for joining points to clusters, or clusters to clusters, is the maximum distance rather than the minimum.
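The agglomerative procedure with single and complete linkage can be sketched in a few lines of Python. This is a minimal, unoptimized sketch (it recomputes all pairwise distances at every level); the function names are hypothetical. At each level it merges the pair of clusters with the smallest linkage distance, which for single linkage reproduces the point-by-point description above.

```python
import math
from itertools import combinations

def cluster_distance(a, b, rule):
    """Distance between two clusters (lists of points) under a linkage rule."""
    pair_dists = [math.dist(p, q) for p in a for q in b]
    if rule == "single":
        return min(pair_dists)  # minimum distance: nearest pair of members
    if rule == "complete":
        return max(pair_dists)  # maximum distance: farthest pair of members
    raise ValueError(f"unknown linkage rule: {rule}")

def hierarchical(points, rule="single"):
    """Start with every point as its own cluster and repeatedly merge the
    closest pair until one cluster remains. Returns the merges as
    (sorted members, merge distance) in the order they happen."""
    clusters = [[p] for p in points]
    history = []
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]], rule))
        d = cluster_distance(clusters[i], clusters[j], rule)
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        history.append((sorted(merged), d))
    return history
```

For the points (0, 0), (1, 0), (2, 0), (10, 0), both rules build the same tree, but the outlier joins at distance 8.0 under single linkage and 10.0 under complete linkage, because complete linkage measures to the farthest member.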

  • (c) Average linkage: This rule is similar to the previous ones; however, the distance between two clusters is the average distance from points in the first cluster to points in the second cluster.

    (d) Centroid method: The two clusters whose centroids (the points with the mean values on each clustering variable) are closest are joined.

    (e) Median method: This is the same as the centroid method, except that when two clusters are joined, the centroid of the new cluster is computed by giving equal weight to the two component clusters, regardless of their sizes.

    (f) Ward's method: The two clusters whose merger yields the smallest increase in the overall sum of squared within-cluster distances are joined.
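The four remaining merge criteria differ only in how the cost of joining two clusters is computed and how the new cluster's center is updated. As a minimal sketch (hypothetical names; Euclidean distance assumed throughout), the Python below keeps each cluster's members and a center: the true mean for the centroid and Ward methods, and the midpoint of the two merged centers for the median method. For Ward's method it uses the standard identity that merging clusters A and B raises the within-cluster sum of squares by n_A n_B / (n_A + n_B) times the squared distance between their centroids.

```python
import math
from itertools import combinations

METHODS = ("average", "centroid", "median", "ward")

def mean_point(points):
    return tuple(sum(c) / len(points) for c in zip(*points))

def merge_cost(a, b, method):
    """Criterion for joining clusters a and b; the lowest-cost pair is merged."""
    na, nb = len(a["members"]), len(b["members"])
    if method == "average":
        # mean distance from points in a to points in b
        return sum(math.dist(p, q) for p in a["members"] for q in b["members"]) / (na * nb)
    if method in ("centroid", "median"):
        # distance between the two cluster centers
        return math.dist(a["center"], b["center"])
    # ward: increase in the total within-cluster sum of squares caused by the merge
    return na * nb / (na + nb) * math.dist(a["center"], b["center"]) ** 2

def agglomerate(points, method):
    """Merge clusters until one remains; return (sorted members, cost) per merge."""
    if method not in METHODS:
        raise ValueError(f"unknown method: {method}")
    clusters = [{"members": [p], "center": tuple(p)} for p in points]
    merges = []
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]], method))
        a, b = clusters[i], clusters[j]
        cost = merge_cost(a, b, method)
        members = a["members"] + b["members"]
        if method == "median":
            # equal weight to both component clusters, whatever their sizes
            center = tuple((x + y) / 2 for x, y in zip(a["center"], b["center"]))
        else:
            center = mean_point(members)  # true centroid of all members
        merges.append((sorted(members), cost))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append({"members": members, "center": center})
    return merges
```

The center is stored but unused by average linkage; keeping one loop for all four methods makes it easy to compare how each criterion orders the merges on the same data.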

  • Steps in performing cluster analysis

    1. The largest off-diagonal element in the correlation matrix (the highest correlation between two variables) identifies the two variables that form the nucleus of the cluster.

    2. Each of the remaining variables is added to the cluster in turn, and the b coefficient for the cluster with that variable included is calculated.

    3. The variable whose inclusion yields the highest b coefficient for the new cluster (of three variables) is added to the cluster.

    4. Steps 2 and 3 are repeated for a fourth variable, adding to the cluster the variable that yields the highest b coefficient.

    5. Variables continue to be added by the above procedure until there is a sharp drop in the b coefficient or until it falls below some predetermined value. What constitutes a sharp drop or a minimum acceptable value of b is for the individual investigator to decide. If loose clustering is satisfactory, a low criterion value for b may be used.

    6. When it is decided that the first cluster is complete, a new cluster may be started by searching among the variables not yet clustered for the most highly correlated pair and proceeding as above, taking care not to include already-clustered variables in the new cluster.

    7. Variables are added to the second cluster until its b coefficient becomes too low.

    8. Additional clusters may be formed from the remaining variables until every variable has been placed in one cluster or another, or until no remaining pair of variables yields a satisfactory b coefficient, at which point clustering is complete.
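The steps above do not define the b coefficient. As an assumption, the sketch below uses Holzinger's B coefficient as it is commonly described: 100 times the mean intercorrelation of the variables inside the cluster, divided by their mean correlation with the variables outside it. The procedure then follows steps 1 to 8 with a fixed minimum b chosen by the investigator; the "sharp drop" rule is left out, and for step 8 only the most highly correlated remaining pair is checked, a simplification. The function names are hypothetical.

```python
from itertools import combinations
from statistics import mean

def b_coefficient(R, cluster):
    """Assumed B coefficient: 100 x (mean correlation among cluster members)
    / (mean correlation of members with all variables outside the cluster)."""
    inside = [R[i][j] for i, j in combinations(cluster, 2)]
    others = [k for k in range(len(R)) if k not in cluster]
    outside = [R[i][k] for i in cluster for k in others]
    if not outside or mean(outside) <= 0:
        return float("inf")  # nothing (positive) to compare against: treat as fully separated
    return 100 * mean(inside) / mean(outside)

def cluster_variables(R, min_b):
    """Cluster the variables of correlation matrix R. Returns (clusters, leftovers)."""
    unclustered = set(range(len(R)))
    clusters = []
    while len(unclustered) >= 2:
        # Steps 1 and 6: the most highly correlated unclustered pair is the nucleus
        i, j = max(combinations(sorted(unclustered), 2), key=lambda ij: R[ij[0]][ij[1]])
        cluster = [i, j]
        if b_coefficient(R, cluster) < min_b:
            break  # step 8: no satisfactory pair remains
        # Steps 2-5 and 7: add the variable giving the highest b while b stays acceptable
        while True:
            candidates = [v for v in sorted(unclustered) if v not in cluster]
            if not candidates:
                break
            best = max(candidates, key=lambda v: b_coefficient(R, cluster + [v]))
            if b_coefficient(R, cluster + [best]) < min_b:
                break
            cluster.append(best)
        clusters.append(sorted(cluster))
        unclustered -= set(cluster)
    return clusters, sorted(unclustered)
```

With a correlation matrix in which variables 0 to 2 intercorrelate at 0.8, variables 3 to 5 at 0.7, and all cross-correlations are 0.1, a minimum b of 200 recovers the two groups: adding a fourth variable to the first cluster drops b from 800 to 180.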
