Special Techniques
Cluster Analysis
Classification or Categorization
◦ Classification is mathematical and objective, while interpretation is somewhat subjective
Minimize within-group variation and maximize between-group variation
Data exploration
◦ Data structure is unknown
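The goal of minimizing within-group variation while maximizing between-group variation can be illustrated with sums of squares. The following is a minimal sketch, not part of the original slides: the function name `within_between_ss` and the two candidate groupings are hypothetical, and the data are made up for illustration.

```python
def mean(values):
    return sum(values) / len(values)

def within_between_ss(groups):
    """Return (within-group SS, between-group SS) for one-dimensional groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Within: squared deviations of each value from its own group mean.
    within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    # Between: squared deviations of group means from the grand mean,
    # weighted by group size.
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    return within, between

# Hypothetical data: two candidate groupings of the same six values.
good = [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
poor = [[1.0, 10.0, 3.0], [2.0, 11.0, 12.0]]

good_ss = within_between_ss(good)
poor_ss = within_between_ss(poor)
print("good grouping (within, between):", good_ss)
print("poor grouping (within, between):", poor_ss)
```

The total sum of squares is the same for both groupings, so a grouping that lowers the within-group part necessarily raises the between-group part.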
3 Basic Methods of clustering algorithms
◦ Hierarchical (n < 200)
◦ K-Means (n > 200)
◦ 2 Step (large samples and categorical or continuous variables)
Hierarchical clustering
Clusters are nested
◦ Larger clusters at later stages may contain smaller clusters at earlier stages
Evaluate results in a dendrogram with agglomeration schedule
◦ Use K-means with a specified number of clusters to validate
Several options for distance measure and clustering method
◦ Interval or count data
◦ Interval: squared Euclidean distance or Euclidean distance measure with between-groups linkage
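A hierarchical (agglomerative) procedure with squared Euclidean distance and between-groups (average) linkage can be sketched as follows. This is an illustrative sketch under stated assumptions, not the procedure of any particular statistics package: the function names (`sq_euclidean`, `average_linkage`, `agglomerate`) and the five data points are hypothetical.

```python
from itertools import combinations

def sq_euclidean(a, b):
    """Squared Euclidean distance between two cases."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def average_linkage(cluster_a, cluster_b, points):
    """Between-groups linkage: mean distance over all cross-cluster pairs."""
    total = sum(sq_euclidean(points[i], points[j])
                for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))

def agglomerate(points):
    """Return an agglomeration schedule: one (cluster, cluster, distance) per stage."""
    clusters = [[i] for i in range(len(points))]  # start with one cluster per case
    schedule = []
    while len(clusters) > 1:
        # Merge the two clusters with the smallest linkage distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(clusters[pair[0]], clusters[pair[1]], points),
        )
        dist = average_linkage(clusters[a], clusters[b], points)
        schedule.append((clusters[a], clusters[b], dist))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return schedule

# Hypothetical data: two visibly separated groups of cases.
points = [(1.0, 1.0), (1.5, 1.0), (8.0, 8.0), (8.0, 9.0), (9.5, 8.0)]
schedule = agglomerate(points)
for stage, (left, right, dist) in enumerate(schedule, start=1):
    print(stage, left, right, round(dist, 3))
```

In this made-up example the linkage distance stays small for the first three stages and jumps sharply at the last one, which is the kind of break in the agglomeration schedule that suggests stopping at two clusters.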
K-means clustering
Uses Euclidean distance
◦ Desired number of clusters specified in advance
Does not require a case-by-case proximity matrix
Observations are grouped by distance to the cluster mean at each iteration, and cluster means shift after each iteration
◦ Similar to ANOVA
◦ Iterations stop when cluster means are stable or when the defined iteration limit is reached
Final decision on the number of clusters is subjective
◦ Raw data should be carefully analyzed with the new cluster membership and several examples
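The iteration described above (assign each observation to the nearest cluster mean, recompute the means, stop when they are stable or the iteration limit is reached) can be sketched in a few lines. This is a minimal sketch with assumptions: the function names, the default `max_iterations=10`, the choice of the first two cases as starting means, and the data are all hypothetical.

```python
def sq_distance(a, b):
    """Squared Euclidean distance between two cases."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Mean of a non-empty list of cases, coordinate by coordinate."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def k_means(points, initial_means, max_iterations=10):
    means = list(initial_means)
    labels = []
    for _ in range(max_iterations):
        # Group each observation with its nearest cluster mean.
        labels = [min(range(len(means)), key=lambda k: sq_distance(p, means[k]))
                  for p in points]
        # Shift each cluster mean; keep the old mean if a cluster is empty.
        new_means = []
        for k in range(len(means)):
            members = [p for p, label in zip(points, labels) if label == k]
            new_means.append(centroid(members) if members else means[k])
        if new_means == means:  # cluster means are stable
            break
        means = new_means
    return labels, means

# Hypothetical data; the number of clusters (2) is specified in advance.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
initial_means = [points[0], points[1]]  # assumption: seed with the first two cases
labels, means = k_means(points, initial_means)
print(labels)
print(means)
```

Note that only distances from cases to cluster means are computed, which is why no case-by-case proximity matrix is needed.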
Two-step clustering
Very large datasets
◦ Categorical or continuous data
Pre-clusters are identified and then used in a hierarchical procedure
randomization
Discriminant analysis
Logistic regression is more popular now
Classify cases into the values of a dichotomous dependent variable
Purposes
◦ To classify cases into groups using a discriminant prediction equation.
◦ To test theory by observing whether cases are classified as predicted.
◦ To investigate differences between or among groups.
◦ To determine the most parsimonious way to distinguish among groups.
◦ To determine the percent of variance in the dependent variable explained by the independents.
◦ To assess the relative importance of the independent variables in classifying the dependent variable.
◦ To discard variables which are little related to group distinctions.
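A discriminant prediction equation for a dichotomous dependent variable can be sketched for the simplest case of two predictors. The sketch below uses Fisher's linear discriminant with a pooled within-group covariance matrix and a cutoff halfway between the group means, which assumes equal group sizes or priors; those choices, the function names, and the data are assumptions made for illustration, not something stated in the slides.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def pooled_covariance(group0, group1):
    """Pooled within-group covariance matrix for two predictors."""
    m = [[0.0, 0.0], [0.0, 0.0]]
    for rows in (group0, group1):
        mu = mean_vector(rows)
        for r in rows:
            d = [r[0] - mu[0], r[1] - mu[1]]
            for i in range(2):
                for j in range(2):
                    m[i][j] += d[i] * d[j]
    dof = len(group0) + len(group1) - 2
    return [[v / dof for v in row] for row in m]

def fit_discriminant(group0, group1):
    """Return (weights, cutoff) of the discriminant equation."""
    s = pooled_covariance(group0, group1)
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    m0, m1 = mean_vector(group0), mean_vector(group1)
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    # Weights: inverse pooled covariance times the difference in group means.
    weights = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
               inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Cutoff: discriminant score at the midpoint of the group means
    # (assumption: equal priors).
    midpoint = [(m0[0] + m1[0]) / 2, (m0[1] + m1[1]) / 2]
    cutoff = weights[0] * midpoint[0] + weights[1] * midpoint[1]
    return weights, cutoff

def classify(case, weights, cutoff):
    score = weights[0] * case[0] + weights[1] * case[1]
    return 1 if score > cutoff else 0

# Hypothetical data: two predictors measured on two known groups.
group0 = [(2.0, 3.0), (3.0, 3.5), (2.5, 2.0), (3.5, 3.0)]
group1 = [(6.0, 7.0), (7.0, 6.5), (6.5, 8.0), (7.5, 7.0)]
weights, cutoff = fit_discriminant(group0, group1)

# Share of cases classified into the group they actually belong to.
hits = sum(classify(c, weights, cutoff) == 0 for c in group0)
hits += sum(classify(c, weights, cutoff) == 1 for c in group1)
hit_rate = hits / (len(group0) + len(group1))
print(weights, cutoff, hit_rate)
```

Comparing predicted with actual group membership, as the hit rate does here, is one way to observe whether cases are classified as predicted.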
Time series analysis
Differs from other methods by having equally spaced time intervals on the X axis
Objectives
Identify the distribution pattern of the variable over time
Pattern vs noise (error)
Trend vs seasonality
Trend analysis and autocorrelation
Forecast future values of the variable
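Trend analysis and autocorrelation can be illustrated on a short series observed at equally spaced intervals. The sketch below fits a least-squares linear trend against the time index and then computes the autocorrelation of the detrended values. The function names and the quarterly series (a linear trend plus a repeating four-period pattern) are hypothetical and made up for illustration.

```python
def linear_trend(series):
    """Least-squares (intercept, slope) against the time index 0, 1, 2, ..."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    return y_mean - slope * t_mean, slope

def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(n - lag))
    return num / denom

# Hypothetical quarterly series: upward trend plus a four-period seasonal pattern.
seasonal = [2.0, -1.0, -2.0, 1.0]
series = [10.0 + t + seasonal[t % 4] for t in range(12)]

intercept, slope = linear_trend(series)
detrended = [y - (intercept + slope * t) for t, y in enumerate(series)]

lag2 = autocorrelation(detrended, 2)
lag4 = autocorrelation(detrended, 4)
print("slope:", round(slope, 3))
print("lag 2:", round(lag2, 3), "lag 4:", round(lag4, 3))
```

In this made-up series the detrended values correlate positively with themselves four periods back and negatively two periods back, which is the signature of a seasonal pattern with a period of four.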