special techniques

Classification or Categorization
◦ Classification is mathematical and objective, while interpretation is somewhat subjective
Minimize within-group variation and maximize between-group variation
Data exploration
◦ Data structure is unknown
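
The goal of minimizing within-group variation while maximizing between-group variation can be illustrated with sums of squares. The following is a minimal sketch, not a prescribed procedure: the function name `variation_split` and the six data values are hypothetical. It scores two different groupings of the same values.

```python
from statistics import mean

def variation_split(groups):
    """Return (within, between) sums of squares for groups of 1-D values."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Within-group: squared distance of each value from its own group mean.
    within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    # Between-group: squared distance of each group mean from the grand mean,
    # weighted by group size.
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    return within, between

# Hypothetical data: the same six values grouped two different ways.
good = [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
poor = [[1.0, 10.0, 3.0], [2.0, 11.0, 12.0]]

good_within, good_between = variation_split(good)
poor_within, poor_between = variation_split(poor)
print("good grouping:", good_within, good_between)
print("poor grouping:", poor_within, poor_between)
```

Within plus between is the same total for both groupings, so a grouping that lowers within-group variation necessarily raises between-group variation.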

3 basic clustering methods
◦ Hierarchical (n < 200)
◦ K Means (n > 200)
◦ 2 Step (large samples and categorical or continuous variables)

Clusters are nested
◦ Larger clusters at later stages may contain smaller clusters from earlier stages
Evaluate results in a dendrogram with the agglomeration schedule
◦ Use K means with the chosen number of clusters to validate
Several options for distance measure and clustering method
◦ Interval or count data
◦ Interval: squared Euclidean distance or Euclidean distance measure with between-groups linkage
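
The nesting and the agglomeration schedule described above can be sketched in a few lines. This is an illustrative toy, not the routine of any particular statistics package. It assumes that "between-groups linkage" means average linkage (the mean pairwise distance between members of two clusters) and uses squared Euclidean distance. The names `agglomerate` and `average_linkage` and the five data points are hypothetical.

```python
from itertools import combinations

def sq_euclidean(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def average_linkage(c1, c2, points):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(sq_euclidean(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def agglomerate(points):
    """Merge the two closest clusters repeatedly until one cluster remains.

    Returns the agglomeration schedule as a list of
    (members_a, members_b, distance) tuples, one per merge.
    """
    clusters = [(i,) for i in range(len(points))]
    schedule = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], points),
        )
        schedule.append((a, b, average_linkage(a, b, points)))
        # The merged cluster contains both earlier clusters (nesting).
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return schedule

# Hypothetical cases measured on two interval variables.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0), (5.0, 20.0)]
schedule = agglomerate(points)
for members_a, members_b, dist in schedule:
    print(members_a, members_b, round(dist, 2))
```

A large jump in merge distance between consecutive stages of the schedule is one common cue for where to stop merging. That cluster count can then be passed to K means as a check.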

Uses Euclidean distance
◦ Desired number of clusters specified in advance
Does not require a case-by-case proximity matrix
Observations are grouped by distance to the cluster mean at each iteration, and cluster means shift after each iteration
◦ Similar to ANOVA
◦ Iterations stop when cluster means are stable or when the defined iteration limit is reached
Final decision on number of clusters is subjective
◦ Raw data should be carefully analyzed with new cluster membership and several examples
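
The iteration described above (assign each observation to the nearest cluster mean, recompute the means, stop when they are stable or the iteration limit is reached) can be sketched as follows. This is a minimal illustration under stated assumptions: the function `k_means`, the choice of the first two cases as starting means, and the data are all hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(members):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(members) for col in zip(*members))

def k_means(points, initial_means, max_iter=10):
    """Return (labels, means) after iterating to stability or max_iter."""
    means = list(initial_means)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each observation joins the nearest cluster mean.
        labels = [
            min(range(len(means)), key=lambda k: sq_dist(p, means[k]))
            for p in points
        ]
        # Update step: cluster means shift to the mean of their members.
        new_means = []
        for k in range(len(means)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            new_means.append(centroid(members) if members else means[k])
        if new_means == means:  # means are stable, so stop early
            break
        means = new_means
    return labels, means

# Hypothetical data; the number of clusters (2) is fixed in advance.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, means = k_means(points, initial_means=points[:2])
print(labels)
print(means)
```

No case-by-case distance matrix is built here: only distances from each case to the current cluster means are computed, which is why this approach scales to larger samples.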

Very large datasets
◦ Categorical or continuous data
Pre-clusters identified and then used in a hierarchical procedure

randomization

Logistic regression is more popular now
Classify cases into the values of a dichotomous dependent variable
Purposes
◦ To classify cases into groups using a discriminant prediction equation.
◦ To test theory by observing whether cases are classified as predicted.
◦ To investigate differences between or among groups.
◦ To determine the most parsimonious way to distinguish among groups.
◦ To determine the percent of variance in the dependent variable explained by the independent variables.
◦ To assess the relative importance of the independent variables in classifying the dependent variable.
◦ To discard variables that are only weakly related to group distinctions.
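
A discriminant prediction equation for a dichotomous dependent variable can be sketched with Fisher's linear discriminant. This is an illustrative example only: it assumes two predictors, equal prior probabilities, and a shared (pooled) covariance matrix. The function names and the data are hypothetical.

```python
def mean_vector(rows):
    """Mean of each of the two predictors."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def scatter(rows, m):
    """2x2 within-group sums of squares and cross-products about mean m."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - m[0], r[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fit_discriminant(group0, group1):
    """Return (weights, cutoff) for the rule: score > cutoff means group 1."""
    m0, m1 = mean_vector(group0), mean_vector(group1)
    s0, s1 = scatter(group0, m0), scatter(group1, m1)
    dof = len(group0) + len(group1) - 2
    pooled = [[(s0[i][j] + s1[i][j]) / dof for j in range(2)] for i in range(2)]
    # Invert the 2x2 pooled covariance matrix directly.
    det = pooled[0][0] * pooled[1][1] - pooled[0][1] * pooled[1][0]
    inv = [[pooled[1][1] / det, -pooled[0][1] / det],
           [-pooled[1][0] / det, pooled[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    weights = [inv[i][0] * diff[0] + inv[i][1] * diff[1] for i in range(2)]
    # Cutoff is the discriminant score of the midpoint between group means.
    mid = [(m0[j] + m1[j]) / 2 for j in range(2)]
    cutoff = weights[0] * mid[0] + weights[1] * mid[1]
    return weights, cutoff

def classify(case, weights, cutoff):
    score = weights[0] * case[0] + weights[1] * case[1]
    return 1 if score > cutoff else 0

# Hypothetical cases: two predictors, known group membership.
group0 = [(2.0, 3.0), (3.0, 3.5), (2.5, 2.0), (3.5, 4.0)]
group1 = [(6.0, 7.0), (7.0, 6.0), (6.5, 8.0), (7.5, 7.5)]
weights, cutoff = fit_discriminant(group0, group1)

# Share of cases classified as predicted.
predicted = [classify(c, weights, cutoff) for c in group0 + group1]
actual = [0] * len(group0) + [1] * len(group1)
hit_rate = sum(p == a for p, a in zip(predicted, actual)) / len(actual)
print(weights, cutoff, hit_rate)
```

The hit rate corresponds to the "classified as predicted" check in the list above. Because the raw weights depend on each predictor's scale, comparing the relative importance of predictors would normally use standardized coefficients instead.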

Differs from other methods by having equally spaced time intervals on the X axis
Objectives
Identify the distribution pattern of the variable over time
Pattern vs noise (error)
Trend vs seasonality
Trend analysis and autocorrelation
Forecast future values of the variable
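
Separating trend from seasonality and checking autocorrelation can be sketched on a small, equally spaced series. This is a minimal illustration under stated assumptions: the series is synthetic, the seasonal period of 4 is assumed to be known, and the helper names are hypothetical.

```python
from statistics import mean

def linear_trend(series):
    """Least-squares (slope, intercept) against t = 0, 1, 2, ..."""
    t = list(range(len(series)))
    t_bar, y_bar = mean(t), mean(series)
    slope = (sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, series))
             / sum((ti - t_bar) ** 2 for ti in t))
    return slope, y_bar - slope * t_bar

def autocorrelation(series, lag):
    """Sample autocorrelation of the series with itself shifted by lag."""
    y_bar = mean(series)
    denom = sum((y - y_bar) ** 2 for y in series)
    num = sum((series[i] - y_bar) * (series[i + lag] - y_bar)
              for i in range(len(series) - lag))
    return num / denom

# Synthetic series: upward trend plus a repeating 4-period seasonal pattern.
period = 4  # assumed known for this sketch
seasonal = [2.0, 0.0, -2.0, 0.0]
series = [0.5 * t + seasonal[t % period] for t in range(12)]

slope, intercept = linear_trend(series)
residuals = [y - (intercept + slope * t) for t, y in enumerate(series)]

# Autocorrelation of the detrended series: high at the seasonal lag,
# negative at half the seasonal lag.
acf_lag4 = autocorrelation(residuals, 4)
acf_lag2 = autocorrelation(residuals, 2)

# Naive forecast for the next time point: trend plus the average
# residual observed at the same position in the seasonal cycle.
t_next = len(series)
same_phase = [r for t, r in enumerate(residuals) if t % period == t_next % period]
forecast = intercept + slope * t_next + mean(same_phase)
print(round(slope, 3), round(acf_lag4, 2), round(acf_lag2, 2), round(forecast, 2))
```

Whatever remains after the trend and seasonal components are removed is treated as noise (error). If that remainder still shows strong autocorrelation, the pattern has probably not been fully captured.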
