Factor Analysis

Introduction

Factor analysis is an interdependence technique whose primary purpose is to define the underlying structure among the variables in the analysis.

It examines the interrelationships among a large number of variables and then attempts to explain them in terms of their common underlying dimensions.

These common underlying dimensions are referred to as factors.

It is a summarization and data reduction technique that does not have independent and dependent variables; rather, it is an interdependence technique in which all variables are considered simultaneously.

Correlation Matrix of Variables After Grouping Using Factor Analysis: shaded areas represent variables likely to be grouped together by factor analysis (the matrix itself is reproduced at the end of this handout).

Factor Analysis Objectives

Data summarization: derives underlying dimensions that, when interpreted and understood, describe the data in a much smaller number of concepts than the original individual variables.

Data reduction: extends the process of data summarization by deriving an empirical value (factor score or summated scale) for each dimension (factor) and then substituting this value for the original values.

Two most prominent types of factoring (extraction)

There are two major methods of extracting the factors from a set of variables:

Principal components analysis (PCA): used in exploratory research, as when a researcher simply wants to reduce a large number of items (e.g., an 80-item survey) to a smaller number of underlying latent dimensions (e.g., 7 factors). This is the most common type of "factor analysis."

Principal factor analysis (PFA), also called principal axis factoring (PAF) or common factor analysis (CFA): used in confirmatory research, as when the researcher has a causal model. As such, it is used in conjunction with causal modeling techniques such as path analysis, partial least squares modeling, and structural equation modeling.

The two methods contrast as follows:

- PCA analyzes a correlation matrix in which the diagonals contain 1's; PAF analyzes a correlation matrix in which the diagonals contain the communalities.
- PCA accounts for the total variance of the variables: factors (components) reflect the common variance plus the unique variance. PFA accounts for the co-variation among variables: factors reflect the common variance of the variables, excluding unique variances.
- PCA is a variance-focused technique; PFA is a correlation-focused technique.
- PCA is used for exploratory purposes, when a researcher does not have a causal model but simply wants to reduce a large number of items to a smaller number; PFA is typically used in confirmatory research.
- In PCA, adding variables to the model will change the factor loadings; in PFA, it is possible to add variables to the model without affecting the factor loadings.
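The following sketch contrasts the two extraction approaches in Python using scikit-learn. Note that sklearn's FactorAnalysis fits a maximum-likelihood common factor model rather than principal axis factoring, so it is only an approximate stand-in for PFA, and the data matrix X is made up for illustration.

```python
# Hedged sketch: PCA (models total variance) vs. a common factor model
# (models common variance only). sklearn's FactorAnalysis is ML-based,
# not principal axis factoring (an approximate parallel).
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))               # placeholder data, 200 obs x 8 items

pca = PCA(n_components=3).fit(X)            # components reflect total variance
fa = FactorAnalysis(n_components=3).fit(X)  # factors reflect common variance

print(pca.components_.T)                    # variable-by-component weights
print(fa.components_.T)                     # variable-by-factor loadings
print(fa.noise_variance_)                   # unique variances excluded by the factor model
```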

Assumptions

Multicollinearity: assessed using the MSA (measure of sampling adequacy).

The MSA is measured by the Kaiser-Meyer-Olkin (KMO) statistic. As a measure of sampling adequacy, the KMO predicts if data are likely to factor well based on correlation and partial correlation.

KMO can be used to identify which variables to drop from the factor analysis because they lack multicollinearity.

There is a KMO statistic for each individual variable, and the overall KMO statistic pools the same quantities across all variables.

KMO varies from 0 to 1.0. The overall KMO should be .50 or higher to proceed with factor analysis. If it is not, remove the variable with the lowest individual KMO statistic, one at a time, until the overall KMO rises above .50 and each individual variable's KMO is above .50.
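As a concrete illustration of this screening step, the third-party factor_analyzer package (pip install factor-analyzer) exposes a KMO routine; the data file name below is hypothetical.

```python
# Minimal KMO check, assuming a DataFrame of numeric survey items.
import pandas as pd
from factor_analyzer.factor_analyzer import calculate_kmo

df = pd.read_csv("survey_items.csv")       # hypothetical data file

kmo_per_item, kmo_overall = calculate_kmo(df.values)

print(f"Overall KMO: {kmo_overall:.3f}")   # proceed only if >= .50
for name, kmo in zip(df.columns, kmo_per_item):
    flag = "" if kmo >= 0.50 else "  <- candidate to drop"
    print(f"{name}: {kmo:.3f}{flag}")
```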

There must be a strong conceptual foundation to support the assumption that a structure does exist before the factor analysis is performed.

Multivariate Normality

Factor loadings

The factor loadings are the correlation coefficients between the items (rows) and factors (columns).

Analogous to Pearson's r, the squared factor loading is the percent of variance in that indicator variable explained by the factor.

To get the percent of variance in all the variables accounted for by each factor, sum the squared factor loadings for that factor (column) and divide by the number of variables.

Factor loadings should be .5 or higher to confirm that independent variables identified a priori are represented by a particular factor. A stricter .7 level corresponds to about half of the variance in the indicator being explained by the factor (since .7² ≈ .5); a .5 loading explains about a quarter.
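The column-wise arithmetic just described is straightforward to verify in code; the loading matrix below is invented purely for illustration.

```python
# Percent of variance in all variables accounted for by each factor:
# sum the squared loadings down each factor's column, then divide by
# the number of variables. Loadings here are made up for illustration.
import numpy as np

loadings = np.array([        # rows = variables, columns = factors
    [0.81, 0.12],
    [0.77, 0.20],
    [0.15, 0.74],
    [0.08, 0.69],
])

pct_variance = (loadings ** 2).sum(axis=0) / loadings.shape[0]
print(pct_variance)          # share of total variance explained per factor
```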

Communality

Communality (h²) measures the percent of variance in a given variable explained by all the factors jointly and may be interpreted as the reliability of the indicator. Technique: computed row-wise for each variable as the sum of its squared loadings across factors, h² = loading1^2 + loading2^2 + loading3^2 + ...

In an example focused on subjects' music preferences, the extracted factors explain over 95% of preferences for rap music but only 56% for country western music. In general, communalities show for which measured variables the factor analysis is working best and least well.
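The same made-up loading matrix illustrates the row-wise communality computation:

```python
# Communality h^2 per variable: the sum of its squared loadings across
# all extracted factors (row-wise). Loadings are illustrative only.
import numpy as np

loadings = np.array([
    [0.81, 0.12],
    [0.77, 0.20],
    [0.15, 0.74],
    [0.08, 0.69],
])

communalities = (loadings ** 2).sum(axis=1)  # one h^2 per variable (row)
print(communalities)   # low values flag variables the model explains poorly
```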

Low communality. When an indicator variable has a low communality, the factor model is not working well for that indicator and possibly it should be removed from the model.

Low communalities across the set of variables indicate that the variables are little related to each other.

However, communalities must be interpreted in relation to the interpretability of the factors. A communality of .75 seems high but is meaningless unless the factor on which the variable is loaded is interpretable, though it usually will be. A communality of .25 seems low but may be meaningful if the item is contributing to a well-defined factor.

What is critical is not the communality coefficient per se, but rather the extent to which the item plays a role in the interpretation of the factor, though often this role is greater when communality is high.

Spurious solutions. If a communality exceeds 1.0, there is a spurious solution, which may reflect too small a sample or extraction of too many or too few factors.

In PCA the initial communality will be 1.0 for all variables and all of the variance in the variables will be explained by all of the factors, which will be as many as there are variables.

The "extracted" communality is the percent of variance in a given variable explained by the factors that are extracted, which will usually be fewer than all the possible factors, resulting in coefficients less than 1.0.

Eigenvalues

The eigenvalue for a given factor measures the variance in all the variables which is accounted for by that factor.

The ratio of eigenvalues is the ratio of the explanatory importance of the factors with respect to the variables. If a factor has a low eigenvalue, then it is contributing little to the explanation of variances in the variables and may be ignored as redundant with more important factors.

Technique: computed column-wise for each factor as the sum of squared loadings down the column, eigenvalue = loading1^2 + loading2^2 + loading3^2 + ...

In an example, again on analysis of music preferences, 18 components (factors) would be needed to explain 100% of the variance in the data. However, using the conventional criterion of stopping when the initial eigenvalue drops below 1.0, only 6 of the 18 factors were actually extracted in this analysis. These six account for 72% of the variance in the data.
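In code, the eigenvalue computation is the column-wise counterpart of the communality sum, and Kaiser's rule (discussed next) follows directly; the loading matrix is again illustrative.

```python
# Eigenvalue per factor: sum of squared loadings down the column.
# Kaiser's rule keeps factors whose eigenvalue exceeds 1.0.
import numpy as np

loadings = np.array([        # illustrative loading matrix
    [0.81, 0.12],
    [0.77, 0.20],
    [0.15, 0.74],
    [0.08, 0.69],
])

eigenvalues = (loadings ** 2).sum(axis=0)
n_keep = int((eigenvalues > 1.0).sum())   # Kaiser criterion
print(eigenvalues, n_keep)
```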

Criteria for determining the number of factors

Scree plot: the scree test plots the components on the X axis and the corresponding eigenvalues on the Y axis. As one moves to the right, toward later components, the eigenvalues drop. When the drop ceases and the curve makes an elbow toward a less steep decline, the scree test says to drop all further components after the one starting the elbow. A minimal plotting sketch follows below.
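The eigenvalue sequence below is invented; in practice it would come from the extraction output.

```python
# Scree plot: components on the X axis, eigenvalues on the Y axis;
# stop extracting at the elbow. Values below are made up.
import matplotlib.pyplot as plt

eigenvalues = [3.2, 2.1, 1.1, 0.6, 0.4, 0.3, 0.2, 0.1]

plt.plot(range(1, len(eigenvalues) + 1), eigenvalues, "o-")
plt.axhline(1.0, linestyle="--")   # Kaiser cutoff shown for reference
plt.xlabel("Component")
plt.ylabel("Eigenvalue")
plt.title("Scree plot")
plt.show()
```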

Kaiser criterion: the Kaiser rule is to drop all components with eigenvalues under 1.0. It may overestimate or underestimate the true number of factors.

Rotation methods

Rotation serves to make the output more understandable and is usually necessary to facilitate the interpretation of factors. The sum of the eigenvalues is not affected by rotation, but rotation will alter the eigenvalues (and percent of variance explained) of particular factors and will change the factor loadings.

No rotation: the original, unrotated principal components solution maximizes the sum of squared factor loadings, efficiently creating a set of factors which explain as much of the variance in the original variables as possible. The amount explained is reflected in the sum of the eigenvalues of all factors. However, unrotated solutions are hard to interpret because variables tend to load on multiple factors.

An important consideration in selecting the rotation technique is whether the researcher wants orthogonality. Under orthogonality, the extracted factors should be uncorrelated with each other.

Varimax rotation is an orthogonal rotation method that minimizes the number of variables that have high loadings on each factor. This method simplifies the interpretation of the factors. This is the most common rotation option.

The assumption is that the factors are uncorrelated.

Direct oblimin rotation, sometimes called just "oblique rotation," is the standard method when one wishes a non-orthogonal (oblique) solution -- that is, one in which the factors are allowed to be correlated. This will result in higher eigenvalues but diminished interpretability of the factors.

Example: salary, incentives, and ESOPs, which are naturally correlated components of compensation.

Oblique rotations allow the factors to be correlated. In statistical output, a factor correlation matrix is generated when oblique rotation is requested. Normally, however, an orthogonal method such as varimax is selected and no factor correlation matrix is produced as the correlation of any factor with another is zero in orthogonal solutions. In PCA, a component transformation matrix in SPSS output shows the correlation of the factors before and after rotation.
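A hedged sketch of both rotation families with the factor_analyzer package follows; the data file is hypothetical, and the phi_ attribute (the factor correlation matrix for oblique solutions) is an assumption worth verifying against the package documentation.

```python
# Orthogonal (varimax) vs. oblique (oblimin) rotation with factor_analyzer.
import pandas as pd
from factor_analyzer import FactorAnalyzer

df = pd.read_csv("survey_items.csv")              # hypothetical data file

fa_varimax = FactorAnalyzer(n_factors=3, rotation="varimax").fit(df)
print(fa_varimax.loadings_)                       # rotated loadings, factors uncorrelated

fa_oblimin = FactorAnalyzer(n_factors=3, rotation="oblimin").fit(df)
print(fa_oblimin.loadings_)                       # rotated loadings, factors may correlate
print(fa_oblimin.phi_)                            # factor correlation matrix (oblique only)
```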

Factor scores

A factor score is a score for a given individual or observation on a given factor. Factor scores can be correlated even when an orthogonal factor extraction was performed.

Regression scores: the most common type of factor score is the regression score, based on ordinary least squares (OLS) estimates.

Bartlett scores: Bartlett scores may be preferred over regression scores on the argument that they better conform to the original factor structure. Bartlett scores may be correlated.

Anderson-Rubin scores: Anderson-Rubin factor scores are a modification of Bartlett scores to ensure orthogonality; therefore Anderson-Rubin scores are uncorrelated.

Computing factor scores allows one to look for factor outliers. Also, factor scores may be used as variables in subsequent modeling.
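As one possible realization, factor_analyzer's transform() produces a score per observation per factor (its scoring method is an assumption to check in the docs); the data file is hypothetical.

```python
# Factor scores for each observation from a fitted model.
import pandas as pd
from factor_analyzer import FactorAnalyzer

df = pd.read_csv("survey_items.csv")              # hypothetical data file

fa = FactorAnalyzer(n_factors=3, rotation="varimax").fit(df)
scores = fa.transform(df)                         # rows = observations, cols = factors

print(scores[:5])   # scores can be screened for outliers or used downstream
```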

Dropping variables from the analysis

A KMO statistic is generated for each predictor. Predictors whose KMO does not rise to some criterion level (e.g., .5 or higher) may be dropped from the analysis. Doing so drops predictors on the basis of low partial correlation.

The more prevalent criterion for dropping predictors is low communality, based on factor analysis itself.

As the two dropping criteria (KMO and communality) may differ, the latter is generally preferred, though both may be considered. A low KMO indicates the variable in question may be too multicollinear with others in the model. A low communality indicates that the variable is not well explained by the factor model. One strategy, sketched below, is to drop the indicator variables with the lowest individual KMO one at a time until the overall KMO rises above .50.
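The one-at-a-time strategy can be written as a loop (data file hypothetical; calculate_kmo from the factor_analyzer package, as above):

```python
# Iteratively drop the variable with the lowest individual KMO until
# the overall KMO reaches .50.
import pandas as pd
from factor_analyzer.factor_analyzer import calculate_kmo

df = pd.read_csv("survey_items.csv")             # hypothetical data file

while df.shape[1] > 2:
    kmo_per_item, kmo_overall = calculate_kmo(df.values)
    if kmo_overall >= 0.50:
        break
    worst = df.columns[kmo_per_item.argmin()]    # lowest individual KMO
    print(f"Dropping {worst} (KMO {kmo_per_item.min():.3f})")
    df = df.drop(columns=worst)
```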

Predicting if data will factor well:

An overall KMO of .5 or higher is considered adequate and .8 or higher is considered good factorability.

The KMO increases as (1) the sample size increases, (2) the average correlation increases, (3) the number of variables increases, or (4) the number of factors decreases. The researcher should always have an MSA of .5 or above before proceeding with factor analysis.

Correlation Matrix of Variables After Grouping Using Factor Analysis (lower triangle; columns in the same order as the rows):

                         V3    V8    V9    V2    V6    V7    V4    V1    V5
V3 Return Policy        1.00
V8 In-store Service      .733 1.00
V9 Store Atmosphere      .774  .710 1.00
V2 Store Personnel       .741  .719  .787 1.00
V6 Assortment Depth      .423  .311  .429  .445 1.00
V7 Assortment Width      .471  .435  .468  .490  .724 1.00
V4 Product Availability  .427  .428  .479  .497  .713  .719 1.00
V1 Price Level           .302  .242  .372  .427  .281  .354  .470 1.00
V5 Product Quality       .307  .240  .326  .406  .325  .378  .472  .765 1.00