Least Squares Method for Factor Analysis



    University of California

    Los Angeles

    Least Squares Method for Factor Analysis

    A thesis submitted in partial satisfaction

    of the requirements for the degree

    Master of Science in Statistics

    by

    Jia Chen

    2010


    © Copyright by Jia Chen

    2010


    The thesis of Jia Chen is approved.

    Hongquan Xu

    Yingnian Wu

    Jan de Leeuw, Committee Chair

    University of California, Los Angeles

    2010


    To my parents

    for their permanent love.

    And to my friends and teachers

    who have given me precious memory and tremendous encouragement.


    Table of Contents

    List of Tables

    1 Introduction

    2 Factor Analysis
      2.1 Factor Analysis Models
        2.1.1 Random Factor Model
        2.1.2 Fixed Factor Model
      2.2 Estimation Methods
        2.2.1 Principal Component Method
        2.2.2 Maximum Likelihood Method
        2.2.3 Least Squares Method
      2.3 Determining the Number of Factors
        2.3.1 Mathematical Approaches
        2.3.2 Statistical Approach
        2.3.3 The Third Approach

    3 Algorithms of Least Squares Methods
      3.1 Least Squares on the Covariance Matrix
        3.1.1 Loss Function
        3.1.2 Projections
        3.1.3 Algorithms


      3.2 Least Squares on the Data Matrix
        3.2.1 Loss Function
        3.2.2 Projection
        3.2.3 Algorithms

    4 Examples
      4.1 9 Mental Tests from Holzinger-Swineford
      4.2 9 Mental Tests from Thurstone
      4.3 17 Mental Tests from Thurstone/Bechtoldt
      4.4 16 Health Satisfaction Items from Reise
      4.5 9 Emotional Variables from Burt

    5 Discussion and Conclusion

    A Augmented Procrustus

    B Implementation Code

    C Program

    Bibliography


    List of Tables

    4.1 LSFA methods Summary - 9 Mental Tests from Holzinger-Swineford
    4.2 Loss function Summary - 9 Mental Tests from Holzinger-Swineford
    4.3 Loading Matrices Summary - 9 Mental Tests from Holzinger-Swineford
    4.4 LSFA methods Summary - 9 Mental Tests from Thurstone
    4.5 Loss function Summary - 9 Mental Tests from Thurstone
    4.6 Loading Matrices Summary - 9 Mental Tests from Thurstone
    4.7 LSFA methods Summary - 17 Mental Tests from Thurstone/Bechtoldt
    4.8 Loss function Summary - 17 Mental Tests from Thurstone/Bechtoldt
    4.9 Loading Matrices Summary - 17 Mental Tests from Thurstone
    4.10 LSFA methods Summary - 16 Health Satisfaction Items from Reise
    4.11 Loss function Summary - 16 Health Satisfaction Items from Reise
    4.12 Loading Matrices Summary - 16 Health Satisfaction Items from Reise
    4.13 LSFA methods Summary - 9 Emotional Variables from Burt
    4.14 Loading Matrices Summary - 9 Emotional Variables from Burt


    Abstract of the Thesis

    Least Squares Method for Factor Analysis

    by

    Jia Chen

    Master of Science in Statistics

    University of California, Los Angeles, 2010

    Professor Jan de Leeuw, Chair

    This paper demonstrates the implementation of alternating least squares for solving the common factor analysis problem. The algorithm converges, and accumulation points of the sequences it generates are stationary points. In addition to implementing the Procrustus algorithm, it provides a means of verifying that the solution obtained is at least a local minimum of the loss function.


    CHAPTER 1

    Introduction

    A major objective of scientific or social activities is to summarize, by theoretical formulations, the empirical relationships among a given set of events and to discover the natural laws behind thousands of random events. The events that can be investigated are almost infinite, so it is difficult to make any general statement about phenomena. However, it can be said that scientists analyze the relationships among a set of variables, while these relationships are evaluated across a set of individuals under specified conditions. The variables are the characteristics being measured and could be anything that can be objectively identified or scored.

    Factor analysis can be used for theory and instrument development and for assessing the construct validity of an established instrument when administered to a specific population. Through factor analysis, the original set of variables is reduced to a few factors with minimum loss of information. Each factor represents an area of generalization that is qualitatively distinct from that represented by any other factor. Within an area where data can be summarized, factor analysis first represents that area by a factor and then seeks to make the degree of generalization between each variable and the factor explicit [6].

    There are many methods available to estimate a factor model. The purpose of this paper is to present and implement a new least squares algorithm, and then to compare its speed of convergence and model accuracy to some existing approaches. To begin, we provide some matrix background and assumptions on


    the existence of a factor analysis model.


    CHAPTER 2

    Factor Analysis

    Many statistical methods are used to study the relation between independent and dependent variables. Factor analysis is different; the purpose of factor analysis is data reduction and summarization, with the goal of understanding causation. It aims to describe the covariance relationships among a large set of observed variables in terms of a few underlying, but unobservable, random quantities called factors.

    Factor analysis is a branch of multivariate analysis that was invented by the psychologist Charles Spearman. He discovered that school children's scores on a wide variety of seemingly unrelated subjects were positively correlated, which led him to postulate that a general mental ability, or g, underlies and shapes human cognitive performance. Raymond Cattell expanded on Spearman's idea of a two-factor theory of intelligence after performing his own tests and factor analysis, and used a multi-factor theory to explain intelligence. Factor analysis was developed to analyze test scores so as to determine whether intelligence is made up of a single underlying general factor or of several more limited factors measuring attributes like mathematical ability. Today factor analysis is the most widely used branch of multivariate analysis in the psychological field and, helped by the advent of electronic computers, it has been quickly spreading to economics, botany, biology and the social sciences.

    There are two main motivations for studying factor analysis. One of the purposes of factor analysis is to reduce the number of variables. In multivariate analysis, one


    often has data on a large number of variables Y1, Y2, ..., Ym, and it is reasonable to believe that there is a reduced list of unobserved factors that determines the full dataset. The primary motivation for factor analysis is to detect patterns of relationship among many dependent variables, with the goal of discovering the independent variables that affect them, even though those independent variables cannot be measured directly. The object of a factor problem is to account for the tests with the smallest possible number of factors that is consistent with acceptable residual errors [21].

    2.1 Factor Analysis Models

    Factor analysis is generally presented within the framework of the multivariate linear model for data analysis. Two classical linear common factor models are briefly reviewed. For a more comprehensive discussion the reader should refer to Anderson and Rubin [1956] and to Anderson [1984] [1] [2].

    Common factor analysis (CFA) starts with the assumption that the variance

    in a given variable can be explained by a small number of underlying common

    factors. For the common factor model, the factor score matrix can be divided into two parts: the common factor part and the unique factor part. In matrix algebra form the model is

    Y_{n×m} = F_{n×m} + U_{n×m},
    F_{n×m} = H_{n×p} A'_{p×m},
    U_{n×m} = E_{n×m} D_{m×m}.


    It can be rewritten as:

    Y = HA' + ED    (2.1)

    where the common factor part is a linear combination of the p common factors (H, of order n × p) and the factor loadings (A, of order m × p), and the unique factor part is a linear combination of the m unique factors (E, of order n × m) and the unique factor weights (D, of order m × m), where D is a diagonal matrix.

    In the common factor model, the common factors and unique factors are assumed to be orthogonal, to follow a multivariate normal distribution with mean zero, and to be scaled to have unit length. Under the normality assumption, uncorrelatedness means that they are statistically independent random variables. The common factors are assumed to be independent of the unique factors. Therefore E(H) = 0, H'H = I_p, E(E) = 0, E'E = I_m, E'H = 0_{m×p}, and D is a diagonal matrix.

    The common factor model (2.1) and assumptions imply the following model correlation structure for the observed variables:

    Σ = AA' + D²    (2.2)
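    As a small illustration of the structure in (2.2), the following R sketch (our own, with made-up loadings and uniquenesses rather than estimates from any data set in this thesis) builds the model-implied correlation matrix from A and D:

    ## Model-implied correlation matrix Sigma = A A' + D^2 for m = 4, p = 2;
    ## the numbers in A are illustrative only.
    A <- matrix(c(0.8, 0.7, 0.3, 0.2,
                  0.1, 0.2, 0.7, 0.8), nrow = 4, ncol = 2)
    d <- sqrt(1 - rowSums(A^2))      # uniquenesses chosen so that diag(Sigma) = 1
    Sigma <- A %*% t(A) + diag(d^2)  # equation (2.2)
    round(Sigma, 3)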

    2.1.1 Random Factor Model

    The matrix Y is assumed to be a realization of a matrix-valued random variable Y, where the random variable Y has a random common part F and a random unique part U. Thus

    Y_{n×m} = F_{n×m} + U_{n×m},
    F_{n×m} = H_{n×p} A'_{p×m},
    U_{n×m} = E_{n×m} D_{m×m}.

    Each row of Y corresponds to an individual observation, and these observations are assumed to be independent. Moreover, the specific parts are assumed


    to be uncorrelated with the common factors, and with the other specific parts.

    2.1.2 Fixed Factor Model

    The random factor model explained above was criticized soon after it was formally introduced by Lawley. The point is that in factor analysis different individuals are regarded as drawing their scores from different k-way distributions, and in these distributions the mean for each test is the true score of the individual on that test. Nothing is implied about the distribution of observed scores over a population of individuals, and one makes assumptions only about the error distributions [25].

    There is thus also a fixed factor model, which assumes

    Y = F + U.

    The common part is a bilinear combination of a number of common factor loadings and common factor scores,

    F = HA'.

    In the fixed model we merely assume that the specific parts are uncorrelated with the other specific parts.

    2.2 Estimation Methods

    In common factor analysis, the population covariance matrix Σ of m variables with p common factors can be decomposed as

    Σ = AA' + D²,


    where A is the loading matrix of order m × p and D² is the matrix of unique variances of order m, which is diagonal and non-negative definite. These parameters are nearly always unknown and need to be estimated from the sample data. Estimation is a relatively straightforward matter of breaking down a covariance or correlation matrix into a set of orthogonal components or axes equal in number to the number of variates. The sample covariance matrix is occasionally used, but it is much more common to work with the sample correlation matrix.

    Many different methods have been developed for estimation; the best known of these is the principal factor method. It extracts the maximum amount of variance that can possibly be extracted by a given number of factors. This method chooses the first factor so as to account for as much as possible of the variance in the correlation matrix, the second factor to account for as much as possible of the remaining variance, and so on.

    In 1940, a major step forward was made by D. N. Lawley, who developed the maximum likelihood equations. These are fairly complicated and difficult to solve, but computational advances, particularly by K. G. Jöreskog, have made maximum-likelihood estimation a practical proposition, and computer programs are widely available [4]. Since then, maximum likelihood has become a dominant estimation method in factor analysis.

    2.2.1 Principal Component Method

    Let the variance-covariance matrix Σ have eigenvalues λ_1, λ_2, ..., λ_m with corresponding eigenvectors e_1, e_2, ..., e_m,


    where λ_1 ≥ λ_2 ≥ ... ≥ λ_m ≥ 0. The spectral decomposition of Σ [13] says that the variance-covariance matrix can be expressed as the sum of the m eigenvalues multiplied by their eigenvectors and their transposes. The idea behind the principal component method is to approximate this expression:

    Σ = ∑_{i=1}^{m} λ_i e_i e_i' = [√λ_1 e_1  √λ_2 e_2  ...  √λ_m e_m] [√λ_1 e_1  √λ_2 e_2  ...  √λ_m e_m]' = AA'    (2.3)

    Instead of summing equation (2.3) from 1 to m, we sum it from 1 to p to estimate the variance-covariance matrix:

    Σ̂ = ∑_{i=1}^{p} λ_i e_i e_i' = [√λ_1 e_1  √λ_2 e_2  ...  √λ_p e_p] [√λ_1 e_1  √λ_2 e_2  ...  √λ_p e_p]' = ÂÂ'    (2.4)


    The equation (2.4) yields the estimator for the factor loadings:

    Â = [√λ_1 e_1  √λ_2 e_2  ...  √λ_p e_p]    (2.5)

    Recalling equation (2.2), D² is equal to the variance-covariance matrix minus ÂÂ':

    D̂² = Σ − ÂÂ'    (2.6)
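    A minimal R sketch of the principal component extraction in (2.3)-(2.6), assuming a correlation matrix R and a chosen number of factors p; the helper name pc_loadings is ours, not code from this thesis.

    ## Principal component estimates of the loadings and uniquenesses, eq. (2.5)-(2.6).
    pc_loadings <- function(R, p) {
      e <- eigen(R, symmetric = TRUE)
      A <- e$vectors[, 1:p, drop = FALSE] %*% diag(sqrt(e$values[1:p]), p)
      D2 <- diag(R - A %*% t(A))    # unique variances from the residual diagonal
      list(loadings = A, uniquenesses = D2)
    }
    ## e.g. pc_loadings(Harman74.cor$cov, p = 4) with the built-in Harman74.cor data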

    2.2.2 Maximum Likelihood Method

    The maximum-likelihood method was first proposed for factor analysis by Lawley (1940, 1941, 1943), but its routine use had to await the development of computers and suitable numerical optimization procedures [19]. This method is the procedure of finding the value of one or more parameters for a given statistic which makes the known likelihood distribution a maximum. It consists in finding factor loadings which maximize the likelihood function for a specified set of unique variances. The maximum-likelihood method assumes that the data are independently sampled from a multivariate normal distribution. As the common factors (H) and unique factors (E) are assumed to be multivariate normal, Y = HA' + ED is then multivariate normal with mean vector 0 and variance-covariance matrix Σ. The maximum likelihood method estimates the matrix of factor loadings and the unique variances. The estimators for the factor loadings A and the unique variances D are obtained by finding A and D that maximize the log-likelihood, which is given by the following expression:

    ℓ(A, D) = −(nm/2) log(2π) − (n/2) log|AA' + D²| − (1/2) tr[Y(AA' + D²)^{-1}Y']    (2.7)

    There are two types of maximum likelihood methods; one is called Covariance Matrix Methods. This method was first proposed by Lawley [14], and


    then popularized and programmed by Jöreskog [12]. Since then the multinormal maximum likelihood for the random factor model has been the dominant estimation method in factor analysis. The maximum likelihood was applied to the likelihood function of the covariance matrix, assuming multivariate normality. The negative log-likelihood measures the distance between the sample and the population covariance model, and we choose A and D to minimize

    ℒ(A, D) = n log|Σ| + n tr(Σ^{-1} S),    (2.8)

    where S is the sample covariance matrix of Y, and Σ = AA' + D².
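    The sketch below simply evaluates the criterion (2.8), up to the factor n, for a candidate pair (A, D) and a sample covariance matrix S; it is an illustration of the discrepancy function, not the optimizer used later in this thesis, and ml_discrepancy is our own name.

    ## Negative log-likelihood discrepancy of eq. (2.8), up to the factor n:
    ## log|Sigma| + tr(Sigma^{-1} S), with Sigma = A A' + D^2.
    ml_discrepancy <- function(A, d, S) {
      Sigma <- A %*% t(A) + diag(d^2)
      as.numeric(determinant(Sigma, logarithm = TRUE)$modulus) +
        sum(diag(solve(Sigma, S)))
    }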

    In Anderson and Rubin the impressive machinery developed by the Cowles Commission was applied to both the fixed and the random factor analysis model. Maximum likelihood was applied to the likelihood function of the covariance matrix, assuming multivariate normality.

    The other method is called Data Matrix Methods. The maximum likelihood procedures proposed by Lawley [14] were criticized soon after they appeared by Young. Young said: "Such a distribution is specified by the means and variances of each test and the covariances of the tests in pairs; it has no parameters distinguishing different individuals. Such a formulation is therefore inappropriate for factor analysis, where factor loadings of the tests and of the individuals enter in a symmetric fashion in a bilinear form." [25]

    Young proposed to minimize the negative log-likelihood of the data,

    ℒ(H, A, D) = n log|D| + tr[(Y − HA')D^{-1}(Y − HA')'],    (2.9)

    where D is a known diagonal matrix of column (variable) weights. The solution is given by a weighted singular value decomposition of Y.

    The basic problem with Young's method is that it assumes the weights to be known. One solution, suggested by Lawley, is to estimate them along with


    the loadings and uniquenesses [15]. If there are no person-weights, Lawley suggests alternating minimization over (H, A), which is done by weighted singular value decomposition, with minimization over the diagonal D, which simply amounts to computing the average sum of squares of the residuals for each variable. However, iterating the two minimizations produces a block relaxation algorithm intended to minimize the negative log-likelihood, and this does not work, although the algorithm produces a decreasing sequence of loss function values. A rather disconcerting feature of the new method is, however, that iterative numerical solutions of the estimation equations either fail to converge, or else converge to unacceptable solutions in which one or more of the measurements have zero error variance. It is apparently impossible to estimate scale as well as location parameters when so many unknowns are involved [24].

    In fact, if we look at the loss function we can see it is unbounded below: we can choose scores to fit one variable perfectly, and then let the corresponding variance term approach zero [1].

    In 1952, Whittle suggested taking D proportional to the variances of the variables. This amounts to doing a singular value decomposition of the standardized variables. Jöreskog makes the more reasonable choice of setting D proportional to the reciprocals of the diagonals of the inverse of the covariance matrix of the variables [11].

    2.2.3 Least Squares Method

    The least-squares method is one of the most important estimation methods. It attempts to obtain values of the factor loadings A and the unique variances D² that minimize a least squares loss function, which is defined either on the residuals of the covariance matrix or on the residuals of the data matrix.


    The least squares loss function used on the covariance matrix is

    σ(A, D) = ½ SSQ(C − AA' − D²).

    We minimize over A ∈ R^{m×p} and D ∈ D_m, the diagonal matrices of order m. There have been four major approaches to minimizing this loss function.

    The least squares loss function used on the data matrix is

    σ(H, A, E, D) = ½ SSQ(Y − HA' − ED).

    We minimize over H ∈ R^{n×p}, E ∈ R^{n×m}, A ∈ R^{m×p} and D ∈ D_m, under the conditions that H'H = I, E'E = I, H'E = 0 and D is diagonal.
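    As a point of reference for Chapter 3, here is a small R sketch of the covariance-matrix loss above; lsfac_loss is our own illustrative name.

    ## Least squares loss on the covariance matrix: 0.5 * SSQ(C - A A' - D^2).
    lsfac_loss <- function(A, d, C) {
      R <- C - A %*% t(A) - diag(d^2)   # residual matrix
      0.5 * sum(R^2)                    # SSQ = sum of squared elements
    }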

    2.3 Determining the Number of Factors

    A factor model cannot be solved without first determining the number of factors p. How many common factors should be included in the model? This requires a determination of how many parameters are going to be involved. There are statistical and mathematical approaches to determining the number of factors.

    2.3.1 Mathematical Approaches

    The mathematical approach to the number of factors is concerned with the number of factors for a particular sample of variables in the population, and the theory behind these approaches is based on a population correlation matrix. Mathematically, the number of factors underlying any given correlation matrix is a function of its rank: estimating the minimum rank of the correlation matrix is the same as estimating the number of factors [6].

    1. The percentage of variance criterion. This method applies particularly to the principal component method. The percentage of common


    variance extracted is computed by using the sum of the eigenvalues of the variance-covariance matrix (Σ) in the division. Usually, investigators compute the cumulative percentage of variance after each factor is removed from the matrix and then stop the factoring process when 75, 80 or 85% of the total variance is accounted for.

    2. The latent root criterion. This rule is a commonly used criterion of long standing and performs well in practice. This method uses the variance-covariance matrix and chooses the number of factors equal to the number of eigenvalues greater than one (a small sketch of this criterion and the scree test appears after this list).

    3. The scree test criterion. The scree test was named after the geological term scree. It also performs well in practice. This rule is derived by plotting the latent roots against the number of factors in their order of extraction, and the shape of the resulting curve is used to evaluate the cutoff point. The scree test is based on a plot of the eigenvalues of Σ. If the graph drops sharply, followed by a straight line with much smaller slope, choose p equal to the number of eigenvalues before the straight line begins. Because the test involves subjective judgement, it cannot be programmed into the computer run.
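    A minimal sketch of the latent root and scree criteria, using the built-in Harman74.cor correlation matrix from base R purely as example data (it is not one of the data sets analyzed in Chapter 4):

    ## Latent root and scree criteria applied to a correlation matrix R.
    R <- Harman74.cor$cov
    ev <- eigen(R, symmetric = TRUE, only.values = TRUE)$values
    sum(ev > 1)                          # latent root criterion
    plot(ev, type = "b", xlab = "Factor number",
         ylab = "Eigenvalue", main = "Scree plot")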

    2.3.2 Statistical Approach

    In the statistical procedure for determining the number of factors to extract, the following question is asked: is the residual matrix, after p factors have been extracted, statistically significant? A hypothesis test is stated to answer this question,

    H0: Σ = AA' + D²   vs   H1: Σ ≠ AA' + D²,

    where A is an m × p matrix. If the statistic is significant at or beyond the 0.05 level, then the number of factors is insufficient to totally explain the reliable variance. If the statistic is


    nonsignificant, then the hypothesized number of factors is correct.

    Bartlett has presented a chi-square test of the significance of a correlation matrix; the test statistic is

    χ² = −(n − 1 − (2m + 5)/6) ln|R|    (2.10)

    with ν = ½[(m − p)² − m − p] degrees of freedom, where |R| is the determinant of the correlation matrix [3].
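    A small sketch of the statistic in (2.10), assuming a sample correlation matrix R and sample size n; bartlett_chisq is our own name and simply transcribes the formula as printed above.

    ## Chi-square statistic of eq. (2.10) for a correlation matrix R.
    bartlett_chisq <- function(R, n, p) {
      m <- ncol(R)
      chisq <- -(n - 1 - (2 * m + 5) / 6) * log(det(R))
      df <- ((m - p)^2 - m - p) / 2
      c(chisq = chisq, df = df)
    }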

    2.3.3 The Third Approach

    There is no single way to determine the number of factors to extract. A third approach that is sometimes used is to look to the theory within the field of study for indications of how many factors to expect. In many respects this is a better approach, because it lets the science drive the statistics rather than the statistics drive the science.


    CHAPTER 3

    Algorithms of Least Squares Methods

    We are given a multivariate sample of n independent observations on each of m variables. Collect these data in an n × m matrix Y. The common factor analysis model can be written as

    Y = HA' + ED    (3.1)

    where H ∈ R^{n×p}, A ∈ R^{m×p}, E ∈ R^{n×m}, D ∈ R^{m×m}, H'H = I, E'E = I, and H'E = 0, where D is diagonal and p is the number of common factors.

    Minimizing a least squares loss function is a form of factor analysis, but it is not the familiar one. In classical least squares factor analysis, as described in Young [1941], Whittle [1952] and Jöreskog [1962], the unique factors E are not parameters in the loss function [25] [24] [11]. Instead the unique variances are used to weight the residuals of each observed variable. Two different loss functions will be illustrated.


    3.1 Least Squares on the Covariance Matrix

    3.1.1 Loss Function

    The least squares loss function used in LSFAC is

    σ(A, D) = ½ SSQ(C − AA' − D²)    (3.2)

    We minimize over A ∈ R^{m×p} and D ∈ D_m, the diagonal matrices of order m.

    3.1.2 Projections

    We also define the two projected or concentrated loss functions, in which one set of parameters is minimized out. Minimizing out D gives

    σ(A) = min_{D ∈ D_m} σ(A, D) = ½ ∑_{i≠j} (c_{ij} − a_i'a_j)².

    3.1.3 Algorithms

    1. Principal Factor Analysis. This algorithm alternates minimizing


    over A for D fixed at its current value and minimizing over D for A fixed

    at its current value.

    The basic procedure starts with the choice of the number of factors p and the selection of an arbitrary set of unique variance estimates D² for the m variables. If C − D² = KΛK' is the eigen-decomposition of C − D², and we write Λ_p and K_p for the p largest eigenvalues and the corresponding eigenvectors, then the minimum over A for fixed D is A = K_p Λ_p^{1/2}. If fewer than p eigenvalues are positive, then the negative elements in Λ_p are replaced by zeroes. The minimum over D for fixed A is attained at D² = diag(C − AA'). Because D² = diag(C − AA') we always have D² ≤ diag(C), but there is no guarantee that convergence is to a D for which both D² ≥ 0 and C − D² ≥ 0. The algorithm for this method may be expressed in the following form (a small R sketch of this procedure appears at the end of Section 3.1.3).

    Step 1. Start with the observed covariance matrix with an arbitrary diagonal: C − D².
    Step 2. Compute C − D² = KΛK', where Λ is the diagonal matrix of the eigenvalues and the columns of K are the associated eigenvectors.
    Step 3. Determine the first p principal factors: A = K_p Λ_p^{1/2}, where Λ_p is the p × p submatrix of Λ containing the p largest eigenvalues, and K_p is the corresponding m × p submatrix of K.
    Step 4. Determine the reproduced unique variances: D² = diag(C − AA').
    Step 5. Repeat Steps 2-4 until the convergence criterion is met.

    2. Comrey's Minimum Residual Factor Analysis. Comrey proposed minimum residual factor analysis, which takes into account only the off-diagonal


    elements of the variance-covariance or correlation matrix [5]. The method was put on a more solid footing after being modified by Zegers and Ten Berge [26]. The object of MRFA is to determine a matrix A of order m × p which, given some covariance matrix C of order m × m, minimizes the function

    σ(A) = tr[(C₀ − AA' + diag(AA'))'(C₀ − AA' + diag(AA'))],    (3.5)

    where C₀ = C − diag(C). Zegers and Ten Berge [1983] show that equation (3.5) is minimized by taking

    a_{ik} = ∑_{j≠i} c_{ij}^{(k)} a_{jk} / ∑_{j≠i} a_{jk}²,    (3.6)

    where c_{ij}^{(k)} is the (i, j)th element of the matrix of residuals resulting from partialling all factors except the kth from C.

    The basic procedure is as follows: given some starting matrix A, the elements of the first column are, each in turn, replaced once according to equation (3.6); then, in the same way, the elements of the second column are replaced, and so on, until all elements of A have been replaced once. This constitutes one iteration cycle. The iterative procedure is terminated when, in one iteration cycle, the value of the function in equation (3.5) decreases by less than some specified small value [26].

    3. Harman's MINRES. In minimum residual factor analysis [Harman and Jones, 1966; Harman and Fukuda, 1966] we project out D [8] [7]. Thus we define

    σ(A) = ½ min_D SSQ(C − AA' − D²) = ½ ∑_{i≠j} (c_{ij} − a_i'a_j)².


    4. Gradient and Newton Methods. Gradient methods [de Leeuw, 2010] can be applied by projecting out A. Thus we define

    σ(D) = ½ min_A SSQ(C − AA' − D²) = ½ ∑_{s=p+1}^{m} λ_s²(C − D²).

    Now use

    D_j λ_s(D) = −z_{js}²,
    D_{jl} λ_s(D) = −2 z_{js} z_{ls} (C − D² − λ_s I)^+_{jl},

    where z_s is the normalized eigenvector corresponding with eigenvalue λ_s and (C − D² − λ_s I)^+_{jl} is the (j, l) element of the Moore-Penrose inverse of C − D² − λ_s I. This directly gives formulas for the first and second derivatives of the loss function:

    D_j σ(D) = −∑_{s=p+1}^{m} λ_s z_{js}²,

    D_{jl} σ(D) = ∑_{s=p+1}^{m} [ z_{js}² z_{ls}² − 2 λ_s z_{js} z_{ls} (C − D² − λ_s I)^+_{jl} ].
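    To make Steps 1-5 of the principal factor algorithm above concrete, here is a minimal R sketch; pfa is our own illustrative name, and the starting uniquenesses, convergence tolerance and iteration limit are arbitrary choices rather than the settings of the program in Appendix C.

    ## Iterated principal factor algorithm (Steps 1-5) on a covariance or
    ## correlation matrix C with p factors; a bare-bones sketch.
    pfa <- function(C, p, eps = 1e-6, itmax = 1000) {
      d2 <- rep(0, ncol(C))                       # arbitrary starting unique variances
      for (it in seq_len(itmax)) {
        e  <- eigen(C - diag(d2), symmetric = TRUE)
        lp <- pmax(e$values[1:p], 0)              # negative eigenvalues replaced by zero
        A  <- e$vectors[, 1:p, drop = FALSE] %*% diag(sqrt(lp), p)
        d2new <- diag(C - A %*% t(A))             # update the unique variances
        if (max(abs(d2new - d2)) < eps) { d2 <- d2new; break }
        d2 <- d2new
      }
      list(loadings = A, uniquenesses = d2, iterations = it)
    }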

    3.2 Least Squares on the Data Matrix

    3.2.1 Loss Function

    The loss function used in LSFAY is

    σ(H, E, A, D) = ½ SSQ(Y − HA' − ED),    (3.7)

    We minimize over H ∈ R^{n×p}, E ∈ R^{n×m}, A ∈ R^{m×p} and D ∈ D_m, under the conditions that H'H = I, E'E = I, H'E = 0 and D is diagonal.
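    For symmetry with the covariance-matrix loss in Chapter 2, a small sketch of evaluating the data-matrix loss (3.7); lsfay_loss is our own name.

    ## Least squares loss on the data matrix: 0.5 * SSQ(Y - H A' - E D).
    lsfay_loss <- function(Y, H, A, E, D) {
      0.5 * sum((Y - H %*% t(A) - E %*% D)^2)
    }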


    3.2.2 Projection

    The result in Appendix A can be used to define a projected version of the LSFAY loss function:

    σ(A, D) = ½ min_{H,E} SSQ(Y − HA' − ED)
            = ½ SSQ(Y) + ½ SSQ(A | D) − ∑_{s=1}^{m} σ_s(YA | YD),

    where the σ_s(YA | YD) are the ordered singular values of (YA | YD). Note that (YA | YD) is n × (m + p), but its rank is less than or equal to m. Thus at least p of the singular values are zero.

    The singular values are the square roots of the ordered eigenvalues λ_s of

    U = [ A'CA   A'CD ]
        [ DCA    DCD  ].

    Thus we can write

    σ(A, D) = ½ tr(C) + ½ SSQ(A) + ½ SSQ(D) − ∑_{s=1}^{m} λ_s^{1/2}(U).

    3.2.3 Algorithms

    Our approach may seem quite similar to the approach proposed by Paul Horst in his book [9]. Where we differ from Horst is in the additional assumptions that D is diagonal and that E has the same size as the data Y. This puts us solidly in the common factor analysis framework. Horst, on the contrary, only makes the assumption that there is a small number of common and residual factors, and he then finds them by truncating the singular value decomposition. Separating common and unique factors is done later by using rotation techniques. For Horst, factor analysis is just principal component analysis with some additional interpretational tools.


    1. Alternating Least Squares. This approach to simultaneous least squares estimation of both loadings and unique factors was first introduced by Jan de Leeuw [2004], and has since then been used by Unkel and Trendafilov [23]; Trendafilov and Unkel [22] [16].

    The algorithm to minimize the loss function (3.7) is of the alternating least squares type. It is started with initial estimates A^(0) and D^(0), and we then alternate (an R sketch of these updates is given at the end of this section)

    ( H^(k) | E^(k) ) = Procrustus( YA^(k) | YD^(k) ),    (3.8)
    A^(k+1) = Y'H^(k),    (3.9)
    D^(k+1) = diag(Y'E^(k)).    (3.10)

    The Procrustus transformation of a matrix is defined in terms of its singular value decomposition. If the n × m matrix X has rank m and singular value decomposition X = KΛL', then we define Procrustus(X) = KL'. As shown in the following, the Procrustus transformation can be written in terms of block matrices. Defining the block matrices B = [H : E] and U = [A : D] of dimensions n × (p + m) and m × (p + m), the loss function (3.7) can be rewritten as

    tr (Y − BU')'(Y − BU') = tr(Y'Y) + tr(B'B U'U) − 2 tr(B'YU),    (3.11)

    which is optimized subject to the constraint B'B = I. The tr(B'B U'U) in (3.11) can be written as


    tr(B'B U'U) = tr( [H : E]'[H : E] [A : D]'[A : D] )
                = tr( [ I_p      0_{p×m} ] [ A'A   A'D ]
                      [ 0_{m×p}  I_m     ] [ DA    D²  ] )
                = tr( [ A'A   A'D ]
                      [ DA    D²  ] )
                = tr(A'A) + tr(D²),

    showing that tr(B'B U'U) does not depend on H and E. Hence, as with the standard Procrustes problem, minimizing (3.11) over B is equivalent to the maximization of tr(B'YU). For this problem a closed-form solution computed from the singular value decomposition (SVD) of YU exists [10].

    After solving the Procrustes problem for B = [H : E], one can update the values of A and D by A = Y'H and D = diag(Y'E), using the identities

    H'Y = H'(HA' + ED) = H'HA' + H'ED = A',    (3.12)
    E'Y = E'(HA' + ED) = E'HA' + E'ED = D,    (3.13)

    which follow from the model (3.1). The alternating least squares process is continued until the loss function (3.7) cannot be reduced further.

    Alternatively, one can use the Moore-Penrose inverse and the symmetric matrix square root, because

    Procrustus(X) = ((XX')^+)^{1/2} X = X ((X'X)^+)^{1/2}. If rank(X) = r


    It is important that we can use the symmetric square root to construct a version of the algorithm that does not depend on the number of observations n, and that can be applied to examples that are only given as covariance or correlation matrices [18]. We can combine the equations in (3.8), (3.9) and (3.10) into

    A^(k+1) = C A^(k) (U^(k))^{+1/2},    (3.14)
    D^(k+1) = diag( C D^(k) (U^(k))^{+1/2} ).    (3.15)

    This version of the algorithm no longer uses Y, only C. It can be thought of as an adapted version of Bauer-Rutishauser simultaneous iteration [20].

    2. Gradient and Newton Methods. Suppose the eigenvector z_s of U corresponding with λ_s is partitioned by putting the first p elements in v_s and the last m elements in w_s. Then [De Leeuw, 2007]

    ∂λ_s^{1/2}(U)/∂a_{jr} = λ_s^{-1/2}(U) v_{rs} c_j'(Av_s + Dw_s),
    ∂λ_s^{1/2}(U)/∂d_{jj} = λ_s^{-1/2}(U) w_{js} c_j'(Av_s + Dw_s),

    where c_j is column j of C. Collecting terms gives

    D_1 σ(A, D) = A − CA U^{+1/2},
    D_2 σ(A, D) = D − diag(CD U^{+1/2}),

    which shows that the alternating least squares algorithm (3.14) and (3.15) can be written as a gradient algorithm with constant step-size:

    A^(k+1) = A^(k) − D_1 σ(A^(k), D^(k)),
    D^(k+1) = D^(k) − D_2 σ(A^(k), D^(k)).


    Note that the matrix on the right is U^{+1/2}, the symmetric square root of the Moore-Penrose inverse of U [17].

    Convergence may not be immediately obvious. But in fact the iterations generated are exactly the same as those of the Procrustus algorithm, and thus we obtain convergence from general ALS theory. But of course the computations now no longer depend on n, and we only need the covariance matrix in order to be able to compute the optimal A and D.
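    A bare-bones R sketch of the alternating least squares updates (3.8)-(3.10), with the Procrustus step carried out by a singular value decomposition. It assumes Y is a centered (or standardized) n × m data matrix with n > m + p; lsfay_als, the starting values and the stopping rule are our own illustrative choices, not the program listed in Appendix C.

    ## Alternating least squares for the data-matrix loss, following (3.8)-(3.10):
    ## (H | E) <- Procrustus(YA | YD),  A <- Y'H,  D <- diag(Y'E).
    procrustus <- function(X) {
      s <- svd(X)
      s$u %*% t(s$v)
    }
    lsfay_als <- function(Y, p, eps = 1e-6, itmax = 1000) {
      m <- ncol(Y)
      A <- svd(Y)$v[, 1:p, drop = FALSE]   # arbitrary starting loadings
      D <- diag(1, m)                      # arbitrary starting unique weights
      for (it in seq_len(itmax)) {
        B <- procrustus(cbind(Y %*% A, Y %*% D))       # B = (H | E)
        H <- B[, 1:p, drop = FALSE]
        E <- B[, -(1:p), drop = FALSE]
        Anew <- crossprod(Y, H)                        # eq. (3.9)
        Dnew <- diag(diag(crossprod(Y, E)), m)         # eq. (3.10)
        loss <- 0.5 * sum((Y - H %*% t(Anew) - E %*% Dnew)^2)
        if (max(abs(Anew - A), abs(diag(Dnew) - diag(D))) < eps) {
          A <- Anew; D <- Dnew; break
        }
        A <- Anew; D <- Dnew
      }
      list(loadings = A, unique.weights = diag(D), loss = loss, iterations = it)
    }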


    CHAPTER 4

    Examples

    The algorithms derived in the earlier chapters have been programmed and applied to a number of problems. In all programs the same criterion was used to stop the iterations: either the loss function decreases by less than 1e-6 or the iteration count reaches 1000. In this section we compare the solutions obtained by applying the least squares methods to some classic data sets. Results for these data sets are given here. Five sets are considered: a) 9 mental tests from Holzinger and Swineford (1939); b) 9 mental tests from Thurstone (McDonald, 1999; Thurstone & Thurstone, 1941); c) 17 mental tests from Thurstone and Bechtoldt (Bechtoldt, 1961); d) 14 tests from Holzinger and Swineford (1937); e) 9 tests from Brigham (Thurstone, 1933). The first data set is included in the HolzingerSwineford1939 data set in the lavaan package. The last four data sets are included in the bifactor data set in the psych package.
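    The example data can be pulled in roughly as follows (a sketch, assuming current versions of the lavaan and psych packages; the objects contained in bifactor may differ between psych releases).

    ## Load the example data referred to in this chapter.
    library(lavaan)   # HolzingerSwineford1939: raw scores, 9 tests used here
    library(psych)    # bifactor: correlation matrices (Thurstone, Bechtoldt, Reise, ...)
    data(HolzingerSwineford1939, package = "lavaan")
    data(bifactor, package = "psych")
    cor9 <- cor(HolzingerSwineford1939[, paste0("x", 1:9)])   # 9 mental tests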

    4.1 9 Mental Tests from Holzinger-Swineford

    We use a small subset with 9 variables of the classic Holzinger and Swineford (1939) dataset, which is discussed in detail by Jöreskog (1969). This dataset consists of mental ability test scores of seventh- and eighth-grade children from two different schools (Pasteur and Grant-White). These nine tests were grouped into three factors. Ten different solutions were computed; for six of them we iterate until the


    loss function decreases by less than 1e-6.

    Table 4.1 shows the value of the loss function, the number of iterations and the CPU times of the timed expression¹ for applying LSFAC, LSFAY and ML to the Holzinger-Swineford example.

    For the least squares factor analysis method on the covariance matrix (LSFAC), Table 4.1 shows that the principal factor analysis (PFA) beats BFGS² and CG³ by a factor of 10 in user time. In addition, going from PFA to the Comrey algorithm makes convergence 3 times faster, and going from PFA to the Harman algorithm gains 24% in convergence speed. Further, going from the PFA algorithm to the Newton algorithm again makes convergence 2.6 times as fast (observe that the Newton algorithm starts with a small number of Comrey iterations to get into an area where the quadratic approximation is safe).

    For the least squares factor analysis method on the data matrix (LSFAY), the direct alternating least squares beats BFGS and CG by 50 times. Also, going from lsfaySVD to lsfayInvSqrt gains another 10% in speed, plus a huge advantage in storage because the problem no longer depends on n, and they converge in the same number of iteration steps. The PFA algorithm is 49% faster than the direct alternating least squares.

    In Table 4.2 we evaluate the LSFAC loss function (LSFACf) at the solutions from the LSFAC and LSFAY algorithms, and we do the same for the LSFAY loss function (LSFAYf). We see that the values of the LSFAC loss function for the different algorithms are almost identical, and similarly for the LSFAY loss function.

    ¹ The user time is the CPU time charged for the execution of user instructions of the calling process, and the system time is the CPU time charged for execution by the system on behalf of the calling process.
    ² The BFGS method approximates Newton's method, a class of optimization techniques that seeks a stationary point of a function.
    ³ Method CG is a conjugate gradients method based on that by Fletcher and Reeves (1964).


    In addition, each entry in the comparison of all pairs of loading matrices is 1.00, which verifies that all algorithms give solutions similar to each other.

    Table 4.1: LSFA methods Summary - 9 Mental Tests from Holzinger-Swineford

    LSFAC LSFAY ML

    Newton PFA Comrey Harman BFGS CG InvSqrt SVD BFGS CG ML

    loss 0.01307 0.01307 0.01307 0.01307 0.01307 0.01307 0.0074 0.0074 0.0074 0.0074

    iteration 3.00000 57.0000 17.0000 16.0000 94.000 94.000

    user.self 0.02000 0.07200 0.01700 0.05800 0.59000 0.72700 0.1070 0.1180 6.2120 6.7300 0.085

    sys.self 0.00000 0.00000 0.00000 0.00100 0.00300 0.00400 0.0030 0.0060 0.0250 0.0290 0.002

    Table 4.2: Loss function Summary - 9 Mental Tests from Holzinger-Swineford

    LSFACf LSFAYf

    Newton 0.01306885 0.007427571

    PFA 0.01306885 0.007427550

    Comrey 0.01306885 0.007427556

    Harman 0.01306885 0.007427547

    InvSqrt 0.01313887 0.007396346

    ML 0.01333244 0.007454523


    Table 4.3: Loading Matrices Summary - 9 Mental Tests from Holzinger-Swineford

    LSFAC LSFAY

    Newton PFA Comrey Harman BFGS CG InvSqrt SVD BFGS CG

    Newton 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00

    PFA 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00

    Comrey 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00

    Harman 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00

    BFGS 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00

    CG 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00

    InvSqrt 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
    SVD 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00

    BFGS 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00

    CG 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00

    4.2 9 Mental Tests from Thurstone

    A classic data set is the 9-variable Thurstone problem, which is discussed in detail by R. P. McDonald (1985, 1999). These nine tests were grouped by Thurstone (1941) into three factors: Verbal Comprehension, Word Fluency, and Reasoning. The original data came from Thurstone and Thurstone (1941) but were reanalyzed by Bechtoldt (1961), who broke the data set into two. McDonald, in turn, selected these nine variables from a larger set of 17. Nine different solutions were computed; for five of them we iterate until the loss function decreases by less than 1e-6.

    For the LSFAC case, in Table 4.4, the PFA beats BFGS and CG by a factor


    of 10. In addition, going from PFA to the Comrey algorithm makes convergence 1.2 times faster, and going from PFA to the Harman algorithm makes convergence 18% faster. Further, going from the Comrey algorithm to the Newton algorithm again makes convergence 50% faster (observe that the Newton algorithm starts with a small number of Comrey iterations to get into an area where the quadratic approximation is safe). For the LSFAY case, direct alternating least squares beats BFGS and CG by 50 times. The PFA algorithm is twice as fast as the direct alternating least squares.

    In Table 4.5 we evaluate the LSFAC loss function (LSFACf) at the solutions from the LSFAC and LSFAY algorithms, and we do the same for the LSFAY loss function (LSFAYf). We see that the values of the LSFAC loss function for the different algorithms are almost identical, and similarly for the LSFAY loss function. In addition, each entry in the comparison of all pairs of loading matrices is 1.00, which verifies that all algorithms give solutions similar to each other.

    Table 4.4: LSFA methods Summary - 9 Mental Tests from Thurstone

    LSFAC LSFAY ML

    Newton PFA Comrey Harman BFGS CG InvSqrt BFGS CG ML

    loss 0.00123 0.00123 0.00123 0.00123 0.00123 0.00123 0.00098 0.00098 0.00098

    iteration 4.00000 66.0000 28.0000 22.0000 144.000

    user.self 0.02400 0.07700 0.03500 0.06500 0.64200 0.81500 0.14600 6.72300 7.23700 0.037

    sys.self 0.00100 0.00100 0.00000 0.00000 0.00300 0.00400 0.00000 0.03100 0.04600 0.000


    Table 4.5: Loss function Summary - 9 Mental Tests from Thurstone

    Newton vs InvSqrt

    LSFACf LSFAYf

    Newton 0.001228405 0.0009998733

    PFA 0.001228405 0.0009998909

    Comrey 0.001228405 0.0009998581

    Harman 0.001228405 0.0009998984

    InvSqrt 0.001266686 0.0009804552

    ML 0.0013511019 0.0009924787

    Table 4.6: Loading Matrices Summary - 9 Mental Tests from Thurstone
    LSFAC LSFAY

    Newton PFA Comrey Harman BFGS CG InvSqrt BFGS CG

    Newton 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00

    PFA 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00

    Comrey 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00

    Harman 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00

    BFGS 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00

    CG 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00

    InvSqrt 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00

    BFGS 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00

    CG 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00

    4.3 17 Mental Tests from Thurstone/Bechtoldt

    This set contains the 17 variables from which the clear 3-factor solution used by McDonald (1999) is abstracted. Nine different solutions were computed; for five of them we


    iterate until the loss function decreases by less than 1e-6.

    For the LSFAC case, in Table 4.7, the PFA beats BFGS and CG by a factor of 57. In addition, going from the Comrey to the PFA algorithm makes convergence 30% faster, and going from the Harman to the PFA algorithm makes convergence 2.8 times faster. Further, going from the Comrey algorithm to the Newton algorithm again makes convergence 30% faster (observe that the Newton algorithm starts with a small number of Comrey iterations to get into an area where the quadratic approximation is safe). For the LSFAY case, the direct alternating least squares beats BFGS and CG by 23 times. The PFA algorithm is twice as fast as the direct alternating least squares.

    In Table 4.8 we evaluate the LSFAC loss function (LSFACf) at the solutions from the LSFAC and LSFAY algorithms, and we do the same for the LSFAY loss function (LSFAYf). We see that the values of the LSFAC loss function for the different algorithms are almost identical, and similarly for the LSFAY loss function. In addition, each entry in the comparison of all pairs of loading matrices is 1.00, which verifies that all algorithms give solutions similar to each other.

    Table 4.7: LSFA methods Summary - 17 mental Tests from Thurstone/Bechtoldt

    LSFAC LSFAY ML

    Newton PFA Comrey Harman BFGS CG InvSqrt BFGS CG ML

    loss 0.496 0.496 0.496 0.496 0.496 0.496 0.187 0.187 0.187

    iteration 3.000 24.00 18.00 17.00 43.00

    user.self 0.030 0.030 0.039 0.115 1.500 1.700 0.068 1.608 2.194 0.042

    sys.self 0.001 0.00 0.001 0.000 0.006 0.008 0.000 0.0070 0.001 0.001


    Table 4.8: Loss function Summary - 17 mental Tests from Thurstone/Bechtoldt

    LSFACf LSFAYf

    Newton 0.4959954 0.1916967

    PFA 0.4959954 0.1916968

    Comrey 0.4959954 0.1916968

    Harman 0.4959954 0.1916968

    InvSqrt 0.5071997 0.1873351

    ML 0.5300513 0.1929767

    Table 4.9: Loading Matrices Summary - 17 mental Tests from Thurstone

    LSFAC LSFAY
    Newton PFA Comrey Harman BFGS CG InvSqrt BFGS CG

    Newton 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00

    PFA 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00

    Comrey 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00

    Harman 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00

    BFGS 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00

    CG 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00

    InvSqrt 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00

    BFGS 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00

    CG 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00

    4.4 16 Health Satisfaction Items from Reise

    The Reise data set is a correlation matrix based upon more than 35,000 responses to the Consumer Assessment of Health Care Providers and Systems survey instrument. Reise, Morizot, and Hays (2007) describe a bifactor solution based upon 1,000


    cases. The five factors from Reise et al. reflect Getting care quickly (1-3), Doctor communicates well (4-7), Courteous and helpful staff (8, 9), Getting needed care (10-13), and Health plan customer service (14-16). In all LSFAC cases we iterate until the loss function decreases by less than 1e-6, and in the LSFAY cases we iterate until 1000 iterations.

    For the LSFAC case, in Table 4.10, the PFA beats BFGS and CG by a factor of 5. In addition, going from the Comrey to the PFA algorithm makes convergence 13% faster, and going from the Harman to the PFA algorithm makes convergence 7% faster. Further, going from the Comrey algorithm to the Newton algorithm again makes convergence 2.7 times as fast (observe that the Newton algorithm starts with a small number of Comrey iterations to get into an area where the quadratic approximation is safe). For the LSFAY case, the direct alternating least squares beats BFGS and CG by 20 times. The PFA algorithm is 44% faster than the direct alternating least squares.

    In Table 4.11 we evaluate the LSFAC loss function (LSFACf) at the solutions from the LSFAC and LSFAY algorithms, and we do the same for the LSFAY loss function (LSFAYf). We see that the values of the LSFAC loss function for the different algorithms are almost identical, and similarly for the LSFAY loss function. In addition, each entry in the comparison of all pairs of loading matrices is 1.00, which verifies that all algorithms give solutions similar to each other.


    4.5 9 Emotional Variables from Burt

    The Burt nine emotional variables are taken from Harman (1967, p. 164), who in turn adapted them from Burt (1939). They are said to be from 172 normal children aged nine to twelve. As pointed out by Harman, this correlation matrix is singular and has squared multiple correlations > 1. Note that this correlation matrix has a negative eigenvalue, but LSFAY still works. In all LSFAC cases we iterate until the loss function decreases by less than 1e-6, and in the LSFAY cases we iterate until 1000 iterations.

    For the LSFAC case, in Table 4.13, the PFA beats BFGS and CG by a factor of 8. In addition, going from PFA to the Comrey algorithm makes convergence 1.6 times faster, and going from the Harman to the PFA algorithm makes convergence 0.02 times faster. Further, going from the Comrey algorithm to the Newton algorithm again makes convergence 1.4 times faster (observe that the Newton algorithm starts with a small number of Comrey iterations to get into an area where the quadratic approximation is safe). For the LSFAY case, the direct alternating least squares beats BFGS and CG by 13 times. The PFA algorithm is 6 times faster than the direct alternating least squares.

    In Table 4.14, each entry in the comparison of all pairs of loading matrices is 1.00, which verifies that all algorithms give solutions similar to each other.


    the square root of the estimated unique variance always turns out positive.

    The illustrative results verify that a common factor analysis solution of this kind can be surprisingly similar to the classical maximum likelihood and least squares solutions on the data sets we analyzed earlier, suggesting that further research into its properties may be of interest in the future.

    Although we showed that the proposed factor analysis methodology can yield results that are equivalent to those from standard methods, an important question to consider is whether any variants of this methodology can actually yield improvements over existing methods. If not, the results will be of interest mainly in providing a new theoretical perspective on the relations between components and factor analysis.


    APPENDIX A

    Augmented Procrustus

    Suppose X is an n × m matrix of rank r. Consider the problem of maximizing tr(U'X) over the n × m matrices U satisfying U'U = I. This is known as the Procrustus problem, and it is usually studied for the case n ≥ m = r. We want to generalize to n ≥ m ≥ r. For this, we use the singular value decomposition

    X = [K1 | K0] [ Λ  0 ; 0  0 ] [L1 | L0]',

    with K1 of order n × r, K0 of order n × (n − r), Λ of order r × r, L1 of order m × r, and L0 of order m × (m − r).

    Theorem 1. The maximum of tr(U'X) over n × m matrices U satisfying U'U = I is tr(Λ), and it is attained for any U of the form U = K1L1' + K0VL0', where V is any (n − r) × (m − r) matrix satisfying V'V = I.

    Proof. Using a symmetric matrix M of Lagrange multipliers leads to the stationary equations X = UM, which implies X'X = M², or M = ±(X'X)^{1/2}. It also implies that at a solution of the stationary equations tr(U'X) = ±tr(Λ). The negative sign corresponds with the minimum, the positive sign with the maximum. Now

    M = [L1 | L0] [ Λ  0 ; 0  0 ] [L1 | L0]',

    with L1 of order m × r and L0 of order m × (m − r).


    If we write U in the form

    U = [K1 | K0] [ U1 ; U0 ],

    with U1 of order r × m and U0 of order (n − r) × m, then X = UM can be simplified to

    U1 L1 = I,
    U0 L1 = 0,

    with in addition, of course, U1'U1 + U0'U0 = I. It follows that U1 = L1' and

    U0 = V L0',

    with V of order (n − r) × (m − r) and V'V = I. Thus U = K1L1' + K0VL0'.
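    A quick numerical check of Theorem 1 (our own illustration, with randomly generated matrices): the maximizer built from the singular value decomposition satisfies U'U = I and attains tr(U'X) equal to the sum of the singular values of X, that is tr(Λ).

    ## Numerical check of the augmented Procrustus solution U = K1 L1' (+ K0 V L0').
    set.seed(1)
    n <- 7; m <- 4; r <- 2
    X <- matrix(rnorm(n * r), n, r) %*% matrix(rnorm(r * m), r, m)  # rank r matrix
    s <- svd(X)
    U <- s$u %*% t(s$v)              # one maximizer from Theorem 1 (V part arbitrary)
    max(abs(crossprod(U) - diag(m))) # U'U = I up to rounding error
    sum(diag(crossprod(U, X)))       # tr(U'X)
    sum(s$d)                         # equals the sum of the singular values, tr(Lambda)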


    APPENDIX B

    Implementation Code

    B.1. Examples Dataset

    library(MASS)
    library(psych)
    library(optimx)
    library(seriation)
    data(Harman)
    data(bifactor)
    data(Psych24)

    haho
  • 8/10/2019 Least Squares Method for Factor Analysis

    50/72

  • 8/10/2019 Least Squares Method for Factor Analysis

    51/72

    14, 12, 12, 11, 12, 10, 10, 12, 11, 12, 14, 14, 13, 14, 13, 16,
    14, 16, 13, 2, 14, 17, 16, 15, 12, 14, 13, 11, 7, 12, 6),
    `paper form boards` = c(17,
    15, 14, 12, 17, 21, 13, 5, 7, 15, 17, 20, 15, 19, 18, 14, 17,
    14, 21, 21, 17, 16, 15, 13, 13, 18, 15, 16, 19, 16, 20, 19, 14,
    12, 19, 13, 20, 9, 13, 8, 20, 10, 18, 18, 10, 16, 8, 16, 21,
    17, 16, 16, 6, 16, 17, 13, 14, 10, 17, 15, 16, 7, 15, 5),
    `tool recognition` = c(24,
    32, 29, 10, 26, 26, 26, 22, 30, 30, 26, 28, 29, 32, 31, 26, 33,
    19, 30, 34, 30, 16, 25, 26, 23, 34, 28, 29, 32, 33, 21, 30, 12,
    14, 21, 10, 16, 14, 18, 13, 19, 11, 25, 13, 25, 8, 13, 23, 26,
    14, 15, 23, 16, 22, 22, 16, 20, 12, 24, 18, 18, 19, 7, 6),
    vocabulary = c(14,
    26, 23, 16, 28, 21, 22, 22, 17, 27, 20, 24, 24, 28, 27, 21, 26,
    17, 29, 26, 24, 16, 23, 16, 21, 24, 27, 24, 23, 23, 21, 28, 21,
    26, 21, 16, 16, 18, 24, 23, 23, 27, 25, 26, 28, 14, 25, 28, 26,
    14, 23, 24, 21, 26, 28, 14, 26, 9, 23, 20, 28, 18, 28, 13)),
    .Names = c("gender",
    "pictorial absurdities", "paper form boards", "tool recognition",
    "vocabulary"), row.names = c(NA, 64L), class = "data.frame")


    # beall are scores of 32 men and 32 women on 4 tests taken
    # from Beall (Psychometrika, 1945)

    data(HolzingerSwineford1939, package = "lavaan")

    HS.name


    APPENDIX C

    Program

    C.1. Main.

    lsfa


    }
    chg


    newa[i, s]


    newa[i, s]


    ee


    result


    if (verbose) {
      cat("Iteration: ", formatC(itel, digits = 4, width = 6),
          "Change: ", formatC(chg, digits = 6, width = 10, format = "f"),
          "\n")
      cat("old d: ", formatC(oldd, digits = 6, width = 10, format = "f"), "\n")
      cat("new d: ", formatC(newd, digits = 6, width = 10, format = "f"), "\n\n")
    }
    if ((chg < eps) || (itel == itmax)) {
      break
    }
    itel


    if (fm == "LSFAY") {
      a


    k


    Bibliography

    [1] T. W. Anderson and Herman Rubin. Statistical inference in factor analysis. Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, 5:111-150, 1956.

    [2] T. W. Anderson. Estimating linear statistical relationships. The Annals of Statistics, 12(1):1-45, 1984.

    [3] M. S. Bartlett. Tests of significance in factor analysis. British Journal of Psychology, Statistical Section, 8:77-85, 1950.

    [4] C. Chatfield and A. J. Collins. Introduction to Multivariate Analysis. Chapman and Hall, New Jersey, 1980.

    [5] A. L. Comrey. The minimum residual method of factor analysis. Psychological Reports, 11:15-18, 1962.

    [6] Richard L. Gorsuch. Factor Analysis. Lawrence Erlbaum Associates, New Jersey, 1983.

    [7] H. H. Harman and Y. Fukuda. Resolution of the Heywood case in the minres solution. Psychometrika, 31:563-571, 1966.

    [8] H. H. Harman and W. H. Jones. Factor analysis by minimizing residuals (minres). Psychometrika, 31:351-368, 1966.

    [9] P. Horst. Factor Analysis of Data Matrices. Holt, Rinehart and Winston, 1965.

    [10] J. C. Gower and G. B. Dijksterhuis. Procrustes Problems. Oxford University Press, pages 121-134, 2004.


    [11] K. G. Jöreskog. On the statistical treatment of residuals in factor analysis. Psychometrika, 27:435-454, 1962.

    [12] K. G. Jöreskog. Some contributions to maximum likelihood factor analysis. Psychometrika, 32:443-482, 1967.

    [13] Richard A. Johnson and Dean W. Wichern. Applied Multivariate Statistical Analysis. Prentice Hall, New Jersey, 2002.

    [14] D. N. Lawley. The estimation of factor loadings by the method of maximum likelihood. Proceedings of the Royal Society of Edinburgh, 60:64-82, 1940.

    [15] D. N. Lawley. Further investigations in factor estimation. Proceedings of the Royal Society of Edinburgh, 62:176-185, 1942.

    [16] J. De Leeuw. Least squares optimal scaling of partially observed linear systems. Recent Developments on Structural Equation Models: Theory and Applications, pages 121-134, 2004.

    [17] J. De Leeuw. Derivatives of generalized eigen systems with applications. Preprint 528, Department of Statistics, UCLA, 2007.

    [18] J. De Leeuw. Least squares methods for factor analysis. Preprint, Department of Statistics, UCLA, 2010.

    [19] Eni Pan Gao and Zhiwei Ren. Introduction to Multivariate Analysis for the Social Sciences. W. H. Freeman and Company, New Jersey, 1971.

    [20] H. Rutishauser. Computational aspects of F. L. Bauer's simultaneous iteration method. Numerische Mathematik, 13:4-13, 1969.

    [21] L. L. Thurstone. Multiple-Factor Analysis. University of Chicago Press, IV:535, 1947.


    [22] Steffen Unkel and Nickolay T. Trendafilov. Noisy independent component analysis as a method of rotating the factor scores. Springer-Verlag Berlin Heidelberg, 2007.

    [23] Steffen Unkel and Nickolay T. Trendafilov. Factor analysis as data matrix decomposition: A new approach for quasi-sphering in noisy ICA. Springer-Verlag Berlin Heidelberg, pages 163-170, 2009.

    [24] P. Whittle. On principal components and least squares methods in factor analysis. Skandinavisk Aktuarietidskrift, 35:223-239, 1952.

    [25] Gale Young. Maximum likelihood estimation and factor analysis. Psychometrika, 6:49-53, 1941.

    [26] F. E. Zegers and J. M. F. Ten Berge. A fast and simple computational method of minimum residual factor analysis. Multivariate Behavioral Research, 18:331-340, 1983.