NATIONAL CHENG KUNG UNIVERSITY
Principal Component Analysis
Final Paper in Financial Pricing
Tuan Anh, Sander Mägi
6/17/2009

    Table of Contents

    Chapter I Introduction
    Chapter II Literature review
    2.1 What is PCA
    2.1.1 Definition of PCA
    2.1.2 History of PCA
    2.1.3 Basic assumptions
    2.1.4 Important concepts
    2.1.5 Calculating principal components
    2.1.6 Deriving principal components
    2.2 Advantages and disadvantages of PCA
    2.2.1 Importance of PCA
    2.2.2 Benefits of PCA
    2.2.3 Limitations of PCA
    2.3 Practical implications - Software
    Chapter III Applications
    Chapter IV Conclusions
    List of Articles
    References

    Chapter I Introduction

    When starting a research project, students and researchers often collect a great deal of data or come across large datasets that are already available. With so much data, especially secondary data, it is easy to get confused: it is hard to find the variables that really matter for the research when there are so many to consider. This is where principal component analysis (PCA) can help.

    PCA was invented by Karl Pearson in 1901 and is now used in many fields of science. It is mostly used as a tool in exploratory data analysis, because what it essentially does is find the combinations of variables that explain most of the variance in the data. So, when there is a lot of data to analyze, PCA can make the task much easier. PCA also helps in constructing predictive models.

    In this paper we focus on applications of PCA in finance research. Early applications of PCA in finance date back at least to the early 1970s, and many articles published as recently as 2009 still use it. PCA is also often used in combination with other methods.

    In Chapter II we first explain what PCA is and how it works, and then discuss its advantages, limitations and importance by reviewing the relevant literature. In Chapter III we continue the literature review and focus on the applications of PCA. Chapter IV concludes our overview of PCA.

    So, faced with a big pile of data and having decided to use PCA to find the most important variables, what do we need to do next? We need to understand PCA and learn how to apply it. This is the focus of the next section.

    Chapter II Literature review

    2.1 What is PCA

    2.1.1 Definition of PCA

    Principal component analysis (PCA) is a statistical tool used to explore, sort and group data. PCA takes a large number of correlated (interrelated) variables and transforms them into a smaller number of uncorrelated variables (principal components) while retaining as much of the variation as possible, which makes the data easier to work with and to use for prediction. Or, as Smith (2002) puts it, PCA "is a way of identifying patterns in data, and expressing the data in such a way as to highlight their similarities and differences. Since patterns in data can be hard to find in data of high dimension, where the luxury of graphical representation is not available, PCA is a powerful tool for analyzing data."

    2.1.2 History of PCA

    According to Jolliffe (2002), it is generally accepted that PCA was first described by Karl Pearson in 1901. In his article "On lines and planes of closest fit to systems of points in space", Pearson (1901) discusses the graphical representation of data and the lines that best represent the data. He concludes that "the best-fitting straight line to a system of points coincides in direction with the maximum axis of the correlation ellipsoid". He also states that the analysis used in his paper can be applied to multiple variables.

    However, PCA was not widely used until computers became available. It is not really feasible to do PCA by hand when the number of variables is greater than four, yet it is exactly for larger numbers of variables that PCA is most useful, so its full potential could not be realized until computers became widespread (Jolliffe, 2002).

    According to Jolliffe (2002), significant contributions to the development of PCA were made by Hotelling (1933) and Girshick (1936; 1939) before interest in the method expanded. In the 1960s, as interest in PCA rose, important contributors were Anderson (1963) with a theoretical discussion; Rao (1964) with numerous new ideas concerning uses, interpretations and extensions of PCA; Gower (1966) with a discussion of the links between PCA and other statistical techniques; and Jeffers (1967) with a practical application in two case studies.

    2.1.3 Basic assumptions

    According to Shlens (2009), three basic assumptions behind PCA need to be considered when calculating and interpreting principal components:

    1) Linearity - Linearity frames the problem as a change of basis. Several areas of research have explored how to extend these notions to nonlinear regimes.

    2) Large variances have important structure - This assumption also encompasses the belief that the data has a high signal-to-noise ratio (SNR). Hence, principal components with larger associated variances represent interesting structure, while those with lower variances represent noise. Note that this is a strong, and sometimes incorrect, assumption.

    3) The principal components are orthogonal - This assumption provides an intuitive simplification that makes PCA soluble with linear algebra decomposition techniques.


    2.1.4 Important concepts

    - Principal component - a linear combination of the original variables (the 1st principal component explains most of the variation in the data, the 2nd explains most of the remaining variance, and so on)

    - Eigenvector - the coefficients of the original variables used to construct the factors (principal components)

    - Eigenvalue - the scalar associated with each eigenvector of a linear transformation; in PCA it equals the variance explained by the corresponding principal component

    2.1.5 Calculating principal components

    Jolliffe (2002) states that principal components (PCs) can be found using purely mathematical arguments: they are given by an orthogonal linear transformation of a set of variables optimizing a certain algebraic criterion.

    Shlens (2009) gives an overview of how to perform principal component analysis:

    1. Organize the data as an m × n matrix, where m is the number of measurement types and n is the number of samples

    2. Subtract off the mean for each measurement type

    3. Calculate the covariance matrix

    4. Calculate the eigenvectors and eigenvalues of the covariance matrix
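    The four steps above can be sketched in a few lines of Python with NumPy. This is only a minimal illustration on made-up toy data, not code from any of the cited sources; the function name `pca_by_covariance` is hypothetical, and dividing by n - 1 (the unbiased covariance estimate) is an assumption, since the steps do not say which normalization to use. The choice of normalization rescales the eigenvalues but leaves the eigenvectors unchanged.

```python
import numpy as np

def pca_by_covariance(X):
    """PCA of an m x n matrix X (rows = measurement types, columns = samples)."""
    # Step 2: subtract the mean of each measurement type (each row).
    X_centered = X - X.mean(axis=1, keepdims=True)
    # Step 3: covariance matrix of the measurement types (m x m).
    # Assumption: unbiased estimate, i.e. division by n - 1.
    n = X.shape[1]
    C = X_centered @ X_centered.T / (n - 1)
    # Step 4: eigenvalues and eigenvectors of the symmetric covariance matrix.
    eigenvalues, eigenvectors = np.linalg.eigh(C)
    # eigh returns ascending eigenvalues; reorder so the first component
    # is the one with the largest variance.
    order = np.argsort(eigenvalues)[::-1]
    return eigenvalues[order], eigenvectors[:, order]

# Step 1: organize toy data as an m x n matrix (m = 2 types, n = 5 samples).
X = np.array([[2.0, 4.0, 6.0, 8.0, 10.0],
              [1.0, 2.1, 2.9, 4.2, 5.0]])
eigenvalues, eigenvectors = pca_by_covariance(X)
print(eigenvalues)
print(eigenvectors)
```

    Each column of `eigenvectors` holds the coefficients of one principal component, and the matching entry of `eigenvalues` is the variance along it.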

    2.1.6 Deriving principal components

    The following is a standard derivation of principal components presented by Jolliffe (2002).

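    To derive the form of the PCs, consider first $\alpha_1^T x$; the vector $\alpha_1$ maximizes $\operatorname{var}[\alpha_1^T x] = \alpha_1^T \Sigma \alpha_1$, where $\Sigma$ is the covariance matrix of $x$. It is clear that, as it stands, the maximum will not be achieved for finite $\alpha_1$, so a normalization constraint must be imposed. The constraint used in the derivation is $\alpha_1^T \alpha_1 = 1$, that is, the sum of squares of the elements of $\alpha_1$ equals 1. Other constraints may be more useful in other circumstances, and can easily be substituted later on. However, the use of constraints other than $\alpha_1^T \alpha_1 = \text{constant}$ in the derivation leads to a more difficult optimization problem, and it will produce a set of derived variables different from the principal components.

    To maximize $\alpha_1^T \Sigma \alpha_1$ subject to $\alpha_1^T \alpha_1 = 1$, the standard approach is to use the technique of Lagrange multipliers. Maximize

    $$\alpha_1^T \Sigma \alpha_1 - \lambda (\alpha_1^T \alpha_1 - 1),$$

    where $\lambda$ is a Lagrange multiplier. Differentiation with respect to $\alpha_1$ gives

    $$\Sigma \alpha_1 - \lambda \alpha_1 = 0, \quad \text{or} \quad (\Sigma - \lambda I_p)\,\alpha_1 = 0,$$

    where $I_p$ is the $(p \times p)$ identity matrix. Thus, $\lambda$ is an eigenvalue of $\Sigma$ and $\alpha_1$ is the corresponding eigenvector. To decide which of the $p$ eigenvectors gives $\alpha_1^T x$ with maximum variance, note that the quantity to be maximized is

    $$\alpha_1^T \Sigma \alpha_1 = \alpha_1^T \lambda \alpha_1 = \lambda \alpha_1^T \alpha_1 = \lambda,$$

    so $\lambda$ must be as large as possible. Thus, $\alpha_1$ is the eigenvector corresponding to the largest eigenvalue of $\Sigma$, and $\operatorname{var}(\alpha_1^T x) = \alpha_1^T \Sigma \alpha_1 = \lambda_1$, the largest eigenvalue.

    In general, the $k$th PC of $x$ is $\alpha_k^T x$ and $\operatorname{var}(\alpha_k^T x) = \lambda_k$, where $\lambda_k$ is the $k$th largest eigenvalue of $\Sigma$, and $\alpha_k$ is the corresponding eigenvector.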

    Shlens (2009) derives an algebraic solution to PCA based on an important property of eigenvector decomposition. Once again, the data set is $X$, an $m \times n$ matrix, where $m$ is the number of measurement types and $n$ is the number of samples. The goal is summarized as follows:

    Find some orthonormal matrix $P$ in $Y = PX$ such that $C_Y \equiv \frac{1}{n} Y Y^T$ is a diagonal matrix. The rows of $P$ are the principal components of $X$.

    He begins by rewriting $C_Y$ in terms of the unknown variable $P$:

    $$C_Y = \frac{1}{n} Y Y^T = \frac{1}{n} (PX)(PX)^T = \frac{1}{n} P X X^T P^T = P \left( \frac{1}{n} X X^T \right) P^T = P C_X P^T$$

    Note that the covariance matrix of $X$, $C_X = \frac{1}{n} X X^T$, has been identified in the last step.

    The plan is to recognize that any symmetric matrix $A$ is diagonalized by an orthogonal matrix of its eigenvectors: for a symmetric matrix $A$, $A = E D E^T$, where $D$ is a diagonal matrix and $E$ is a matrix of eigenvectors of $A$ arranged as columns.

    Now comes the trick. He selects the matrix $P$ to be a matrix where each row $p_i$ is an eigenvector of $C_X = \frac{1}{n} X X^T$. By this selection, $P \equiv E^T$, so that $C_X = E D E^T = P^T D P$. With this relation and the fact that the inverse of an orthogonal matrix is its transpose ($P^{-1} = P^T$), we can finish evaluating $C_Y$:

    $$C_Y = P C_X P^T = P (P^T D P) P^T = (P P^T) D (P P^T) = (P P^{-1}) D (P P^{-1}) = D$$

    It is evident that this choice of $P$ diagonalizes $C_Y$. This was the goal for PCA. We can summarize the results of PCA in the matrices $P$ and $C_Y$:

    - The principal components of $X$ are the eigenvectors of $C_X = \frac{1}{n} X X^T$.

    - The $i$th diagonal value of $C_Y$ is the variance of $X$ along $p_i$.

    In practice, computing PCA of a data set $X$ entails (1) subtracting off the mean of each measurement type and (2) computing the eigenvectors of $C_X$.
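    The claim that choosing the rows of P as eigenvectors of the covariance matrix of X makes the covariance of Y = PX diagonal can be checked numerically. The sketch below is only an illustration under stated assumptions: the data are randomly generated, the `mixing` matrix is invented to make the rows correlated, and the covariance is normalized by 1/n.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: m = 3 measurement types, n = 200 samples, with correlated rows.
mixing = np.array([[1.0, 0.0, 0.0],
                   [0.8, 0.6, 0.0],
                   [0.3, 0.2, 0.1]])
X = mixing @ rng.standard_normal((3, 200))
X = X - X.mean(axis=1, keepdims=True)   # subtract the mean of each row

n = X.shape[1]
C_X = X @ X.T / n                       # covariance of X (1/n convention assumed)
eigenvalues, E = np.linalg.eigh(C_X)    # columns of E are eigenvectors of C_X
P = E.T                                 # rows of P are the principal components

Y = P @ X                               # data expressed in the new basis
C_Y = Y @ Y.T / n                       # covariance of the transformed data
off_diagonal = C_Y - np.diag(np.diag(C_Y))

print(np.abs(off_diagonal).max())       # numerically zero
print(np.diag(C_Y), eigenvalues)        # diagonal of C_Y matches the eigenvalues
```

    The diagonal of `C_Y` reproduces the eigenvalues, which is the statement that the ith diagonal value is the variance of X along the ith principal component.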

    2.2 Advantages and disadvantages of PCA

    2.2.1 Importance of PCA

    "Principal component analysis (PCA) is a standard tool in modern data analysis - in diverse fields from neuroscience to computer graphics - because it is a simple, non-parametric method for extracting relevant information from confusing data sets. With minimal effort PCA provides a roadmap for how to reduce a complex data set to a lower dimension to reveal the sometimes hidden, simplified structures that often underlie it." (Shlens, 2009)

    The importance of PCA is shown by its use in so many different fields of science and life. PCA is widely used in neuroscience, for example. Other fields of use are pattern recognition and image compression, which makes PCA suited to facial recognition software as well as to the recognition and storage of other biometric data. Many IT-related fields also use PCA, including artificial intelligence research. According to Jolliffe (2002), PCA is also used in research in agriculture, biology, chemistry, climatology, demography, ecology, food research, genetics, geology, meteorology, oceanography, psychology, quality control, etc. In this paper, however, we focus on uses in finance and economics.

    PCA has been used in economics and finance to study changes in stock markets, commodity markets, economic growth, exchange rates, etc. The earliest studies were in economics, but stock markets were already being studied in the 1960s. Lessard (1973) notes that principal component or factor analysis had been used in several then-recent empirical studies (Farrar [1962], King [1967], and Feeney and Hester [1967]) concerned with the existence of general movements in the returns from common stocks. PCA has mostly been used to compare different stock markets in search of diversification opportunities, especially in earlier studies such as those by Makridakis (1974) and Philippatos et al. (1983).

    2.2.2 Benefits of PCA

    PCA is a special case of Factor Analysis that is highly useful in the analysis of many time series and the search for patterns of movement common to several series (true factor analysis makes different assumptions about the underlying structure and solves eigenvectors of a slightly different matrix). This approach is superior to many of the bivariate statistical techniques used earlier, in that it explores the interrelationships among a set of variables caused by common "factors," mostly economic in nature (Philippatos, Christofi, & Christofi, 1983).

    As noted in section 2.1.1, PCA is a way of identifying patterns in data and expressing the data so as to highlight their similarities and differences. A primary benefit of PCA arises from quantifying the importance of each dimension for describing the variability of a data set (Shlens, 2009). PCA can also be used to compress data by reducing the number of dimensions without much loss of information.

    When using principal component analysis to analyze a data set, it is usually possible to explain a large percentage of the total variance with only a few components. Principal components are selected so that each successive one explains a maximum of the remaining variance: the first component is selected to explain the maximum proportion of the total variance, the second to explain the maximum of the remaining variance, and so on. Therefore, the principal component solution is a particularly appropriate test for the existence of a strong market factor (Lessard, 1973).
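    The share of variance explained by each component, and the number of components needed to reach a given share, can be read directly from the eigenvalues. The following sketch uses simulated returns with a single strong common factor; the data, the 90% threshold and the function name `explained_variance_ratio` are illustrative assumptions, not taken from Lessard (1973).

```python
import numpy as np

def explained_variance_ratio(data):
    """Share of total variance per PC; rows = observations, columns = variables."""
    C = np.cov(data, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(C)[::-1]   # descending order
    return eigenvalues / eigenvalues.sum()

rng = np.random.default_rng(1)
# Hypothetical returns: one common "market" factor plus idiosyncratic noise
# for 6 assets over 500 periods.
market = rng.standard_normal((500, 1))
returns = market @ np.ones((1, 6)) + 0.5 * rng.standard_normal((500, 6))

ratios = explained_variance_ratio(returns)
cumulative = np.cumsum(ratios)
# Smallest number of components whose cumulative share reaches 90%.
n_for_90 = int(np.searchsorted(cumulative, 0.90) + 1)

print(ratios)
print(cumulative)
print(n_for_90)
```

    When one factor drives all series, as in this simulated case, the first ratio dominates, which is the pattern one would look for as evidence of a strong market factor.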

    PCA is completely non-parametric: any data set can be plugged in and an answer comes out, requiring no parameters to tweak and no regard for how the data was recorded. From one perspective, the fact that PCA is non-parametric (or plug-and-play) can be considered a positive feature, because the answer is unique and independent of the user (Shlens, 2009).

    2.2.3 Limitations of PCA

    Limitations of PCA arise mainly from the main assumptions mentioned above and from the data at hand. PCA is not a statistical method in the sense that no probability distribution is specified for the observations. It is therefore important to keep in mind that PCA best serves to represent data in a simpler, reduced form.

    It is often difficult, if not impossible, to discover the true economic interpretation of PCs, since the new variables are linear combinations of the original variables. In addition, for PCA to work exactly, one should use standardized data, so that the mean is zero and the unbiased estimate of variance is unity:

    $$z_i = \frac{x_i - \bar{x}_i}{s_i}$$

    where $z_i$ is the $i$th standardized variable, $x_i$ is the $i$th original variable, and $\bar{x}_i$ and $s_i$ are its sample mean and its standard deviation based on the unbiased variance estimate.

    This is because the scales of the original variables are often not comparable, and the variable or variables with high absolute variance will dominate the first principal component.

    There is one major drawback to standardization, however. Standardizing means that PCA results come out with respect to the standardized variables. This makes the interpretation and further application of PCA results even more difficult (Malava, 2006).
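    The effect of scale on the first principal component can be illustrated with two simulated variables that carry similar information but are measured in very different units. The data and variable names below are hypothetical; the sketch only shows that, without standardization, the high-variance variable dominates the first component.

```python
import numpy as np

def first_component(data):
    """Unit eigenvector of the covariance matrix with the largest eigenvalue."""
    C = np.cov(data, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(C)
    return eigenvectors[:, -1]          # eigh sorts eigenvalues in ascending order

rng = np.random.default_rng(2)
common = rng.standard_normal(300)
# Two hypothetical variables driven by the same factor but on different scales.
x_large = 1000.0 * (common + 0.5 * rng.standard_normal(300))
x_small = common + 0.5 * rng.standard_normal(300)
data = np.column_stack([x_large, x_small])

# Standardize: zero mean and unit variance (unbiased estimate, ddof=1).
z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

raw_pc1 = first_component(data)
standardized_pc1 = first_component(z)
print(raw_pc1)            # weight falls almost entirely on the large-scale variable
print(standardized_pc1)   # both variables receive equal weight in absolute value
```

    With only two standardized variables the first component always weights them equally in absolute value; with more variables the weights depend on the full correlation matrix.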

    The aim when using PCA is often to get rid of correlation and interdependence between variables. PCA succeeds in removing second-order dependencies, but it has trouble with higher-order dependencies. This problem might be solved by using kernel PCA or independent component analysis. The fact that PCA is agnostic to the source of the data is also a weakness (Shlens, 2009).

    2.3 Practical implications - Software

    A search for principal component analysis software on the internet turns up numerous vendors offering their products, as well as freeware packages for users who prefer not to pay. With the help of Wikipedia and Google searches we compiled the following list of software for PCA.

    "ViSta: The Visual Statistics System" is free software that provides principal components analysis and simple and multiple correspondence analysis. "Spectramap" is software for creating a biplot using principal components analysis, correspondence analysis or spectral map analysis. Other software packages with PCA include Computer Vision Library, Multivariate Data Analysis Software, MVSP, The Unscrambler, PCA/X and many others.

    It is also possible to perform PCA in MS Excel, but this requires purchasing add-in software called XLSTAT.

    In MATLAB, the functions "princomp" and "wmspca" give the principal components, while the function "pcares" gives the residuals and reconstructed matrix for a low-rank PCA approximation. In Octave, the free software equivalent to MATLAB, the function "princomp" gives the principal components.

    In the open source statistical package R, the functions "princomp" and "prcomp" can be used for principal component analysis; "prcomp" uses singular value decomposition, which generally gives better numerical accuracy. "spm" is a generic package developed in R for multivariate projection methods that also allows principal components analysis.
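    The two computational routes mentioned for R, eigendecomposition of the covariance matrix and singular value decomposition of the centered data, give the same component variances. The sketch below illustrates this with NumPy on random data; it is not the implementation used by any of the packages listed here, only a demonstration of the equivalence.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical data: 100 observations (rows) of 4 correlated variables (columns).
data = rng.standard_normal((100, 4)) @ rng.standard_normal((4, 4))
centered = data - data.mean(axis=0)
n_obs = centered.shape[0]

# Route 1: eigendecomposition of the covariance matrix.
C = centered.T @ centered / (n_obs - 1)
eig_variances = np.linalg.eigvalsh(C)[::-1]          # descending order

# Route 2: SVD of the centered data, which never forms the covariance matrix.
U, singular_values, Vt = np.linalg.svd(centered, full_matrices=False)
svd_variances = singular_values**2 / (n_obs - 1)     # rows of Vt are the components

print(eig_variances)
print(svd_variances)
```

    Working on the data matrix directly avoids squaring its entries when forming the covariance matrix, which is the usual reason given for the better numerical accuracy of the SVD route.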

    In XLMiner, the Principal Components tab can be used for principal component analysis. In IDL, the principal components can be calculated using the function "pcomp". Weka also computes principal components.


    Chapter III Applications

    Principal component analysis (PCA) can be applied to frequency-domain and time-domain data, both real and complex, and has been used in spectral analysis to quantify MRS data. It is also used to find patterns in images, to extract common features from images of human faces, and for image compression. In this paper, however, we concentrate on applications in finance.

    In the article "Principal components analysis for correlated curves and seasonal commodities: The case of the petroleum market", the authors find the volatility functions by analyzing the principal components of the correlation matrix of historical returns. This methodology ultimately captures the variance of the multiple-curve market with the minimum number of factors, which leads to a less computationally intensive model.

    It is reasonable to expect that the principal components of any single market behave similarly to

    what was shown by Cortazar and Schwartz (1994). That is, one would look for a parallel shift

    first, then for changes in slope and curvature and expect these to explain a large proportion of the

    futures volatilities. This is because the futures contracts are positively correlated, and the

    correlation declines with the difference in maturity. Thus, a joint move will tend to be more

    important than a separating move of the same frequency, and a low-frequency move will tend to

    be more important than a higher frequency one. The mathematics is worked out in Forzani and

    Tolmasky (2001).
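    The level, slope and curvature pattern can be reproduced on a stylized correlation matrix. The sketch below assumes, purely for illustration, that the correlation between two futures contracts is rho raised to the difference in their maturity index, with rho = 0.9 and ten maturities; these numbers are not taken from the cited articles.

```python
import numpy as np

n_maturities = 10
rho = 0.9   # hypothetical correlation between adjacent maturities
idx = np.arange(n_maturities)
# Correlation is positive and declines with the difference in maturity.
corr = rho ** np.abs(idx[:, None] - idx[None, :])

eigenvalues, eigenvectors = np.linalg.eigh(corr)
eigenvalues = eigenvalues[::-1]          # largest eigenvalue first
eigenvectors = eigenvectors[:, ::-1]

def sign_changes(v):
    """Number of sign changes along the maturity axis."""
    return int(np.sum(np.diff(np.sign(v)) != 0))

changes = [sign_changes(eigenvectors[:, k]) for k in range(3)]
explained = eigenvalues[:3] / eigenvalues.sum()
print(changes)
print(explained)
```

    For this matrix the first eigenvector has no sign change (a parallel shift), the second has one (a change in slope) and the third has two (a change in curvature), and the share of variance explained falls in the same order.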

    The main question the authors address is how the results of PCA differ when the model is built for a commodity that exhibits seasonality. For crude oil, the explanatory power of each principal component is fairly stable across trading periods. Because of seasonality effects, one would guess that this is not the case for heating oil.

    As one would expect, the factor pattern for heating oil is remarkably similar to that of crude oil: 95.80% of the total variance is due to changes in the level, 99.02% is explained by the level and slope together, and 99.63% by the first three factors. This is to be expected given that the heating oil's correlation matrix is stereotypical of many commodity markets.

    [Table: Crude oil - relative importance of the first four factors by season]

    Are these seasonal differences statistically significant? Although some results on hypothesis testing in PCA models are available in the literature, the authors are not aware of any work on the sampling distribution of the ratio of the first eigenvalue of a correlation matrix to the sum of the n largest. Overall, the complexity of the PCA results increases tremendously in making the small step from a one-commodity to a two-commodity setup.

    Another application of these results is the pricing of correlation-dependent options on petroleum products. PCA is helpful if, first, the option's payoff depends on correlations between many different curve points and/or curves and, second, the option is to be priced by Monte Carlo simulation. Under these circumstances, PCA provides a valuable dimensionality reduction for the Monte Carlo simulation.

    PCA is also widely used to study the co-movement patterns of national equity markets. The correlation coefficient measures the extent to which two statistical series move together; PCA, a multivariate statistical technique, is a useful tool for analyzing patterns of co-movement common to several series. In the paper "Co-movements of U.S. and Latin American equity markets before and after the 1987 crash", PCA is applied separately to each of three subperiods to study the changes in the co-movement patterns of five equity markets (the U.S. and four Latin American markets) between the subperiods. Using Kaiser's significance rule, principal components with eigenvalues greater than unity are retained for analysis. Kaiser's varimax rotation is used for easier interpretation of the principal components. The highest factor loadings in each principal component are marked with an asterisk.
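    Kaiser's rule is straightforward to apply once the eigenvalues of the correlation matrix are available. The sketch below uses simulated returns for five hypothetical markets, three of which share a common factor; it does not use the data of the cited paper, and the function name `kaiser_retained` is made up for the illustration. Varimax rotation is not shown.

```python
import numpy as np

def kaiser_retained(data):
    """Eigenvalues of the correlation matrix and a mask of those above one."""
    R = np.corrcoef(data, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(R)[::-1]   # descending order
    return eigenvalues, eigenvalues > 1.0

rng = np.random.default_rng(4)
# Hypothetical monthly returns for five markets: three share a common factor,
# two move independently.
common = rng.standard_normal((200, 1))
returns = np.hstack([common + 0.5 * rng.standard_normal((200, 3)),
                     rng.standard_normal((200, 2))])

eigenvalues, keep = kaiser_retained(returns)
n_retained = int(keep.sum())
print(eigenvalues)
print(n_retained)
```

    Because the eigenvalues of a correlation matrix of five series sum to five, an eigenvalue above one marks a component that explains more variance than a single standardized series would on its own.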

    February 1984 - September 1987 (Period I)

    For Period I, three principal components with eigenvalues greater than unity are retained for

    analysis. The Mexican and Brazilian equity markets have the highest factor loadings in the first

    principal component. This principal component explains 28.6% of the total variation in the index

    returns data matrix. Since the Brazilian equity market is negatively correlated with the Mexican

    equity market in this period, it has a negative factor loading in the first principal component.

    The Chilean and U.S. equity markets have the highest factor loadings in the second principal

    component. This principal component explains 24.8% of the total variation in the index

    returns data matrix. The first two principal components together explain 52.9% of the total

    variation in the index returns data matrix.

    The Argentine equity market dominates the third principal component. This principal component

    explains 20.1% of the total variation in the index returns data matrix. The U.S. equity market

    also has a high factor loading in this principal component. However, since the U.S. equity market

    is negatively correlated with the Argentine equity market in this period, it has a negative factor


    loading in the third principal component. All three principal components together explain 73.0%

    of the total variation in the index returns data matrix.

    November 1987 - June 1991 (Period II)

    There are only two statistically significant principal components in Period II, as compared with

    three statistically significant principal components in Period I. This implies that the co-

    movements of the five equity markets were closer after the crash than before the crash. We could

    also say that the co-movements of the five equity markets were closer during the market opening

    period (Period II) than during the closed markets period (Period I).

    The Mexican, U.S., and Chilean equity markets have the highest factor loadings in the first

    principal component. This principal component explains 31.9% of the total variation in the index

    returns data matrix. The Argentine and Brazilian equity markets dominate the second principal

    component. This principal component explains 24.4% of the total variation in the index returns

    data matrix. Since the Argentine equity market is negatively correlated with the Brazilian equity

    market in this period, it has a negative factor loading in the second principal component. The two

    principal components together explain 56.3% of the total variation in the index returns data

    matrix.

    July 1991 - February 1995 (Period III)

    There is only one statistically significant principal component in Period III, as compared with

    two statistically significant principal components in Period II. This implies that the co-

    movements of the five equity markets were even closer in Period III than in Period II. In Period

    III, the opening of the markets is consolidated and large portfolio inflows into the Latin markets

    are observed. The Argentine equity market has the highest factor loading and the U.S. equity


    market has the lowest factor loading. The factor loadings of all five equity markets have positive

    signs. The principal component explains 44.4% of the total variation in the index returns data

    matrix.

    The number of statistically significant principal components is three in Period I, two in Period II,

    and only one in Period III. This implies that the co-movements of the five equity markets have

    become considerably closer over time during the February 1984 - February 1995 period.

    Principal Components Analysis (PCA) is another approach that has been applied in studying

    diversification and shares common points with both correlation and factor analysis. While

    computationally it is a special case of factor analysis, it can be applied to returns from a set or

    portfolio of financial assets as a more sophisticated way of studying their correlation matrix and

    integration. PCA is used to measure the degree of interdependence and covariability between

    several assets. Multiple- or single-equation regression analysis is inappropriate for this purpose

    because the returns on these assets may well be highly collinear. The method of principal

    components constructs from a set of variables, X, a new set of orthogonal variables, P, the

    principal components. Each one of these components absorbs and accounts for the maximum

    possible proportion of the variation in the variables X. If the variations in the returns of a set of

    financial assets or markets are explained by relatively few principal components, then one can

    conclude that they are highly integrated and that opportunities for diversification are limited.
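    One simple way to put a number on this idea is the share of total variance explained by the first principal component of the correlation matrix of returns. The sketch below compares two simulated sets of five markets, one driven by a common factor and one with independent returns; the data, the factor structure and the function name `first_pc_share` are assumptions made for illustration only.

```python
import numpy as np

def first_pc_share(returns):
    """Share of total variance explained by the first PC of the correlation matrix."""
    R = np.corrcoef(returns, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(R)      # ascending order
    return eigenvalues[-1] / eigenvalues.sum()

rng = np.random.default_rng(5)
n_obs, n_markets = 400, 5
world = rng.standard_normal((n_obs, 1))      # hypothetical common factor

# Integrated markets: every market loads on the common factor.
integrated = world + 0.5 * rng.standard_normal((n_obs, n_markets))
# Segmented markets: returns are independent of each other.
segmented = rng.standard_normal((n_obs, n_markets))

share_integrated = first_pc_share(integrated)
share_segmented = first_pc_share(segmented)
print(share_integrated, share_segmented)
```

    A share close to one suggests that a single component captures most of the co-movement, so diversification across these markets offers little; a share close to 1/5 for five markets suggests the opposite.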

    Correlation analysis, factor analysis, and PCA are concerned with the contemporaneous

    information flows across markets, i.e., the first risk premium.

    These approaches essentially measure the integration of national financial markets and are

    sufficient if market efficiency is strong. In the case when markets are weakly efficient, then these


    approaches will not be adequate if a co-integration mechanism is present. Co-integration

    quantifies market inefficiencies as short-term disequilibrium variations in prices and can be

    perceived as a sufficient, though not necessary, condition for segmentation between national

    equity markets.

    The results obtained from applying PCA to the returns on the nine markets are presented in Table

    3 of "Diversification benefits in the smaller European stock markets". The conclusions drawn

    from this approach confirm those from correlation analysis. The first principal component P1

    explains 51 to 55 percent of the stock returns covariability with factor loadings that are

    significant for most countries. The increase in the coefficient of determination and the value of

    the eigenvalue in the second period suggests that the markets under study have become more

    integrated. The component P1 can be interpreted as the true stock market return which abstracts

    from risk and uncertainty and represents a compensation for sacrificed liquidity [Nellis, 1982].

    An increase in the factor loading in P1 for some country is an indication of increased

    interdependence. It is clear that such an increase has occurred for Greece, Spain, and Ireland.

    Greece in the pre-October 1987 period cannot be explained by the first dominant component

    since it has an insignificant factor loading. In the second sample, Greece appears more integrated

    and enters the first component with a significant, although small, factor loading. In both periods,

    the returns of the Greek stock market need additional components to be explained. The U.S. and

    Dutch markets retain the highest factor loading in P1 for both periods.

    Correlation and PCA found increased integration between the European markets and the U.S.

    market. The smaller European markets were not found to be more strongly integrated with the

    Japanese market for the period after the October 1987 crash.

    In "The financial characteristics of small firms which achieve quotation on the UK unlisted

    securities market" (Hutchinson, Meric, & Meric, 1988), principal component analysis is applied to the

    financial ratios

    data of the fifty-six firms. Principal components obtained are then used as input for the

    multivariate analysis of variance (MANOVA) to compare the financial characteristics of firms

    which have achieved USM quotation with those which have not.

    The six principal components can be named in accordance with the factor loadings of the fifteen

    financial ratios in each principal component. The factor loadings show the correlation between

    the principal components and the fifteen financial ratios. Those financial ratios which are highly

    correlated with a given principal component serve as definers of that principal component. For

    example, leverage and liquidity ratios have the highest factor loadings in the first principal

    component. Therefore, the first principal component can represent the indebtedness and liquidity

    of the firms. Since profitability ratios have the highest factor loadings in the second principal

    component, this principal component can represent profitability. Since growth ratios have the

    highest factor loadings in the third principal component, this principal component can represent

    growth rate, etc.
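
    The naming procedure described above can be sketched as follows. The example computes factor loadings as correlations between each variable and each component score, then lists the variables that "define" each component. The ratio names, the 0.6 cut-off and the synthetic data are assumptions made for illustration; they are not taken from the cited study.

```python
import numpy as np

def loadings_matrix(X, n_components):
    """Correlation of every variable with each of the leading component scores."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    order = np.argsort(eigvals)[::-1][:n_components]   # largest eigenvalues first
    scores = Z @ eigvecs[:, order]
    p = X.shape[1]
    full = np.corrcoef(np.hstack([Z, scores]), rowvar=False)
    return full[:p, p:]                                # shape (variables, components)

def definers(loadings, names, threshold=0.6):
    """Names of the variables whose absolute loading exceeds the threshold."""
    return [[n for n, l in zip(names, col) if abs(l) > threshold]
            for col in loadings.T]

# Synthetic data with hypothetical ratio names: three ratios share one
# latent driver, two share another.
rng = np.random.default_rng(1)
n = 400
f1, f2 = rng.standard_normal(n), rng.standard_normal(n)
noise = rng.standard_normal((n, 5))
X = np.column_stack([f1, f1, f1, f2, f2]) + np.array([0.3, 0.3, 0.3, 0.5, 0.5]) * noise
names = ["debt_ratio", "current_ratio", "quick_ratio", "return_on_assets", "net_margin"]

for i, group in enumerate(definers(loadings_matrix(X, 2), names), start=1):
    print(f"PC{i} is defined by: {group}")
```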

    In "Macroeconomic Factors and Stock Returns in a Changing Economic Framework: The Case

    of the Athens Stock Exchange" (Diacogiannis, Tsiritakis, & Manolas, 2001), the first principal

    component consists of variables that reflect economy-wide influences, since high loadings are

    observed for 13 out of 19 variables. Among these variables, those with significant positive

    loadings are Inflation, Money Supply, Wage Cost, Cost of Construction, Exchange Rates and the

    Capital Account. Recall that

    the 1980-86 period was characterized by high inflationary pressures and the economic policy

    framework consisted of an accommodating monetary policy, a loose incomes policy and a

    continuous depreciation of the Greek currency unit, thus pushing costs of production upwards

    and introducing uncertainty with respect to companies' earnings prospects. Fiscal policy

    was loose too, and the increasing deficit of the public sector is reflected by the

    fact that the Budget Deficit variable also had a significant loading, although with a negative sign

    because it was reported as a series of negative numbers. It is worth noting that the Market Index,

    used to approximate the market portfolio, is not included in the first factor. A possible

    explanation of this result could be the relative unimportance of the Stock Market in the presence

    of serious macroeconomic instabilities. Additionally, variables like Industrial Production and

    Construction Permits represented the stagnant value added of the secondary sector and the

    negative private investment in housing. Likewise, the Lending Rate has a low correlation with

    the component since it is determined by the Central Bank, remaining unchanged for long periods

    of time. Although Exports and Imports of Goods have a significant weighting, the Current

    Account which also includes the invisible transactions, is not significant, mainly due to the

    increase in reverse immigration and the international crisis in shipping.

    Variables which were not correlated with those most heavily loaded in the first component

    constitute the second component, such as the Stock Market Index along with the Construction

    Index and Gold Reserves. As mentioned earlier in the paper, the Stock Market was rather

    impotent, the demand for construction investment was relatively low and the level of State

    reserves was not particularly volatile. Likewise, the correlation-based construction of the third

    component consisting of the Current Account, and the Unemployment Rate does not lack

    economic meaning.

    The analysis for the period 1986-92 and the determination of the orthogonal factors are presented

    in Table III (overleaf). It was found, as in the case of the first period, that the first component

    consists of variables which represent a large part of the economy. In contrast to the previous

    period however, the Stock Market Index, the Lending Rate and the Gold Reserves are

    significantly positively correlated to the first component. This phenomenon could be respectively

    explained by the growing importance of the Athens Stock Exchange, the gradual deregulation of

    interest rates and the significant increase in the level of foreign reserves. The insignificance of

    the budget deficit in the first component, during this period, may reflect the temporary reduction

    of the PSBR in 1986-87 due to the stabilization programme imposed. This variable now loads on

    the second component with the correct sign. Unemployment Rate on the other hand, after its

    significant increase during the 1980-86 period, stabilized around 7.5% for the later period.

    Consequently, this variable too, loads on the second component, which has a strong positive

    correlation with Industrial Production which accelerated after the implementation of the

    stabilization program, therefore exhibiting a negative sign.

    The second principal component in the period 1980-86 consists mainly of the Stock Market

    Index and the Construction Index. This systematic risk is not significantly priced, reflecting the

    unimportance of these variables in this period. In period 1986-92, the second component is

    highly correlated with Industrial Production and the Budget Deficit and has a positive and

    significant sign. This means that investors demand a risk premium vis-à-vis these systematic risks,

    reflecting uncertainty with respect to a further increase in industrial production and threatening

    budget deficits. The coefficient of the estimated risk price for the third principal component is

    significant in each subperiod (in both subperiods the current account is the leading variable).

    However, the signs of the coefficients are opposite. The positive sign for the first subperiod is

    explained by the seriousness of the current account deficit, which was above 5% of

    GDP for the period, peaking in 1985 at 10% of GDP. The current account was one of the basic

    instabilities that led to the implementation of the stabilization programme at the end of 1985.

    On the contrary, the current account problem was alleviated in the next period (the current

    account deficit as a percentage of GDP decreased to 3%, although growth and investment were

    accelerated significantly) contributing to the optimism about investing in the Stock Market.

    PCA is also used in "Globalization and changing patterns in the international transmission of

    shocks in financial markets" (Bordo & Murshid, 2006). The authors apply principal components analysis to their monthly data on

    yield spreads as well as their indexes of exchange market pressure. The first principal component

    vector provides a measure of the overall extent of co-movement within these data, while an

    analysis of the factor loadings associated with the second and third principal component vectors

    reveals various patterns in dependence within groups. Typically this grouping is easy to identify

    by plotting the factor loadings; however, to take some of the arbitrariness out of identifying

    groups, the authors employ a clustering algorithm to categorize countries into three distinct clusters. This

    works by minimizing the distance between members of a group, while maximizing the

    distance across separate groups.
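
    The source does not name the clustering algorithm. As one possible illustration of grouping countries by their factor loadings, the sketch below uses plain k-means (an assumption, not necessarily the method of the cited paper) on hypothetical loadings of nine countries on the second and third components.

```python
import numpy as np

def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm with a deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(k - 1):
        # distance from every point to its nearest already-chosen center
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)                  # assign to nearest center
        new = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):                 # assignments have stabilized
            break
        centers = new
    return labels, centers

# Hypothetical loadings on (PC2, PC3) for nine countries.
loadings = np.array([
    [0.62, 0.10], [0.58, 0.15], [0.65, 0.05],
    [-0.50, 0.40], [-0.55, 0.35], [-0.45, 0.45],
    [0.02, -0.60], [-0.03, -0.55], [0.05, -0.65],
])
labels, centers = kmeans(loadings, k=3)
print(labels.tolist())
```

    K-means only minimizes within-cluster distances directly; the separation between clusters follows as a by-product, which is a slightly weaker criterion than the one described in the text.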

    As a complement to the principal components analysis, the authors estimate the probability of a global

    currency crisis. They identify global currency crises as extreme values of an index which captures

    the degree of exchange market pressure that is common to all countries. Specifically, this index is

    the first principal component of the exchange market pressure data.
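
    A minimal sketch of such an index follows. It takes the first principal component scores of standardized exchange market pressure series and flags months in which the index is unusually high. The two-standard-deviation cut-off and the synthetic data are assumptions for illustration; the source only speaks of "extreme values".

```python
import numpy as np

def first_pc_scores(X):
    """Time series of first principal component scores of the standardized data."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    v = vt[0]                       # direction of maximal variance
    if v.sum() < 0:                 # orient so that higher pressure gives a higher index
        v = -v
    return Z @ v

def extreme_periods(index, n_std=2.0):
    """Positions where the index exceeds its mean by more than n_std standard deviations."""
    return np.flatnonzero(index > index.mean() + n_std * index.std(ddof=1))

# Synthetic monthly data: six countries share a common pressure factor,
# with two large common shocks inserted at months 100 and 180.
rng = np.random.default_rng(2)
T, N = 240, 6
common = rng.standard_normal(T)
common[[100, 180]] += 8.0
pressure = common[:, None] + 0.7 * rng.standard_normal((T, N))

index = first_pc_scores(pressure)
crises = extreme_periods(index)
print("Flagged months:", crises.tolist())
```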

    While principal components analysis sheds light on the patterns in cross-country

    interdependence, it does not account for all of the complex dynamics and inter-relationships that

    may exist between countries. To better understand these relationships, the authors estimate vector

    autoregressions (VARs) using data on short-term interest rates. By estimating impulse response functions

    from these VARs, they are able to trace the impact of a shock in one country on another, and

    thus shed light on the direction of shocks and the degree to which they affected other

    countries.
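
    To make the idea of tracing a shock concrete, the sketch below fits a first-order VAR by least squares and computes a simple, non-orthogonalized impulse response. The lag order, the two-country setup and the simulated coefficients are assumptions for illustration and do not reproduce the specification of the cited paper.

```python
import numpy as np

def fit_var1(Y):
    """OLS estimate of A in y_t = c + A y_{t-1} + e_t, for Y of shape (T, N)."""
    X = np.hstack([np.ones((len(Y) - 1, 1)), Y[:-1]])   # intercept and lagged values
    coef, *_ = np.linalg.lstsq(X, Y[1:], rcond=None)    # coef has shape (1 + N, N)
    return coef[1:].T                                   # A[i, j]: effect of series j on series i

def impulse_response(A, shock, horizons):
    """Responses A^h @ shock for h = 0, ..., horizons."""
    out = [np.asarray(shock, dtype=float)]
    for _ in range(horizons):
        out.append(A @ out[-1])
    return np.array(out)

# Simulated interest rates for two hypothetical countries: shocks in
# country 0 spill over to country 1, but not the other way round.
rng = np.random.default_rng(3)
A_true = np.array([[0.5, 0.0],
                   [0.4, 0.3]])
Y = np.zeros((2000, 2))
for t in range(1, len(Y)):
    Y[t] = A_true @ Y[t - 1] + rng.standard_normal(2)

A_hat = fit_var1(Y)
irf = impulse_response(A_hat, shock=[1.0, 0.0], horizons=5)
print(np.round(irf, 2))   # column 1 shows how country 1 reacts to a shock in country 0
```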

    As mentioned before, PCA has often been applied in finance to study movements in stock

    markets. Papers by Leger and Leone (2008) and Meric, Ratner and Meric (2008) do just that, but

    from slightly different perspectives. Leger and Leone look at changes in the UK stock

    market and the macroeconomic factors (news) that could cause them, and find that Market Capital

    Gain, Dividend Yield and Consumer Confidence were the only ones with significant influence.

    Meric, Ratner and Meric focus on a more conventional topic in finance papers using PCA: they

    look at the possibilities of diversification among major stock markets, but they also include

    market sectors in their analysis. They find that, in a bull market, investors can obtain more

    benefit with global diversification than with domestic diversification even if they invest in the

    same sector in different countries as opposed to investing in different sectors within the same

    country. In a bear market, the sectors of different countries tend to be more closely correlated

    and country diversification opportunities are limited.

    An interesting paper by Shih et al. (2007) compares the performance of China's state-owned banks,

    joint-stock banks and city commercial banks with performance measures developed using PCA.

    Using PCA to develop performance measures is an interesting approach, though in this case it is

    partly due to Chinese government regulations under which direct bank performance data are not published.

    They find that mid-sized joint-stock banks have the best performance in China. They suggest

    this may be due to greater public pressure and less political importance. Also, local banks in coastal

    areas generally perform better than those inland, and the worst-performing banks are in the north-east of China.

    PCA has also been applied to the measurement of convergence. Becker and Hall (2009) define

    convergence as taking place between a vector of two or more series over any

    given period 1 to T if the %R² of the first principal component calculated over the period 1 to

    T − t is less than the %R² of the first principal component calculated over the period T − t to T,

    where 0 < t < T. Using this definition they find that there is little convergence between the inflation rates of

    European Monetary Union member countries and the New Member Countries of the EU, except for

    the three countries that have been accepted to join the euro area.
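
    The convergence criterion can be written down in a few lines. In the sketch below, the %R² of the first principal component is taken to be the share of total variance it explains, computed from the correlation matrix; this reading, the disjoint split of the sample and the synthetic series are assumptions, not details confirmed by the source.

```python
import numpy as np

def pc1_share(X):
    """Share of total variance explained by the first principal component."""
    eigvals = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))   # ascending order
    return eigvals[-1] / eigvals.sum()

def is_converging(X, t):
    """True if PC1 explains more in the last t observations than in the first T - t."""
    T = len(X)
    if not 0 < t < T:
        raise ValueError("t must satisfy 0 < t < T")
    return pc1_share(X[:T - t]) < pc1_share(X[T - t:])

# Synthetic example: four series that co-move weakly in the first 300
# observations and strongly in the last 100.
rng = np.random.default_rng(4)
T, N, t = 400, 4, 100
weight = np.concatenate([np.full(T - t, 0.3), np.full(t, 2.0)])
common = rng.standard_normal(T)
X = (weight * common)[:, None] + rng.standard_normal((T, N))

print(pc1_share(X[:T - t]), pc1_share(X[T - t:]), is_converging(X, t))
```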

    As we have seen, PCA has many diverse and interesting applications in finance. PCA is an

    analytical tool that can be applied alone or in conjunction with other measures to make sense of

    complicated data and get interesting research findings.

    Chapter IV Conclusions

    Based on the articles in this paper we can see that Principal Component Analysis (PCA) is a

    mathematical algorithm that reduces the dimensionality of the data while retaining most of the

    variation in the data set. It accomplishes this reduction by identifying directions, called principal

    components, along which the variation in the data is maximal. By using a few components, each

    sample can be represented by relatively few numbers instead of by values for thousands of

    variables. Samples can then be plotted, making it possible to visually assess similarities and

    differences between samples and determine whether samples can be grouped.
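
    As a small illustration of this reduction, the sketch below projects synthetic samples with fifty variables onto their first two principal components and reports how much of the total variation is retained. The data-generating process and the function name are assumptions chosen only to make the example self-contained.

```python
import numpy as np

def reduce_dimension(X, k):
    """Project centered data onto its first k principal components.

    Returns the (samples, k) score matrix and the fraction of total
    variance retained by those k components.
    """
    Xc = X - X.mean(axis=0)
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ vt[:k].T                        # coordinates along the first k components
    retained = (s[:k] ** 2).sum() / (s ** 2).sum()
    return scores, retained

# Synthetic samples: 50 observed variables driven by 2 latent factors plus noise.
rng = np.random.default_rng(5)
latent = rng.standard_normal((100, 2))
mixing = rng.standard_normal((2, 50))
X = latent @ mixing + 0.1 * rng.standard_normal((100, 50))

scores, retained = reduce_dimension(X, k=2)
print(scores.shape, f"{retained:.3f}")
```

    Each sample is now described by two numbers instead of fifty, so the samples can be drawn in a scatter plot and inspected for groups.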

    PCA is applied not only in finance but also in many other fields, such as computer science,

    image pattern recognition, finding common features in images of human faces, image compression

    and computational biology, to name a few. Many applications beyond dimensionality reduction,

    classification and clustering have taken advantage of global representations of expression

    profiles generated by this decomposition. Applications include identifying patterns that correlate

    with experimental artifacts and filtering them out, estimating missing data, associating genes and

    expression patterns with activities of regulators and helping to uncover the dynamic architecture

    of cellular phenotypes. The rapid growth in technologies that generate high-dimensional

    molecular biology data will likely provide many new applications for PCA in the years to come.

    There are also many possible new applications in finance and economics, for example the

    new framework for measuring convergence. With the development of computing power

    (software and hardware), even more complicated analyses utilizing principal component analysis

    are possible, in finance and in all other fields of science.

    List of Articles

    1- Detection of financial distress via multivariate statistical analysis (Ganesalingam, 2001).

    2- Co-movements of the U.S., U.K., and Middle East Stock markets (Meric, Ratner, & Meric, 2007).

    3- Principal components analysis for correlated curves and seasonal commodities: The case of the

    Petroleum markets (Tolmasky & Hindanov, 2002).

    4- Globalization and changing patterns in the international transmission of shocks in financial

    markets (Bordo & Murshid, 2006).

    5- Macroeconomic Factors and Stock Returns in a Changing Economic Framework: The Case of

    the Athens Stock Exchange (Diacogiannis, Tsiritakis, & Manolas, 2001).

    6- The financial characteristics of small firms which achieve quotation on the UK unlisted securities

    market (Hutchinson, Meric, & Meric, 1988).

    7- Diversification benefits in the smaller European stock markets (Markellos & Siriopoulos, 1997).

    8- Co-movements of U.S and Latin American equity markets before and after the 1987 crash

    (Meric, Leal, Ratner, & Meric, 2001).

    9- Changes in the risk structure of stock returns: Consumer Confidence and the dotcom bubble (Leger & Leone, 2008).

    10- Co-movements of sector index returns in the world's major stock markets in bull and bear markets: Portfolio diversification implications (Meric, Ratner, & Meric, 2008).

    11- How far from the Euro Area? Measuring convergence of inflation rates in Eastern Europe (Becker & Hall, 2009).

    12- Comparing the performance of Chinese banks: A principal component approach (Shih, Zhang, & Liu, 2007).

    13- International portfolio diversification: A multivariate analysis for a group of Latin American countries (Lessard, 1973).

    14- The inter-temporal stability of international stock market relationships: Another view (Philippatos, Christofi, & Christofi, 1983).

    15- An analysis of the interrelationships among the major world stock exchanges (Makridakis & Wheelwright, 1974).

    16- A new approach to modeling the dynamics of implied distributions: Theory and evidence from the S&P 500 options (Panigirtzoglou & Skiadopoulos, 2004).

    References

    Becker, B., & Hall, S. G. (2009). How far from the Euro Area? Measuring convergence of inflation rates in Eastern Europe. Economic Modelling, In Press, Corrected Proof.

    Bordo, M. D., & Murshid, A. P. (2006). Globalization and changing patterns in the international transmission of shocks in financial markets. Journal of International Money and Finance, 25, 655-674.

    Cortazar, G., & Schwartz, E. (1994). The valuation of commodity-contingent claims. Journal of Derivatives, 1(4), 27-39.

    Diacogiannis, G. P., Tsiritakis, E. D., & Manolas, G. A. (2001). Macroeconomic Factors and Stock Returns in a Changing Economic Framework: The Case of the Athens Stock Exchange. Managerial Finance, 27(6), 23-41.

    Forzani, L., & Tolmasky, C. F. (2001). On the spectral decomposition of empirical correlation matrices. Journal of Knot theory and its ramifications, 10(8), 1201-1214.

    Ganesalingam, S. (2001). Detection of financial distress via multivariate statistical analysis. Managerial Finance, 27(4), 45-55.

    Jolliffe, I. T. (2002). Principal component analysis (Second ed.): Springer.

    Leger, L., & Leone, V. (2008). Changes in the risk structure of stock returns: Consumer confidence and the dotcom bubble. Review of Financial Economics, 17(3), 228-244.

    Lessard, D. R. (1973). International portfolio diversification: A multivariate analysis for a group of Latin American countries. The Journal of Finance, 28(3), 619-633.

    Makridakis, S. G., & Wheelwright, S. C. (1974). An analysis of the interrelationships among the major world stock exchanges. Journal of Business Finance and Accounting, 1(2), 195.

    Malava, A. (2006). Principal component analysis on term structure of interest rates. Unpublished independent research project in applied mathematics, Helsinki University of Technology, Department of Engineering Physics and Mathematics.

    Markellos, R. N., & Siriopoulos, C. (1997). Diversification benefits in the smaller European stock markets. International Advances in Economic Research, 3(2), 142-153.

    Meric, G., Leal, R. P. C., Ratner, M., & Meric, I. (2001). Co-movements of U.S. and Latin American equity markets before and after the 1987 crash. International Review of Financial Analysis, 10, 219-235.

    Meric, G., Ratner, M., & Meric, I. (2007). Co-movements of the U.S., U.K., and Middle East Stock markets. Middle Eastern Finance and Economics, (1), 60-73.

    Meric, I., Ratner, M., & Meric, G. (2008). Co-movements of sector index returns in the world's major stock markets in bull and bear markets: Portfolio diversification implications. International Review of Financial Analysis, 17(1), 156-177.

    Hutchinson, P., Meric, G., & Meric, I. (1988). The financial characteristics of small firms which achieve quotation on the UK unlisted securities market. Journal of Business Finance and Accounting, 15(1), 9-19.

    Panigirtzoglou, N., & Skiadopoulos, G. (2004). A new approach to modeling the dynamics of implied distributions: Theory and evidence from the S&P 500 options. Journal of Banking & Finance, 28(7), 1499-1520.

    Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. Philosophical Magazine, 2(6), 559-572.

    Philippatos, G. C., Christofi, A., & Christofi, P. (1983). The inter-temporal stability of international stock market relationships: Another view. Financial Management, 12(4), 63-69.

    Shih, V., Zhang, Q., & Liu, M. (2007). Comparing the performance of Chinese banks: A principal component approach. China Economic Review, 18(1), 15-34.

    Shlens, J. (2009). A Tutorial on Principal Component Analysis. Unpublished manuscript.

    Smith, L. (2002). A tutorial on Principal Components Analysis. Unpublished manuscript.

    Tolmasky, C., & Hindanov, D. (2002). Principal components analysis for correlated curves and seasonal commodities: The case of the Petroleum markets. The Journal of Futures Markets, 22(11), 1019-1035.