pca

NATIONAL CHENG KUNG UNIVERSITY

Principal Component Analysis

Final Paper in Financial Pricing

Tuan Anh

Sander Mgi

6/17/2009

Table of Contents

Chapter I Introduction
Chapter II Literature review
2.1 What is PCA
2.1.1 Definition of PCA
2.1.2 History of PCA
2.1.3 Basic assumptions
2.1.4 Important concepts
2.1.5 Calculating principal components
2.1.6 Deriving principal components
2.2 Advantages and disadvantages of PCA
2.2.1 Importance of PCA
2.2.2 Benefits of PCA
2.2.3 Limitations of PCA
2.3 Practical implications - Software
Chapter III Applications
Chapter IV Conclusions
List of Articles
References

Chapter I Introduction

When starting a research project, students and researchers often collect a lot of data or sometimes

come across large datasets that are available. But when having lots of data, especially when it is

secondary data, it is often very easy to get confused. It is hard to find the variables that are really

important for the research when there are so many variables to consider. This is where principal

components analysis (PCA) can help.

Principal Components Analysis (PCA) was invented by Karl Pearson in 1901 and is now used in

many fields of science. PCA is mostly used as a tool in exploratory data analysis because what it

essentially does is find the most important variables (a combination of them) that explain most

of the variance in the data. So, when there is lots of data to be analyzed, PCA can make the task a

lot easier. PCA also helps to construct predictive models.

In this paper we are going to focus on applications of PCA in finance research. Earlier

applications of PCA in finance date back to the early 1970s, while there are many articles from 2009

that used PCA. PCA is also often used in combination with other methods.

In chapter II we are going to first explain what PCA is and how it works. We are also going to

discuss the advantages and limitations as well as the importance of PCA. We are doing this by

reviewing some relevant literature. In chapter III we are continuing our literature review and

focus on the applications of PCA. Chapter IV concludes our overview of PCA.

So, faced with this big pile of data and having decided to use PCA to find the most important

variables, what do we need to do now? We need to understand PCA and learn how to apply it.

This is what the next section of this paper focuses on.

Chapter II Literature review

2.1 What is PCA

2.1.1 Definition of PCA

PCA, short for Principal Component Analysis, is a statistical tool that is used to

explore, sort and group data. What PCA does is take a large number of correlated (interrelated)

variables and transform this data into a smaller number of uncorrelated variables (principal

components) while retaining the maximal amount of variation, thus making it easier to work with the

data and make predictions. Or, as Smith (2002) puts it, PCA "is a way of identifying patterns in

data, and expressing the data in such a way as to highlight their similarities and differences.

Since patterns in data can be hard to find in data of high dimension, where the luxury of

graphical representation is not available, PCA is a powerful tool for analyzing data."

2.1.2 History of PCA

According to Jolliffe (2002) it is generally accepted that PCA was first described by Karl

Pearson in 1901. In his article "On lines and planes of closest fit to systems of points in space",

Pearson (1901) discusses the graphical representation of data and lines that best represent the

data. He concludes that "the best-fitting straight line to a system of points coincides in direction

with the maximum axis of the correlation ellipsoid". He also states that the analysis used in his

paper can be applied to multiple variables.

However, PCA was not widely used until the development of computers. It is not really feasible

to do PCA by hand when the number of variables is greater than four, but it is exactly for larger

numbers of variables that PCA is really useful, so the full potential of PCA could not be used until

after the spread of computers (Jolliffe, 2002).

According to Jolliffe (2002) significant contributions to the development of PCA were made by

Hotelling (1933) and Girshick (1936; 1939) before interest in PCA expanded. In the

1960s, as interest in PCA rose, important contributors were Anderson (1963) with a

theoretical discussion, Rao (1964) with numerous new ideas concerning uses, interpretations and

extensions of PCA, Gower (1966) with discussion about links between PCA and other statistical

techniques and Jeffers (1967) with a practical application in two case studies.

2.1.3 Basic assumptions

According to Shlens (2009) there are three basic assumptions behind PCA that need to be

considered when calculating and interpreting principal components:

1) Linearity - Linearity frames the problem as a change of basis. Several areas of research

have explored how to extend these notions to nonlinear regimes.

2) Large variances have important structure - This assumption also encompasses the belief

that the data has a high signal-to-noise ratio (SNR). Hence, principal components with larger associated

variances represent interesting structure, while those with lower variances represent noise.

Note that this is a strong, and sometimes incorrect, assumption.

3) The principal components are orthogonal - This assumption provides an intuitive

simplification that makes PCA soluble with linear algebra decomposition techniques.

2.1.4 Important concepts

• Principal component - a linear combination of the original variables (the 1st principal

component explains most of the variation in the data, the 2nd PC explains most of the rest of

the variance and so on)

• Eigenvectors - the coefficients of the original variables used to construct factors

• Eigenvalue - a corresponding scalar value for each eigenvector of a linear transformation

2.1.5 Calculating principal components

Jolliffe (2002) states that principal components (PCs) can be found using purely mathematical

arguments: they are given by an orthogonal linear transformation of a set of variables

optimizing a certain algebraic criterion.

Shlens (2009) gives an overview of how to perform principal components analysis:

1. Organize data as an m × n matrix, where m is the number of measurement types and n is the

number of samples

2. Subtract off the mean for each measurement type

3. Calculate covariance matrix

4. Calculate the eigenvectors and eigenvalues of the covariance matrix
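
The four steps above can be sketched in Python with NumPy. This is a minimal illustration, not code from Shlens (2009): the function name `pca_eig`, the toy data and the 1/n normalization of the covariance matrix are assumptions (using 1/(n - 1) instead would rescale the eigenvalues but leave the eigenvectors unchanged).

```python
import numpy as np

def pca_eig(X):
    """PCA of an m x n matrix X (rows = measurement types, columns = samples)."""
    X = np.asarray(X, dtype=float)          # step 1: data organized as m x n
    n = X.shape[1]
    # Step 2: subtract the mean of each measurement type (each row).
    Xc = X - X.mean(axis=1, keepdims=True)
    # Step 3: covariance matrix (m x m); the 1/n normalization is an assumption.
    C = Xc @ Xc.T / n
    # Step 4: eigenvalues and eigenvectors; eigh is intended for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]       # sort by decreasing variance
    return eigvals[order], eigvecs[:, order], Xc

# Hypothetical toy data: 2 measurement types, 5 samples.
X = np.array([[2.5, 0.5, 2.2, 1.9, 3.1],
              [2.4, 0.7, 2.9, 2.2, 3.0]])
eigvals, eigvecs, Xc = pca_eig(X)
Y = eigvecs.T @ Xc   # the data expressed in the principal-component basis
```

The columns of `eigvecs` are the principal directions, and the rows of `Y` are uncorrelated, with variances equal to the eigenvalues.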

2.1.6 Deriving principal components

The following is a standard derivation of principal components presented by Jolliffe (2002).

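To derive the form of the PCs, consider first $\alpha_1' x$, where $x$ is a vector of $p$ random variables with covariance matrix $\Sigma$; the vector $\alpha_1$ maximizes $\operatorname{var}[\alpha_1' x] = \alpha_1' \Sigma \alpha_1$. It is clear that, as it stands, the maximum will not be achieved for finite $\alpha_1$, so a normalization constraint must be imposed. The constraint used in the derivation is $\alpha_1' \alpha_1 = 1$, that is, the sum of squares of elements of $\alpha_1$ equals 1. Other constraints may be more useful in other circumstances, and can easily be substituted later on. However, the use of constraints other than $\alpha_1' \alpha_1 = \text{constant}$ in the derivation leads to a more difficult optimization problem, and it will produce a set of derived variables different from the principal components.

To maximize $\alpha_1' \Sigma \alpha_1$ subject to $\alpha_1' \alpha_1 = 1$, the standard approach is to use the technique of Lagrange multipliers. Maximize

$\alpha_1' \Sigma \alpha_1 - \lambda (\alpha_1' \alpha_1 - 1)$,

where $\lambda$ is a Lagrange multiplier. Differentiation with respect to $\alpha_1$ gives

$\Sigma \alpha_1 - \lambda \alpha_1 = 0$, or $(\Sigma - \lambda I_p) \alpha_1 = 0$,

where $I_p$ is the $(p \times p)$ identity matrix. Thus, $\lambda$ is an eigenvalue of $\Sigma$ and $\alpha_1$ is the corresponding eigenvector. To decide which of the $p$ eigenvectors gives $\alpha_1' x$ with maximum variance, note that the quantity to be maximized is

$\alpha_1' \Sigma \alpha_1 = \alpha_1' \lambda \alpha_1 = \lambda \alpha_1' \alpha_1 = \lambda$,

so $\lambda$ must be as large as possible. Thus, $\alpha_1$ is the eigenvector corresponding to the largest eigenvalue of $\Sigma$, and $\operatorname{var}(\alpha_1' x) = \alpha_1' \Sigma \alpha_1 = \lambda_1$, the largest eigenvalue.

In general, the kth PC of $x$ is $\alpha_k' x$ and $\operatorname{var}(\alpha_k' x) = \lambda_k$, where $\lambda_k$ is the kth largest eigenvalue of $\Sigma$, and $\alpha_k$ is the corresponding eigenvector.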
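
As a small worked illustration of this result, consider a 2 x 2 covariance matrix. The matrix is a made-up example chosen for easy arithmetic, not one taken from Jolliffe (2002).

```latex
\Sigma = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad
\det(\Sigma - \lambda I_2) = (2-\lambda)^2 - 1 = 0
\;\Rightarrow\; \lambda_1 = 3,\ \lambda_2 = 1 .

(\Sigma - 3 I_2)\,\alpha_1 = 0 \;\Rightarrow\;
\alpha_1 = \tfrac{1}{\sqrt{2}}\,(1,\ 1)', \qquad \alpha_1'\alpha_1 = 1 ,

\operatorname{var}(\alpha_1' x) = \alpha_1' \Sigma \alpha_1
  = \tfrac{1}{2}\,(2 + 1 + 1 + 2) = 3 = \lambda_1 .
```

The first PC therefore accounts for 3 / (3 + 1) = 75% of the total variance, and the second PC, with $\alpha_2 = \tfrac{1}{\sqrt{2}}(1, -1)'$, accounts for the remaining 25%.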

Shlens (2009) derives an algebraic solution to PCA based on an important property of eigenvector decomposition. Once again, the data set is $X$, an $m \times n$ matrix, where $m$ is the number of measurement types and $n$ is the number of samples. The goal is summarized as follows:

Find some orthonormal matrix $P$ in $Y = PX$ such that $C_Y \equiv \frac{1}{n} Y Y^T$ is a diagonal matrix. The rows of $P$ are the principal components of $X$.

He begins by rewriting $C_Y$ in terms of the unknown variable $P$:

$C_Y = \frac{1}{n} Y Y^T = \frac{1}{n} (PX)(PX)^T = \frac{1}{n} P X X^T P^T = P \left( \frac{1}{n} X X^T \right) P^T = P C_X P^T$

Note that the covariance matrix of $X$, $C_X = \frac{1}{n} X X^T$, has been identified in the last step.

The plan is to recognize that any symmetric matrix $A$ is diagonalized by an orthogonal matrix of its eigenvectors. For a symmetric matrix $A$, $A = E D E^T$, where $D$ is a diagonal matrix and $E$ is a matrix of eigenvectors of $A$ arranged as columns.

Now comes the trick. He selects the matrix $P$ to be a matrix where each row $p_i$ is an eigenvector of $C_X$. By this selection, $P \equiv E^T$. With this relation and the fact that the inverse of an orthogonal matrix is its transpose ($P^{-1} = P^T$), we can finish evaluating $C_Y$:

$C_Y = P C_X P^T = P (E D E^T) P^T = P (P^T D P) P^T = (P P^T) D (P P^T) = (P P^{-1}) D (P P^{-1}) = D$

It is evident that the choice of $P$ diagonalizes $C_Y$. This was the goal for PCA. We can summarize the results of PCA in the matrices $P$ and $C_Y$.

The principal components of $X$ are the eigenvectors of $C_X = \frac{1}{n} X X^T$.

The ith diagonal value of $C_Y$ is the variance of $X$ along $p_i$.

In practice computing PCA of a data set $X$ entails (1) subtracting off the mean of each measurement type and (2) computing the eigenvectors of $C_X$.
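
The diagonalization argument can be checked numerically. The sketch below is only an illustration under stated assumptions: the data are random numbers, the variable names are hypothetical, and the covariance matrix is normalized by 1/n.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 500                          # 3 measurement types, 500 samples
A = rng.normal(size=(m, m))            # arbitrary mixing matrix to correlate the rows
X = A @ rng.normal(size=(m, n))
X = X - X.mean(axis=1, keepdims=True)  # subtract the mean of each measurement type

C_X = X @ X.T / n                      # covariance matrix of X
eigvals, E = np.linalg.eigh(C_X)       # columns of E are eigenvectors of C_X
P = E.T                                # rows of P are the principal components

Y = P @ X                              # change of basis
C_Y = Y @ Y.T / n                      # covariance matrix of Y
off_diagonal = C_Y - np.diag(np.diag(C_Y))
```

Up to floating-point error, `off_diagonal` is zero and the diagonal of `C_Y` equals the eigenvalues of `C_X`, which is exactly what the choice of P is meant to achieve.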

2.2 Advantages and disadvantages of PCA

2.2.1 Importance of PCA

Principal component analysis (PCA) is a standard tool in modern data analysis - in diverse fields

from neuroscience to computer graphics - because it is a simple, non-parametric method for

extracting relevant information from confusing data sets. With minimal effort PCA provides a

roadmap for how to reduce a complex data set to a lower dimension to reveal the sometimes

hidden, simplified structures that often underlie it. (Shlens, 2009)

The importance of PCA is shown by its use in many different fields of science and everyday life. PCA is widely used in neuroscience, for example. Other fields of use are pattern recognition and image compression; PCA is therefore suited for use in facial recognition software, for example, as well as for recognizing and storing other biometric data. Many IT-related fields also use PCA, including artificial intelligence research. According to Jolliffe (2002), PCA is also used in research in agriculture, biology, chemistry, climatology, demography, ecology, food research, genetics, geology, meteorology, oceanography, psychology, quality control, etc. In this paper, however, we focus on uses in finance and economics.

PCA has been used in economics and finance to study changes in stock markets, commodity

markets, economic growth, exchange rates, etc. Earlier studies were done in economics, but

stock markets were also being studied as early as the 1960s. Lessard (1973) notes that principal

component or factor analysis have been used in several recent empirical studies (Farrar [1962],

King [1967], and Feeney and Hester [1967]) concerned with the existence of general movements

in the returns from common stocks. PCA has mostly been used to compare different stock

markets in search for diversification opportunities, especially in earlier studies like the ones by

Makridakis (1974) and by Philippatos et al. (1983).

2.2.2 Benefits of PCA

PCA is a special case of Factor Analysis that is highly useful in the analysis of many time series

and the search for patterns of movement common to several series (true factor analysis makes

different assumptions about the underlying structure and solves eigenvectors of a slightly

different matrix). This approach is superior to many of the bivariate statistical techniques used

earlier, in that it explores the interrelationships among a set of variables caused by common

"factors," mostly economic in nature. (Philippatos, Christofi, & Christofi, 1983)

PCA is a way of identifying patterns in data, and expressing the data in such a way as to

highlight their similarities and differences. A primary benefit of PCA arises from quantifying the

importance of each dimension for describing the variability of a data set (Shlens, 2009). PCA can

also be used to compress the data, by reducing the number of dimensions, without much loss of

information.

When using principal component analysis to analyze a data set, it is usually possible to explain a

large percentage of the total variance with only a few components. Principal components are

selected so that each successive one explains a maximum of the remaining variance, the first

component is selected to explain the maximum proportion of the total variance, the second to

explain the maximum of the remaining variance, etc. Therefore, the principal component

solution is a particularly appropriate test for the existence of a strong market factor. (Lessard,

1973).
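
The share of variance explained by each component follows directly from the eigenvalues. The sketch below uses hypothetical eigenvalues (not taken from Lessard or any other cited study) and an assumed 80% threshold purely for illustration.

```python
import numpy as np

def explained_variance_ratio(eigvals):
    """Share of total variance carried by each component, largest first."""
    eigvals = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    return eigvals / eigvals.sum()

# Hypothetical eigenvalues of a 5 x 5 correlation matrix (they sum to 5).
eigvals = [2.9, 1.2, 0.5, 0.3, 0.1]
ratio = explained_variance_ratio(eigvals)
cumulative = np.cumsum(ratio)

# Smallest number of components whose cumulative share reaches 80%.
k = int(np.searchsorted(cumulative, 0.80) + 1)
```

With these numbers the first component alone explains 58% of the variance and the first two explain 82%, so `k` is 2. A dominant first component of this kind is what a strong market factor would look like.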

PCA is completely nonparametric: any data set can be plugged in and an answer comes out,

requiring no parameters to tweak and no regard for how the data was recorded. From one

perspective, the fact that PCA is non-parametric (or plug-and-play) can be considered a positive

feature because the answer is unique and independent of the user.

2.2.3 Limitations of PCA

Limitations in PCA occur mainly due to the previously mentioned main assumptions and the data

at hand. PCA is not a statistical method from the viewpoint that there is no probability

distribution specified for the observations. Therefore it is important to keep in mind that PCA

best serves to represent data in simpler, reduced form.

It is often difficult, if not impossible, to discover the true economic interpretation of PCs since the new variables are linear combinations of the original variables. In addition, for PCA to work exactly, one should use standardized data so that the mean is zero and the unbiased estimate of variance is unity:

$z_i = (x_i - \bar{x}_i) / s_i$

where $z_i$ is the ith standardized variable, $\bar{x}_i$ is the sample mean of the ith original variable $x_i$, and $s_i$ is its sample standard deviation (based on the unbiased variance estimate).

This is because the scales of the original variables are often not comparable, and the variable or variables with high absolute variance will dominate the first principal component.

There is one major drawback to standardization, however. Standardizing means that PCA results

will come out with respect to standardized variables. This makes the interpretation and further

applications of PCA results even more difficult. (Malava, 2006)
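
The effect of standardization can be sketched as follows. The two variables are simulated and deliberately placed on very different scales; the data, the function name `standardize` and the variable names are assumptions made for illustration only.

```python
import numpy as np

def standardize(X):
    """Standardize each row (variable) to mean 0 and unbiased variance 1."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, ddof=1, keepdims=True)   # ddof=1: unbiased variance estimate
    return (X - mean) / std

# Hypothetical data: two correlated variables on very different scales.
rng = np.random.default_rng(1)
x1 = rng.normal(0.0, 1.0, 200)
x2 = 1000.0 * (0.5 * x1 + rng.normal(0.0, 1.0, 200))
X = np.vstack([x1, x2])
Z = standardize(X)

# eigh returns eigenvalues in ascending order, so the last column is the first PC.
first_pc_cov = np.linalg.eigh(np.cov(X))[1][:, -1]    # PCA on raw data
first_pc_corr = np.linalg.eigh(np.cov(Z))[1][:, -1]   # PCA on standardized data
```

On the raw data the first component points almost entirely along the large-scale variable, while on the standardized data both variables receive equal weight in absolute value. The covariance matrix of `Z` is the correlation matrix of `X`, so PCA on standardized data is PCA on the correlation matrix.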

The mission when using PCA is often to get rid of correlation and interdependence of variables.

PCA succeeds in removing second-order dependencies, but it has trouble with higher-order

dependencies. This problem might be solved by using kernel PCA or independent component

analysis. The fact that PCA is agnostic to the source of the data is also a weakness (Shlens, 2009).

2.3 Practical implications - Software

When searching for principal components analysis software on the internet, one finds numerous

vendors offering their services as well as freeware packages available for users who prefer not

to pay. With the help of Wikipedia and Google searches we came up with the following list of software

for PCA.

"ViSta: The Visual Statistics System" is free software that provides principal components

analysis, simple and multiple correspondence analysis. "Spectramap" is software to create a

biplot using principal components analysis, correspondence analysis or spectral map analysis.

Other software packages with PCA include Computer Vision Library, Multivariate Data

Analysis Software, MVSP, The Unscrambler, PCA/X and many others.

It is also possible to run PCA in MS Excel, but this requires purchasing add-in software

called XLSTAT.

In MATLAB, the functions "princomp" and "wmspca" give the principal components, while the

function "pcares" gives the residuals and reconstructed matrix for a low-rank PCA

approximation. In Octave, the free software equivalent to MATLAB, the function

princomp gives the principal components.

In the open source statistical package R, the functions "princomp" and "prcomp" can be used for

principal component analysis; prcomp uses singular value decomposition which generally gives

better numerical accuracy, while "spm" is a generic package developed in R for multivariate

projection methods that allows principal components analysis.
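
The two computational routes mentioned for R (eigen-decomposition of the covariance matrix versus singular value decomposition of the centered data) can be compared in a few lines. The sketch below uses Python with NumPy rather than R, random data, and rows as observations; it illustrates the equivalence and is not a reproduction of what princomp or prcomp do internally.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical data: 100 observations (rows) of 4 correlated variables (columns).
X = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))
Xc = X - X.mean(axis=0)
n = Xc.shape[0]

# Route 1: eigen-decomposition of the covariance matrix.
eigvals, eigvecs = np.linalg.eigh(Xc.T @ Xc / (n - 1))
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # largest variance first

# Route 2: singular value decomposition of the centered data.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_vals = s**2 / (n - 1)        # squared singular values give the variances
svd_vecs = Vt.T                  # right singular vectors give the directions
```

Both routes give the same variances, and the same directions up to sign. The SVD route never forms the product `Xc.T @ Xc`, which is one reason it tends to be numerically better behaved.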

In XLMiner, the Principal Components tab can be used for principal component analysis. In IDL,

the principal components can be calculated using the function pcomp. Weka also computes principal

components.

Chapter III Applications

Principal Components Analysis (PCA) can be applied to both frequency-domain and time-domain data, real or complex, for example to quantify MRS data in spectral analysis. It is also used to find patterns in images, to find common features of human facial images, and for image compression. In this final report, however, we concentrate on applications in finance.

In the article "Principal components analysis for correlated curves and seasonal commodities: The case of the petroleum market", the authors find the volatility functions by analyzing the principal components of the correlation matrix of the historical returns. This methodology ultimately allows them to capture the variance of the multiple-curve market with the minimum number of factors (which leads to a less computationally intensive model).

It is reasonable to expect that the principal components of any single market behave similarly to

what was shown by Cortazar and Schwartz (1994). That is, one would look for a parallel shift

first, then for changes in slope and curvature and expect these to explain a large proportion of the

futures volatilities. This is because the futures contracts are positively correlated, and the

correlation declines with the difference in maturity. Thus, a joint move will tend to be more

important than a separating move of the same frequency, and a low-frequency move will tend to

be more important than a higher frequency one. The mathematics is worked out in Forzani and

Tolmasky (2001).

The main question the authors try to answer is how the results of PCA differ for a commodity that experiences seasonality. The explanatory power of each of the principal components in the case of crude oil is fairly stable across trading periods. Because of seasonality effects, one would expect this not to hold for heating oil.

First, note that, as one would expect, the factor pattern for heating oil is remarkably similar to that of crude oil. Likewise, 95.80% of the total variance is due to changes in the level, 99.02% is explained by the level and slope, and 99.63% by the first three factors. This is to be expected given that the heating oil's correlation matrix is stereotypical of many commodity markets.

[Table in the cited article: Crude oil, relative importance of the first four factors by season]

Are these seasonal differences statistically significant? Although some results on hypothesis testing in PCA models are available in the literature, the authors are not aware of any work on the sampling distribution of the ratio of the first eigenvalue of a correlation matrix to the sum of the n largest. Overall, the complexity of the PCA results increases tremendously in making a small step from a one-commodity to a two-commodity setup.

Another application of these results is pricing correlation-dependent options on petroleum products.

PCA is helpful if, first, the option's payoff depends on correlations between many different

curve points and/or curves and, second, the option will be priced by Monte Carlo simulation.

Under these circumstances, the PCA provides a valuable dimensionality reduction for the Monte

Carlo.
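
The dimensionality reduction for Monte Carlo simulation can be sketched as follows. Everything in this example is an assumption made for illustration: the correlation matrix (correlation decaying with the distance between maturities), the choice of three factors, the function name `top_k_loadings`, and the use of standard normal shocks. It is not the model of the cited article.

```python
import numpy as np

def top_k_loadings(corr, k):
    """Loadings of the k largest principal components of a correlation matrix."""
    eigvals, eigvecs = np.linalg.eigh(corr)
    idx = np.argsort(eigvals)[::-1][:k]                 # indices of the k largest
    explained = eigvals[idx].sum() / eigvals.sum()      # share of variance kept
    return eigvecs[:, idx] * np.sqrt(eigvals[idx]), explained

# Hypothetical correlation matrix for 12 curve points:
# correlation declines with the difference in maturity.
p = 12
maturities = np.arange(p)
corr = 0.97 ** np.abs(maturities[:, None] - maturities[None, :])

loadings, explained = top_k_loadings(corr, k=3)         # p x 3 loading matrix

# Each simulated scenario needs only 3 independent normal draws instead of 12.
rng = np.random.default_rng(3)
z = rng.standard_normal((10000, 3))
shocks = z @ loadings.T                                 # 10000 scenarios x 12 points
```

The simulated shocks have covariance approximately equal to `loadings @ loadings.T`, the rank-3 approximation of the correlation matrix, and `explained` reports how much of the total variance that approximation retains.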

PCA is also widely used to study the co-movement patterns of national equity markets. The correlation coefficient measures the extent to which two statistical series move together; PCA, a multivariate statistical technique, is a useful tool to analyze patterns of co-movement common to several series. In the paper "Co-movements of U.S. and Latin American equity markets before and after the 1987 crash", PCA is applied to each of three subperiods separately to study the changes in the co-movement patterns of the U.S. and the four Latin American equity markets between the subperiods. Using Kaiser's significance rule, principal components with eigenvalues greater than unity are retained for analysis. Kaiser's varimax rotation is used for an easier interpretation of the principal components. In the paper's tables, the highest factor loadings in each principal component are marked with an asterisk.
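
Kaiser's rule is simple to apply once the eigenvalues of the correlation matrix are known. The sketch below uses simulated returns for five series driven by one common factor; the data and the function name `kaiser_retained` are hypothetical, and the varimax rotation step is not shown.

```python
import numpy as np

def kaiser_retained(returns):
    """Eigenvalues of the correlation matrix and the number that exceed 1.

    returns: 2-D array with observations in rows and series in columns.
    """
    corr = np.corrcoef(returns, rowvar=False)
    eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]
    return eigvals, int(np.sum(eigvals > 1.0))

# Hypothetical returns: five series sharing one common factor plus noise.
rng = np.random.default_rng(4)
common = rng.normal(size=(300, 1))
returns = 0.8 * common + 0.6 * rng.normal(size=(300, 5))

eigvals, n_keep = kaiser_retained(returns)
share_first = eigvals[0] / eigvals.sum()   # variance share of the first component
```

Because the eigenvalues of a correlation matrix sum to the number of series, an eigenvalue above 1 means the component explains more than a single standardized series would on its own. In this simulated case only one component passes the rule, which corresponds to closely co-moving markets.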

February 1984 to September 1987 (Period I)

For Period I, three principal components with eigenvalues greater than unity are retained for

analysis. The Mexican and Brazilian equity markets have the highest factor loadings in the first

principal component. This principal component explains 28.6% of the total variation in the index

returns data matrix. Since the Brazilian equity market is negatively correlated with the Mexican

equity market in this period, it has a negative factor loading in the first principal component.

The Chilean and U.S. equity markets have the highest factor loadings in the second principal

component. This principal component explains 24.8% of the total variation in the index

returns data matrix. The first two principal components together explain 52.9% of the total

variation in the index returns data matrix.

The Argentine equity market dominates the third principal component. This principal component

explains 20.1% of the total variation in the index returns data matrix. The U.S. equity market

also has a high factor loading in this principal component. However, since the U.S. equity market

is negatively correlated with the Argentine equity market in this period, it has a negative factor

loading in the third principal component. All three principal components together explain 73.0%

of the total variation in the index returns data matrix.

November 1987 to June 1991 (Period II)

There are only two statistically significant principal components in Period II, as compared with

three statistically significant principal components in Period I. This implies that the co-

movements of the five equity markets were closer after the crash than before the crash. We could

also say that the co-movements of the five equity markets were closer during the market opening

period (Period II) than during the closed markets period (Period I).

The Mexican, U.S., and Chilean equity markets have the highest factor loadings in the first

principal component. This principal component explains 31.9% of the total variation in the index

returns data matrix. The Argentine and Brazilian equity markets dominate the second principal

component. This principal component explains 24.4% of the total variation in the index returns

data matrix. Since the Argentine equity market is negatively correlated with the Brazilian equity

market in this period, it has a negative factor loading in the second principal component. The two

principal components together explain 56.3% of the total variation in the index returns data

matrix.

July 1991 to February 1995 (Period III)

There is only one statistically significant principal component in Period III, as compared with

two statistically significant principal components in Period II. This implies that the co-

movements of the five equity markets were even closer in Period III than in Period II. In Period

III, the opening of the markets is consolidated and large portfolio inflows into the Latin markets

are observed. The Argentine equity market has the highest factor loading and the U.S. equity

market has the lowest factor loading. The factor loadings of all five equity markets have positive

signs. The principal component explains 44.4% of the total variation in the index returns data

matrix.

The number of statistically significant principal components is three in Period I, two in Period II,

and only one in Period III. This implies that the co-movements of the five equity markets have

become considerably closer over time during the February 1984 to February 1995 period.

Principal Components Analysis (PCA) is another approach that has been applied in studying

diversification and shares common points with both correlation and factor analysis. While

computationally it is a special case of factor analysis, it can be applied to returns from a set or

portfolio of financial assets as a more sophisticated way of studying their correlation matrix and

integration. PCA is used to measure the degree of interdependence and covariability between

several assets. Multiple- or single-equation regression analysis is inappropriate for this purpose

because the returns on these assets may well be highly collinear. The method of principal

components constructs from a set of variables, X, a new set of orthogonal variables, P, the

principal components. Each one of these components absorbs and accounts for the maximum

possible proportion of the variation in the variables X. If the variations in the returns of a set of

financial assets or markets are explained by relatively few principal components, then one can

conclude that they are highly integrated and that opportunities for diversification are limited.

Correlation analysis, factor analysis, and PCA are concerned with the contemporaneous

information flows across markets, i.e., the first risk premium.

These approaches essentially measure the integration of national financial markets and are

sufficient if market efficiency is strong. In the case when markets are weakly efficient, then these

approaches will not be adequate if a co-integration mechanism is present. Co-integration

quantifies market inefficiencies as short-term disequilibrium variations in prices and can be

perceived as a sufficient, though not necessary, condition for segmentation between national

equity markets.

The results obtained from applying PCA to the returns on the nine markets are presented in Table

3 of "Diversification benefits in the smaller European stock markets". The conclusions drawn

from this approach confirm those from correlation analysis. The first principal component P1

explains 51 to 55 percent of the stock returns covariability with factor loadings that are

significant for most countries. The increase in the coefficient of determination and the value of

the eigenvalue in the second period suggests that the markets under study have become more

integrated. The component P1 can be interpreted as the true stock market return which abstracts

from risk and uncertainty and represents a compensation for sacrificed liquidity [Nellis, 1982].

An increase in the factor loading in P1 for some country is an indication of increased

interdependence. It is clear that such an increase has occurred for Greece, Spain, and Ireland.

The Greek market in the pre-October 1987 period cannot be explained by the first dominant component

since it has an insignificant factor loading. In the second sample, Greece appears more integrated

and enters the first component with a significant, although small, factor loading. In both periods,

the returns of the Greek stock market need additional components to be explained. The U.S. and

Dutch markets retain the highest factor loading in P1 for both periods.

Correlation and PCA found increased integration between the European markets and the U.S.

market. The smaller European markets were not found to be more strongly integrated with the

Japanese market for the period after the October 1987 crash.

In the study "The financial characteristics of small firms which achieve quotation on the UK unlisted

securities market", principal component analysis is applied to the financial ratios

data of the fifty-six firms. Principal components obtained are then used as input for the

multivariate analysis of variance (MANOVA) to compare the financial characteristics of firms

which have achieved USM quotation with those which have not.

The six principal components can be named in accordance with the factor loadings of the fifteen

financial ratios in each principal component. The factor loadings show the correlation between

the principal components and the fifteen financial ratios. Those financial ratios which are highly

correlated with a given principal component serve as definers of that principal component. For

example, leverage and liquidity ratios have the highest factor loadings in the first principal

component. Therefore, the first principal component can represent the indebtedness and liquidity

of the firms. Since profitability ratios have the highest factor loadings in the second principal

component, this principal component can represent profitability. Since growth ratios have the

highest factor loadings in the third principal component, this principal component can represent

growth rate, etc.
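
The statement that factor loadings are correlations between the principal components and the original variables can be verified directly. The sketch below uses random numbers in place of the study's financial ratios (56 hypothetical firms, 4 hypothetical ratios) and assumes PCA on the correlation matrix, with loadings defined as eigenvectors scaled by the square roots of their eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(5)
# Hypothetical data: 56 firms x 4 financial ratios (random, not the study's data).
ratios = rng.normal(size=(56, 4)) @ rng.normal(size=(4, 4))

# Standardize the ratios and run PCA on their correlation matrix.
Z = (ratios - ratios.mean(axis=0)) / ratios.std(axis=0, ddof=1)
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(ratios, rowvar=False))
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]     # largest component first

scores = Z @ eigvecs                    # component scores, one column per component
loadings = eigvecs * np.sqrt(eigvals)   # loadings[i, j]: ratio i on component j

# Direct check: correlation of each ratio with each component score.
direct = np.array([[np.corrcoef(Z[:, i], scores[:, j])[0, 1]
                    for j in range(4)] for i in range(4)])
```

The matrices `direct` and `loadings` agree, so reading down a column of `loadings` shows which ratios define that component. This is the basis for naming a component, for example "profitability", when the profitability ratios carry the highest loadings.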

In "Macroeconomic Factors and Stock Returns in a Changing Economic Framework: The Case

of the Athens Stock Exchange", the first principal component consists of variables that reflect

the economy-wide influence, since high loadings are observed for 13 out of 19 variables.

Among these variables, those with the significant positive loadings are: Inflation, Money Supply,

Wage Cost, Cost of Construction, Exchange Rates and the Capital Account. Recall that

the 1980-86 period was characterized by high inflationary pressures and the economic policy

framework consisted of an accommodating monetary policy, a loose incomes policy and a

continuous depreciation of the Greek currency unit, thus pushing costs of production upwards

and introducing uncertainty with respect to companies' earnings prospects. As far as fiscal policy

was concerned, it was loose too, and the increasing deficit of the public sector is reflected by the

fact that the Budget Deficit variable also had a significant loading, although with a negative sign

because it was reported as a series of negative numbers. It is worth noting that the Market Index,

used to approximate the market portfolio, is not included in the first factor. A possible

explanation of this result could be the relative unimportance of the Stock Market in the presence

of serious macroeconomic instabilities. Additionally, variables like Industrial Production and

Construction Permits represented the stagnant value added of the secondary sector and the

negative private investment in housing. Likewise, the Lending Rate has a low correlation with

the component since it is determined by the Central Bank, remaining unchanged for long periods

of time. Although Exports and Imports of Goods have a significant weighting, the Current

Account, which also includes the invisible transactions, is not significant, mainly due to the

increase in reverse immigration and the international crisis in shipping.

Variables which were not correlated with those most heavily loaded in the first component

constitute the second component, such as the Stock Market Index along with the Construction

Index and Gold Reserves. As mentioned earlier, the Stock Market was rather

impotent, the demand for construction investment was relatively low and the level of State

reserves was not particularly volatile. Likewise, the correlation-based construction of the third

component, consisting of the Current Account and the Unemployment Rate, does not lack

economic meaning.

The analysis for the period 1986-92 and the determination of the orthogonal factors are presented

in Table III (overleaf). It was found, as in the case of the first period, that the first component

consists of variables which represent a large part of the economy. In contrast to the previous

period however, the Stock Market Index, the Lending Rate and the Gold Reserves are

significantly positively correlated to the first component. This phenomenon could be respectively

explained by the growing importance of the Athens Stock Exchange, the gradual deregulation of

interest rates and the significant increase of the level of foreign reserves. The insignificance of

the budget deficit in the first component, during this period, may reflect the temporary reduction

of the PSBR in 1986-87 due to the stabilization programme imposed. This variable now loads on

the second component with the correct sign. The Unemployment Rate, on the other hand, after its

significant increase during the 1980-86 period, stabilized around 7.5% in the later period.

Consequently, this variable too loads on the second component, which has a strong positive

correlation with Industrial Production; the latter accelerated after the implementation of the

stabilization programme, so the Unemployment Rate exhibits a negative sign.

The second principal component in the period 1980-86 consists mainly of the Stock Market

Index and the Construction Index. This systematic risk is not significantly priced, reflecting the

unimportance of these variables in this period. In the period 1986-92, the second component is

highly correlated with Industrial Production and the Budget Deficit and has a positive and

significant sign. This means that investors demand a risk premium vis-à-vis this systematic risk,

reflecting uncertainty with respect to a further increase in industrial production and threatening

budget deficits. The coefficient of the estimated risk price for the third principal component is

significant in each subperiod (in both subperiods the current account is the leading variable).

However, the signs of the coefficients are opposite. The positive sign for the first subperiod is

explained by the seriousness of the current account deficit, which was above 5% of

GDP for the period, peaking in 1985 at 10% of GDP. The current account was one of the basic

instabilities that led to the implementation of the stabilization programme at the end of 1985.

By contrast, the current account problem was alleviated in the next period (the current

account deficit as a percentage of GDP decreased to 3%, although growth and investment

accelerated significantly), contributing to the optimism about investing in the Stock Market.

PCA is also used in "Globalization and changing patterns in the international transmission of

shocks in financial markets" (Bordo & Murshid, 2006). The authors apply principal components analysis to their monthly data on

yield spreads as well as to their indexes of exchange market pressure. The first principal component

vector provides a measure of the overall extent of co-movement within these data, while an

analysis of the factor loadings associated with the second and third principal component vectors

reveals various patterns in dependence within groups. Typically this grouping is easy to identify

by plotting the factor loadings; however, to take some of the arbitrariness out of identifying

groups, the authors employ a clustering algorithm to categorize countries into three distinct clusters. This

works by minimizing the distance between members of a group, while maximizing the

distance across separate groups.
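The clustering step described above can be sketched as follows. The text does not name the algorithm, so plain k-means (Lloyd's algorithm) is assumed here purely for illustration; the nine "countries" and their loadings on the second and third components are made-up numbers, and the function name `kmeans` is hypothetical.

```python
import numpy as np

def kmeans(points, k, n_iter=100):
    """Plain Lloyd's k-means with a deterministic farthest-point start."""
    points = np.asarray(points, dtype=float)
    # Farthest-point initialisation: start from the first point, then keep
    # adding the point that is farthest from the centres chosen so far.
    centres = [points[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centres], axis=0)
        centres.append(points[np.argmax(d)])
    centres = np.array(centres)
    for _ in range(n_iter):
        # Assign every point to its nearest centre.
        dist = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Move each centre to the mean of its members (keep it if empty).
        new_centres = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels, centres

# Hypothetical loadings of nine countries on the 2nd and 3rd components.
loadings_23 = np.array([
    [0.80, 0.10], [0.75, 0.05], [0.85, 0.15],      # countries 0-2
    [-0.60, 0.50], [-0.55, 0.60], [-0.65, 0.55],   # countries 3-5
    [0.00, -0.70], [0.05, -0.80], [-0.05, -0.75],  # countries 6-8
])
labels, centres = kmeans(loadings_23, k=3)
```

Each entry of `labels` gives the cluster assigned to one country; countries whose loadings lie close together in the plot end up with the same label.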

As a complement to their principal components analysis, the authors estimate the probability of a global

currency crisis. They identify global currency crises as extreme values of an index which captures

the degree of exchange market pressure that is common to all countries. Specifically, this index is

the first principal component of the exchange market pressure data.

While principal components analysis sheds light on the patterns in cross-country

interdependence, it does not account for all of the complex dynamics and inter-relationships that

may exist between countries. To better understand these relationships, the authors estimate vector

autoregressions (VARs) using data on short-term interest rates. By estimating impulse response functions

from these VARs, they were able to trace the impact of a shock in one country on another, and

thus shed light on the direction of shocks and the degree to which they impacted on other

countries.

As mentioned before, PCA has often been applied in finance to study movements in stock

markets. Papers by Leger and Leone (2008) and Meric, Ratner and Meric (2008) do just that, but

from slightly different perspectives. Leger and Leone look at changes in the UK stock

market and the macroeconomic factors (news) that could cause them, and find that Market Capital

Gain, Dividend Yield and Consumer Confidence were the only ones with significant influence.

Meric, Ratner and Meric focus on a much more conventional topic in finance papers using PCA: they

look at the possibilities of diversification among major stock markets, but they also include

market sectors in their analysis. They find that, in a bull market, investors can obtain more

benefit with global diversification than with domestic diversification even if they invest in the

same sector in different countries as opposed to investing in different sectors within the same

country. In a bear market, the sectors of different countries tend to be more closely correlated

and country diversification opportunities are limited.

An interesting paper by Shih et al. (2007) compares the performance of China's state-owned banks,

joint-stock banks and city commercial banks using performance measures developed with PCA.

Using PCA to develop performance measures is an interesting approach, though in this case it is

partly due to Chinese Government regulations against publishing direct bank performance data.

They find that mid-sized joint-stock banks have the best performance in China. They suggest

this may be due to larger public pressure and less political importance. Also, local banks in coastal

areas generally perform better than those inland; the worst-performing banks are in the north-east of China.

PCA has also been applied to the measurement of convergence. Becker and Hall (2009) define

convergence as something that is taking place between a vector of 2 or more series over any

given period 1 to $T$ if the $\%R^2$ of the first principal component calculated over the period 1 to

$T - t$ is less than the $\%R^2$ of the first principal component calculated over the period $T - t$ to $T$,

where $0 < t < T$. Using this definition they find that there is little convergence between the inflation rates of

European Monetary Union member countries and the New Member Countries of the EU, except for

the three countries that have been accepted to join the euro area.
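To make the convergence criterion concrete, here is a minimal Python sketch. It assumes that the %R2 of the first principal component means the share of total variance that the component explains, computed from the correlation matrix; that reading, the function names `first_pc_share` and `is_converging`, and the synthetic series are our assumptions and not the authors' implementation.

```python
import numpy as np

def first_pc_share(X):
    """Share of total variance explained by the first principal component.

    X has one row per date and one column per series; the correlation
    matrix is used so that every series carries equal weight.
    """
    R = np.corrcoef(X, rowvar=False)
    eigvals = np.linalg.eigvalsh(R)      # ascending order
    return eigvals[-1] / eigvals.sum()

def is_converging(X, t):
    """Compare the first-PC share of the early window with the late window.

    Simplification: the early window holds the first T - t observations and
    the late window the last t, so the two windows do not overlap.
    """
    T = X.shape[0]
    if not 0 < t < T:
        raise ValueError("t must satisfy 0 < t < T")
    early = first_pc_share(X[: T - t])
    late = first_pc_share(X[T - t:])
    return early < late, early, late

# Synthetic example (assumption): three series that are unrelated at first
# and then start to follow a common factor.
rng = np.random.default_rng(1)
T, t = 200, 100
common = rng.normal(size=t)
early_part = rng.normal(size=(T - t, 3))                     # unrelated series
late_part = common[:, None] + 0.2 * rng.normal(size=(t, 3))  # shared factor
X = np.vstack([early_part, late_part])
converging, early, late = is_converging(X, t)
```

With data built this way, the first component explains a much larger share of the variance in the late window than in the early one, which is the pattern the definition labels as convergence.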

As we have seen, PCA has many diverse and interesting applications in finance. PCA is an

analytical tool that can be applied alone or in conjunction with other measures to make sense of

complicated data and get interesting research findings.

Chapter IV Conclusions

Based on the articles in this paper we can see that Principal Component Analysis (PCA) is a

mathematical algorithm that reduces the dimensionality of the data while retaining most of the

variation in the data set. It accomplishes this reduction by identifying directions, called principal

components, along which the variation in the data is maximal. By using a few components, each

sample can be represented by relatively few numbers instead of by values for thousands of

variables. Samples can then be plotted, making it possible to visually assess similarities and

differences between samples and determine whether samples can be grouped.
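The reduction described above can be illustrated with a short Python sketch. It uses a singular value decomposition of the centred data, which is one common way to compute principal components; the synthetic data set (100 samples, 1000 variables driven by two hidden factors) and the function name `pca_reduce` are assumptions made only for this example.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X onto the first n_components principal components.

    Returns the scores (one row per sample), the component directions and
    the share of total variance explained by each retained component.
    """
    Xc = X - X.mean(axis=0)                            # centre every variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)  # singular values descend
    scores = U[:, :n_components] * s[:n_components]    # coordinates on the PCs
    explained = s ** 2 / np.sum(s ** 2)                # variance share per component
    return scores, Vt[:n_components], explained[:n_components]

# Synthetic data (assumption): 1000 variables driven by two hidden factors.
rng = np.random.default_rng(42)
n_samples, n_vars = 100, 1000
latent = rng.normal(size=(n_samples, 2))
mixing = rng.normal(size=(2, n_vars))
X = latent @ mixing + 0.1 * rng.normal(size=(n_samples, n_vars))

scores, components, explained = pca_reduce(X, 2)
```

Here each sample is summarised by the two numbers in its row of `scores` instead of 1000 values, and those two columns can be plotted against each other to look for groups of similar samples.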

PCA is applied not only in finance but also in many other fields, such as computer science,

image pattern recognition (for example, finding common features in human facial images), image compression

and computational biology, among others. Many applications beyond dimensionality reduction,

classification and clustering have taken advantage of global representations of expression

profiles generated by this decomposition. Applications include identifying patterns that correlate

with experimental artifacts and filtering them out, estimating missing data, associating genes and

expression patterns with activities of regulators and helping to uncover the dynamic architecture

of cellular phenotypes. The rapid growth in technologies that generate high-dimensional

molecular biology data will likely provide many new applications for PCA in the years to come.

There are also many possible new applications in finance and economics, for example the

new framework for measuring convergence. With the development of computing power

(software and hardware), even more complicated analyses utilizing principal component analysis

are possible, in finance and in all other fields of science.

List of Articles

1- Detection of financial distress via multivariate statistical analysis (Ganesalingam, 2001).

2- Co-movements of the U.S., U.K., and Middle East Stock markets (Meric, Ratner, & Meric, 2007).

3- Principal components analysis for correlated curves and seasonal commodities: The case of the

Petroleum markets (Tolmasky & Hindanov, 2002).

4- Globalization and changing patterns in the international transmission of shocks in financial

markets (Bordo & Murshid, 2006).

5- Macroeconomic Factors and Stock Returns in a Changing Economic Framework: The Case of

the Athens Stock Exchange (Diacogiannis, Tsiritakis, & Manolas, 2001).

6- The financial characteristics of small firms which achieve quotation on the UK unlisted securities

market (Hutchinson, Meric, & Meric, 1988).

7- Diversification benefits in the smaller European stock markets (Markellos & Siriopoulos, 1997).

8- Co-movements of U.S. and Latin American equity markets before and after the 1987 crash

(Meric, Leal, Ratner, & Meric, 2001).

9- Changes in the risk structure of stock returns: Consumer Confidence and the dotcom bubble (Leger & Leone, 2008).

10- Co-movements of sector index returns in the world's major stock markets in bull and bear markets: Portfolio diversification implications (Meric, Ratner, & Meric, 2008).

11- How far from the Euro Area? Measuring convergence of inflation rates in Eastern Europe (Becker & Hall, 2009).

12- Comparing the performance of Chinese banks: A principal component approach (Shih, Zhang, & Liu, 2007).

13- International portfolio diversification: A multivariate analysis for a group of Latin American countries (Lessard, 1973).

14- The inter-temporal stability of international stock market relationships: Another view (Philippatos, Christofi, & Christofi, 1983).

15- An analysis of the interrelationships among the major world stock exchanges (Makridakis & Wheelwright, 1974).

16- A new approach to modeling the dynamics of implied distributions: Theory and evidence from the S&P 500 options (Panigirtzoglou & Skiadopoulos, 2004).

References

Becker, B., & Hall, S. G. (2009). How far from the Euro Area? Measuring convergence of inflation rates in Eastern Europe. Economic Modelling, In Press, Corrected Proof.

Bordo, M. D., & Murshid, A. P. (2006). Globalization and changing patterns in the international transmission of shocks in financial markets. Journal of International Money and Finance, 25, 655-674.

Cortazar, G., & Schwartz, E. (1994). The valuation of commodity-contingent claims. Journal of Derivatives, 1(4), 27-39.

Diacogiannis, G. P., Tsiritakis, E. D., & Manolas, G. A. (2001). Macroeconomic Factors and Stock Returns in a Changing Economic Framework: The Case of the Athens Stock Exchange. Managerial Finance, 27(6), 23-41.

Forzani, L., & Tolmasky, C. F. (2001). On the spectral decomposition of empirical correlation matrices. Journal of Knot Theory and Its Ramifications, 10(8), 1201-1214.

Ganesalingam, S. (2001). Detection of financial distress via multivariate statistical analysis. Managerial Finance, 27(4), 45-55.

Jolliffe, I. T. (2002). Principal component analysis (Second ed.): Springer.

Leger, L., & Leone, V. (2008). Changes in the risk structure of stock returns: Consumer confidence and the dotcom bubble. Review of Financial Economics, 17(3), 228-244.

Lessard, D. R. (1973). International portfolio diversification: A multivariate analysis for a group of Latin American countries. The Journal of Finance, 28(3), 619-633.

Makridakis, S. G., & Wheelwright, S. C. (1974). An analysis of the interrelationships among the major world stock exchanges. Journal of Business Finance and Accounting, 1(2), 195.

Malava, A. (2006). Principal component analysis on term structure of interest rates. Unpublished Independent Research Project in Applied Mathematics, Helsinki University of Technology, Department of Engineering Physics and Mathematics.

Markellos, R. N., & Siriopoulos, C. (1997). Diversification benefits in the smaller European stock markets. International Advances in Economic Research, 3(2), 142-153.

Meric, G., Leal, R. P. C., Ratner, M., & Meric, I. (2001). Co-movements of U.S. and Latin American equity markets before and after the 1987 crash. International Review of Financial Analysis, 10, 219-235.

Meric, G., Ratner, M., & Meric, I. (2007). Co-movements of the U.S., U.K., and Middle East Stock markets. Middle Eastern Finance and Economics(1), 60-73.

Meric, I., Ratner, M., & Meric, G. (2008). Co-movements of sector index returns in the world's major stock markets in bull and bear markets: Portfolio diversification implications. International Review of Financial Analysis, 17(1), 156-177.

Hutchinson, P., Meric, G., & Meric, I. (1988). The financial characteristics of small firms which achieve quotation on the UK unlisted securities market. Journal of Business Finance and Accounting, 15(1), 9-19.

Panigirtzoglou, N., & Skiadopoulos, G. (2004). A new approach to modeling the dynamics of implied distributions: Theory and evidence from the S&P 500 options. Journal of Banking & Finance, 28(7), 1499-1520.

Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. Philosophical Magazine, 2(6), 559-572.

Philippatos, G. C., Christofi, A., & Christofi, P. (1983). The inter-temporal stability of international stock market relationships: Another view. Financial Management, 12(4), 63-69.

Shih, V., Zhang, Q., & Liu, M. (2007). Comparing the performance of Chinese banks: A principal component approach. China Economic Review, 18(1), 15-34.

Shlens, J. (2009). A Tutorial on Principal Component Analysis. Unpublished manuscript.

Smith, L. (2002). A tutorial on Principal Components Analysis. Unpublished manuscript.

Tolmasky, C., & Hindanov, D. (2002). Principal components analysis for correlated curves and seasonal commodities: The case of the Petroleum markets. The Journal of Futures Markets, 22(11), 1019-1035.
