portfolio selection with artificial neural networks

1

Portfolio Selection with Artificial Neural Networks

Andrew J Ashwood
Supervisor: Dr Anup Basu

2

Outline
• Introduction
• Research Motivation
• Neural Network Primer
• Literature Review
• Research Questions
• Network specification
• Results
• Conclusions
• Further work and limitations

3

Research Motivation
• Trading in equities is big business in Australia
– Value of All Ordinaries c. $1.2 trillion
– Stocks turned over in 2012 c. 244 billion
– Funds invested in superannuation c. $1.4 trillion

4

Research Motivation
• Outperforming the broader market is difficult
– ‘From 1969 to 2010, an index investor would achieve returns 80 per cent greater than the average investor in a managed fund.’ (Malkiel, 2011)
– ‘…if many managers have sufficient skill to cover costs, they are hidden by the mass of managers with insufficient skill. …true α in net returns to investors is negative for most if not all active funds’ (Fama & French, 2009)

5

Research Motivation
• The Australian experience is similar
– 70% of active retail funds underperformed the benchmark over both the 1 and 3 year period (Karaban & Maguire, 2012)
– Over a five year period 69% of active retail funds failed to outperform the ASX200AI (Karaban & Maguire, 2012)

6

Research Motivation
• ‘The attempt to predict accurately the future course of stock prices and thus the appropriate time to buy or sell a stock must rank as one of investors’ most persistent endeavours.’ (Malkiel, 2011)
• Advances in computing power, in combination with the widespread availability of historical datasets, have provided investors with increased opportunity to test markets for predictable returns

7

Research Motivation
• Artificial Neural Networks (‘ANNs’) have several traits that suggest their suitability for stock price prediction
– Ability to generalise and learn
– Provide an analytic alternative to other techniques that rely on assumptions of normality, linearity, and variable independence

8

Neural Network Primer
• An artificial neural network is a mathematical model that mimics the structure and function of biological brains
• A structure of highly interconnected processing neurons

9

Neural Network Primer
• Typical neural processing element
[Figure: a single neuron. Inputs X1 to Xn are multiplied by input weights W1 to Wn and summed to give the neuron activation $v = \sum_{i=0}^{n} W_i X_i$. The transformation $y = \tanh(v)$ gives the output. Learning is by error back-propagation.]
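The processing element above can be sketched in a few lines of Python. This is a minimal illustration only, assuming the bias is carried as weight W0 on a constant input X0 = 1 (which the sum starting at i = 0 suggests). The function name and the sample numbers are hypothetical.

```python
import math

def neuron_output(inputs, weights):
    """Weighted sum followed by tanh, for a single neuron.

    Assumption: inputs[0] is a constant 1.0, so weights[0] acts as the bias.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    v = sum(w * x for w, x in zip(weights, inputs))  # neuron activation v
    return math.tanh(v)                              # transformation y = tanh(v)

# Hypothetical example: a bias input plus two features.
y = neuron_output([1.0, 0.5, -0.25], [0.1, 0.8, 0.4])
```

Because tanh squashes any activation into (-1, 1), very large weighted sums all map to nearly the same output, which is why the inputs are scaled before training.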

10

Neural Network Primer
• Neural networks comprise an interconnected structure of neurons
[Figure: inputs feed an input layer, which connects to a hidden layer, which connects to an output layer producing the outputs.]

11

Neural Network Primer
• Network typology – 2 fundamental classes of network architecture
• Feed-forward network – information travels in the forward direction only
– Single layer feed-forward network
– Multi layer feed-forward network
• Recurrent network – contains at least one feedback loop
– Many types (fully recurrent, Hopfield, Elman, Jordan, etc.)
– Several studies demonstrate that feed-forward networks outperform recurrent networks at predicting non-linear time series data

12

Neural Network Primer
• Network learning – The Back Propagation Algorithm
– Input weights are randomly applied to the network
– Network calculations occur and the network produces an output signal
– Simulated output is compared to target output to provide system error
– Error is back-propagated through the network
– Process repeats until error reaches desired level or maximum number of iterations is reached
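The five steps above can be sketched with NumPy for a network with one tanh hidden layer and a tanh output. This is an illustrative sketch, not the study's implementation: plain batch gradient descent is assumed, the constant factor in the MSE gradient is absorbed into the learning rate, and the function name, parameter defaults, and toy data are all hypothetical.

```python
import numpy as np

def train_backprop(X, T, hidden=4, lr=0.2, max_iter=5000, target_mse=1e-3, seed=0):
    """Train a one-hidden-layer tanh network by error back-propagation."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])      # extra column of ones = bias
    # Step 1: weights are initialised randomly.
    W1 = rng.normal(0.0, 0.5, (Xb.shape[1], hidden))
    W2 = rng.normal(0.0, 0.5, (hidden + 1, T.shape[1]))
    history = []
    for _ in range(max_iter):
        # Step 2: forward pass produces the output signal.
        H = np.tanh(Xb @ W1)
        Hb = np.hstack([H, np.ones((H.shape[0], 1))])
        Y = np.tanh(Hb @ W2)
        # Step 3: compare simulated output with target output.
        err = Y - T
        mse = float(np.mean(err ** 2))
        history.append(mse)
        # Step 5: stop once the error reaches the desired level.
        if mse <= target_mse:
            break
        # Step 4: propagate the error backwards (tanh'(v) = 1 - tanh(v)^2).
        dY = err * (1.0 - Y ** 2)
        dH = (dY @ W2[:-1].T) * (1.0 - H ** 2)         # bias row of W2 excluded
        W2 -= lr * Hb.T @ dY / len(X)
        W1 -= lr * Xb.T @ dH / len(X)
    return W1, W2, history

# Hypothetical toy data: the target depends on the first input only.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
T = np.array([[-0.8], [-0.8], [0.8], [0.8]])
W1, W2, history = train_backprop(X, T)
```

The loop ends on whichever comes first, the target error or `max_iter`, mirroring the last step in the list above.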

13

Literature Review
• Neural networks are relatively new.
– No published research in the financial domain prior to 1990. By 1993, over 10 studies per year were being published. (Wong & Yakup, 1998)

14

Literature Review
• Yoon & Swales (1991) used ANNs to predict prices of 98 companies and compared the ANN predictions to multiple discriminant analysis
– The study concluded the ANN technique significantly outperformed multiple discriminant analysis

15

Literature Review
• Kryzanowski, Galler & Wright (1992) used ANNs to pick stocks
– Fourteen fundamental ratios were used as inputs, coded as trending upwards/downwards or stable
– The ANN model correctly predicted price directional movement over the next year in 66% of cases
– The ANN did not predict extreme movements well

16

Literature Review
• Hall (1994) used ANNs to create a style-based portfolio
– ANNs used to identify stocks with the largest recent price increases and apply this style to the current market to select a portfolio
– Utilised 1,000 largest US stocks
– 37 inputs used to predict future stock prices
– The portfolio has exceeded the S&P500 since inception after transaction costs, but ‘Actual performance results and specific details of the system are proprietary’.

17

Literature Review
• Jai & Lang (1994) used ANNs to trade the TSE
– Used 16 technical indicators to predict the trend of the price movement over the subsequent 6 trading days

Year                      1990              1990                      1991              1991
Type of ANN               Fixed structure   Dual adaptive structure   Fixed structure   Dual adaptive structure
Total return %            -30.66%           21.61%                    25.76%            29.20%
Annual TSEWPI return %    -39.52% (1990)                              23.42% (1991)

Note the large divergence in performance

18

Literature Review
• Freitas et al. (2001) used neural networks to provide an estimate of future returns of 66 representative stocks listed on the Brazilian BOVESPA stock exchange.
– Worst network prediction error 33%
– Best network prediction error 7%
– Average error almost 17%
– Neural network portfolios achieved better performance in 19 out of 21 weeks and outperformed the benchmark portfolio by 12.39%.

19

Literature Review
• Ellis & Wilson (2005) used ANNs to classify Australian listed REITs as value stocks.
– Value stocks were bought and portfolio returns measured
– ANN portfolio outperformed the benchmark index by 6.97% per month on average
– Sharpe and Sortino ratios significantly greater than the benchmark indices

20

Literature Review
• Ko & Lin (2008) used ANNs for portfolio selection on Taiwan Stock Exchange (‘TSE’).
– Focussed on 21 different equities
– Used price, variance, covariance as input variables
– ANN used to predict equity allocation ratio
– 5yr data set including training & validation sets
– ANN model outperformed TSE (no performance measurement framework)

21

Literature Review
• Vanstone et al. (2010) used neural networks based on a stock trading filter rule. The filter rule bought stocks when the following criteria were met:
  • PE < 10
  • Market Price < Book Value
  • ROE < 12
  • Dividend Payout Ratio < 25%
– System yielded low number of trading opportunities
– The return achieved by the system exceeded the return available using the original filter rule
– It is worth noting, however, that the higher return achieved was at least partially due to increased risk.

22

Research Questions
• Core Research Question 1
– Can neural networks that utilise a combination of fundamental and technical inputs predict (a) future stock prices, (b) stock price directional movement?
• Secondary Research Question 1
– Which network inputs contribute most to network accuracy?

23

Research Questions
• Secondary Research Question 2
– What network parameters lead to optimal predictive capability?
• Secondary Research Question 3
– Do neural networks predict any of the known performance attribution factors?
• All research questions are to be answered with reference to stocks listed on the ASX

24

Network Specification
• Kaastra & Boyd (1996) framework for designing neural networks for financial and econometric time series (an iterative process)
Step 1: Variable selection
Step 2: Data collection
Step 3: Data preprocessing
Step 4: Training, testing, and validation sets
Step 5: Neural network paradigms
– number of hidden layers
– number of hidden neurons
– number of output neurons
– transfer functions
Step 6: Evaluation criteria
Step 7: Neural network training
– number of training iterations
– learning rate and momentum
Step 8: Implementation

25

Variable Selection

FUNDAMENTAL INPUTS
Profit Margin, DPS, Dividend Yield, CF per share, BV per share, BV per share growth, Current Ratio, Quick Ratio, Asset Turnover, Debt to Equity, Interest Cover, Price to CF, PE, Price to Sales, ROA, ROE, FCF per share, Sales growth

TECHNICAL INPUTS
Open, High, Low, Close, Volume
20d MA, 50d MA, 20d EMA, 50d EMA
MACD, RSI, CLV, ADI
20d Momentum, 50d Momentum, 6mth Momentum, 12mth Momentum
Fast Stochastic Oscillator (FSO), Fast Stochastic Indicator (FSI), Slow Stochastic Oscillator (SSO), Slow Stochastic Indicator (SSI)
Rate of Change (Price 1wk), Rate of Change (Price 4wk), Rate of Change (Price 10wk)
Rate of Change (MACD), Rate of Change (MACD Signal), Rate of Change (RSI), Rate of Change (ADI)
Rate of Change (20d MA), Rate of Change (50d MA), Rate of Change (20d EMA), Rate of Change (50d EMA)
Rate of Change (12mth Mom)
Rate of Change (FSO), Rate of Change (FSI), Rate of Change (SSO), Rate of Change (SSI)

26

Data Set
• Dataset obtained from Bloomberg
• Weekly data from Jan 1997 to Dec 2011
• Early data (1997-1999) used to calculate technical indicators
• 20 widely held ASX50 stocks selected

AMC ANZ BHP CBA CCL
CSL DJS GPT LEI NAB
ORG QAN QBE RIO SUN
TLS WBC WDC WES WOW

27

Data Preprocessing
• 3 pre-processing algorithms utilised:
– Remove duplicate data
  • Duplicate data provides no useful information to the ANN
  • Speeds calculation
– Scale data
  • Ensures all inputs treated equally
  • Avoids saturation of the tansig transformation function
– Mark missing data
  • Ensures that missing data is not used in the learning process
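Two of the three steps above, scaling and marking missing data, can be sketched as follows. The sketch assumes min-max scaling to [-1, 1] (the output range of tansig) and NaN as the missing-value marker. The slides do not state the exact scaling rule used, so both choices are assumptions, the function name is hypothetical, and duplicate removal is not shown.

```python
import numpy as np

def preprocess(column):
    """Scale one input series to [-1, 1] and flag missing observations.

    Returns (scaled, valid): NaN entries stay NaN in `scaled`, and `valid`
    is a boolean mask of rows that may be used for learning.
    """
    x = np.asarray(column, dtype=float)
    valid = ~np.isnan(x)                     # mark missing data
    if not valid.any():
        return x.copy(), valid               # nothing to scale
    lo, hi = x[valid].min(), x[valid].max()
    if hi == lo:
        scaled = np.where(valid, 0.0, np.nan)    # constant series carries no signal
    else:
        scaled = 2.0 * (x - lo) / (hi - lo) - 1.0
    return scaled, valid

# Hypothetical price series with one missing week.
scaled, valid = preprocess([10.0, 20.0, float("nan"), 30.0])
```

Rows where `valid` is False would be skipped when computing the training error, so a gap in the data never contributes to a weight update.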

28

Frequency of Data
• Little theory to assist in determining optimum frequency of data for a given prediction horizon. Deboeck (1994):
– Long term predictions are unreliable
– Short term predictions are unreliable
– There is some period into the future for which useful predictions can be made

29

Training, Testing, Validation Sets
• The data set needs to be divided into training, validation, and testing datasets (Beale et al., 2011):
– Training set – used to compute the gradient and update the weights
– Validation set – the data set used for monitoring the error during training
– Testing set – out-of-sample test data set

30

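Training, Testing, Validation Sets
• Walk forward testing
[Figure: timeline from t0 to tn showing Windows 1 to 5. Each window is made up of a training window, a validation window, and a testing window, and each successive window sits further along the timeline. The testing windows together span the testing period.]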
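The walk-forward scheme in the figure can be sketched as a generator of index ranges. This is an illustration under stated assumptions: windows are measured in observations, each new window steps forward by one testing window so that the testing windows do not overlap, and the function name and sizes are hypothetical.

```python
def walk_forward_windows(n_obs, train_len, val_len, test_len):
    """Yield (train, validation, test) index ranges in chronological order.

    Assumption: each window starts one testing window later than the last.
    """
    start = 0
    while start + train_len + val_len + test_len <= n_obs:
        train_end = start + train_len
        val_end = train_end + val_len
        test_end = val_end + test_len
        yield range(start, train_end), range(train_end, val_end), range(val_end, test_end)
        start += test_len

# Hypothetical sizes: 20 observations, 8 for training, 2 validation, 2 testing.
windows = list(walk_forward_windows(20, 8, 2, 2))
```

Every testing range lies strictly after its own training and validation ranges, so each test is out of sample.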

31

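Network Implementation
[Figure: network architecture diagram. Input layer: 59 inputs (18 fundamental inputs and 41 technical inputs) passed through a tapped delay line (TDL) of 4 → 20 periods. Hidden layer: 30-150 neurons with input weights IW1,1, layer weights LW1,2 fed through a TDL, bias b1, and a tansig transfer function. Output layer: 1 neuron with layer weights LW2,1, bias b2, and a tansig transfer function. Legend: summing junction (+), tapped delay line (TDL), bias unit (b), input weight (IW), layer weight (LW), transfer function.]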

32

Evaluation Criteria & Training
• Mean Squared Error (‘MSE’) selected as error function to be minimised by the network
• Convergence method adopted for network training. Training occurs until the earlier of:
– 100,000 training iterations occur
OR
– 50 iterations with no reduction in validation set error
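The stopping rule above can be sketched as a small loop. This is a hedged illustration: `step` is a hypothetical callable that runs one training iteration and returns the validation-set MSE, and "no reduction" is read here as 50 consecutive iterations without a new best validation error.

```python
def train_until_converged(step, max_iter=100_000, patience=50):
    """Run `step` until max_iter or `patience` iterations without improvement.

    Returns (iterations_run, best_validation_mse).
    """
    best = float("inf")
    since_best = 0
    for i in range(1, max_iter + 1):
        val_mse = step()
        if val_mse < best:
            best, since_best = val_mse, 0    # new best: reset the counter
        else:
            since_best += 1
            if since_best >= patience:
                return i, best               # validation error has stalled
    return max_iter, best

# Hypothetical validation errors: improve for 10 iterations, then stall.
errors = iter([1.0 / (k + 1) for k in range(10)] + [0.5] * 1000)
iterations, best = train_until_converged(lambda: next(errors))
```

Monitoring the validation set rather than the training set is what lets the rule stop training before the network overfits.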

33

Implementation

Parameter                 Options                                                        No. of Options
Stocks/portfolios         20 stocks + equally and value weighted portfolios of these     22
                          stocks
Input type                Price OR Technical indicators OR Fundamental indicators OR     6
                          Price + Technical OR Price + Fundamental OR
                          Price + Technical + Fundamental
Lookback window           4, 8, 12, 16, 20 periods                                       5
Hidden layer size         30, 60, 90, 120, 150 neurons                                   5
Training period length    3mths, 6mths, 12mths, 24mths                                   4
Walk forward testing      10 x 1-year testing periods from 2001 to 2011                  10

Total networks = 132,000
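The total follows from multiplying the option counts in the table. A short sketch that enumerates the grid is given below. The option values are taken from the table, while the variable names are illustrative.

```python
from itertools import product

input_types = [
    "Price", "Technical", "Fundamental",
    "Price+Technical", "Price+Fundamental", "Price+Technical+Fundamental",
]
lookbacks = [4, 8, 12, 16, 20]            # periods
hidden_sizes = [30, 60, 90, 120, 150]     # neurons
training_months = [3, 6, 12, 24]
n_series = 22                             # 20 stocks + 2 portfolios
n_test_windows = 10                       # walk-forward testing periods

# One network specification per combination of the four tunable parameters.
specs = list(product(input_types, lookbacks, hidden_sizes, training_months))
total_networks = len(specs) * n_series * n_test_windows
```

There are 6 x 5 x 5 x 4 = 600 specifications per series per testing window, and 600 x 22 x 10 = 132,000 networks in total.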

• All testing undertaken on QUT supercomputer ‘Lyra’
• At time of testing Lyra consisted of:
– 106 compute nodes
– 1572 x 64 bit Intel Xeon cores (consisting 11,736 core processors)
– 32.8 TeraFlop theoretical (double precision), 58.6 TeraFlop (single precision)
– 10.4 TeraBytes of main memory

34

Results
• The high number of network models specified (132,000) meant that manual review and interpretation of individual networks was not feasible.
• The algorithm selected the network configuration that provided the lowest mean squared error over each one-year walk forward testing window. The algorithm therefore provides 220 outputs, made up as follows:
– 20 stocks + 2 portfolios
– 10 walk forward testing windows for each stock/portfolio
– Total outputs = 220

35

Input Type
[Figure: Histogram of Input Type. x-axis: Input Type (Price, Fund, Tech, Price+Fund, Price+Tech, Price+Tech+Fund); y-axis: Frequency (0 to 50). Bar labels as extracted: 17%, 14%, 10%, 21%, 22%, 16%.]

36

Lookback Window
[Figure: Histogram of Lookback. x-axis: Lookback Period (no. inputs): 4, 8, 12, 16, 20; y-axis: Frequency (0 to 50). Bar labels as extracted: 19%, 21%, 19%, 21%, 20%.]

37

Hidden Layer Size
[Figure: Histogram of Hidden layer. x-axis: Hidden Layer Size (no. of nodes): 30, 60, 90, 120, 150; y-axis: Frequency (0 to 120). Bar labels as extracted: 47%, 16%, 9%, 18%, 10%.]
The smallest hidden layer (30 nodes) produced the optimum network in 47% of cases

38

Training Period
[Figure: Histogram of Training period. x-axis: Training Period Length (3mths, 6mths, 12mths, 24mths); y-axis: Frequency (0 to 80). Bar labels as extracted: 35%, 20%, 15%, 30%.]

39

Predictive Capability
[Figure: RMSE as % of Share Price. x-axis: Year/Timestep (2002 to 2011, timesteps 1 to 10); y-axis: RMSE % of Share Price (0% to 30%). One series per stock: AMC, ANZ, BHP, CBA, CCL, CSL, DJS, GPT, LEI, NAB, ORG, QAN, QBE, RIO, SUN, TLS, WBC, WDC, WES, WOW.]
Generally there is a major spike in error in 2008 & 2009
Generally predictions are fairly consistent from 2002-2007

40

Predictive Capability
[Figure: ASX200 - 2002 to 2011, ASX200 Adjusted Close Price. x-axis: half-yearly dates from 01/2002 to 01/2012; y-axis: 2,000 to 7,000.]
Sustained benign trading conditions leading up to GFC
2008 Major reversal, Bear Market
2009 Major reversal, Bull Market
2010 Return to benign trading conditions

41

Directional Movement
[Figure: Directional Movement Prediction Accuracy. x-axis: Year/Timestep (2002 to 2011, timesteps 1 to 10); y-axis: Prediction Accuracy (0% to 80%). One series per stock (AMC, ANZ, BHP, CBA, CCL, CSL, DJS, GPT, LEI, NAB, ORG, QAN, QBE, RIO, SUN, TLS, WBC, WDC, WES, WOW) and per portfolio (EW, VW).]

42

Directional Movement
[Figure: Directional Movement Prediction Accuracy Over 10-year Testing Horizon. x-axis: Stocks & Portfolios (AMC, ANZ, BHP, CBA, CCL, CSL, DJS, GPT, LEI, NAB, ORG, QAN, QBE, RIO, SUN, TLS, WBC, WDC, WES, WOW, EW, VW); y-axis: Prediction Accuracy (0% to 100%). Bars marked where significant at the 5% level.]
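One common way to judge whether a directional hit rate beats a coin toss is a one-sided exact binomial test. The slides do not say which test was used, so the sketch below is an assumption offered for illustration, and the hit count in the example is hypothetical.

```python
from math import comb

def binomial_p_value(hits, n, p0=0.5):
    """One-sided exact p-value: P(X >= hits) for X ~ Binomial(n, p0)."""
    return sum(comb(n, k) * p0 ** k * (1 - p0) ** (n - k) for k in range(hits, n + 1))

# Hypothetical example: 290 correct calls out of 520 weekly predictions.
p_value = binomial_p_value(290, 520)
significant_at_5pct = p_value < 0.05
```

Under the null hypothesis of 50% accuracy, a small p-value indicates that the observed hit rate would be unlikely to arise by chance.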

43

Portfolio Selection
• The simulated price returns were then used to form a long and a long-short portfolio
• Portfolios formed at the beginning of each month during the 10-year testing period and held until the end of the month
• Stocks weighted according to the absolute value of their simulated price return
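The weighting rule above can be sketched as follows. The details are assumptions, because the slides give only the headline rule: the long portfolio is taken to hold only stocks with a positive simulated return, the long-short portfolio also shorts stocks with a negative simulated return, and weights are normalised so gross exposure sums to 1. The function name and sample returns are hypothetical.

```python
def portfolio_weights(predicted_returns, long_short=False):
    """Map {ticker: simulated return} to weights proportional to |return|.

    Negative weights denote short positions (long-short portfolio only).
    """
    if long_short:
        picks = {t: r for t, r in predicted_returns.items() if r != 0}
    else:
        picks = {t: r for t, r in predicted_returns.items() if r > 0}
    gross = sum(abs(r) for r in picks.values())
    if gross == 0:
        return {}                        # no signal: hold nothing
    return {t: r / gross for t, r in picks.items()}

# Hypothetical simulated returns for one month.
preds = {"BHP": 0.04, "CBA": 0.01, "QAN": -0.05}
long_only = portfolio_weights(preds)
long_short = portfolio_weights(preds, long_short=True)
```

In this example the long portfolio holds 80% BHP and 20% CBA, while the long-short portfolio holds 40% BHP and 10% CBA against a 50% short position in QAN.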

44

Portfolio Price Returns

Portfolio                    Price Return (2002-2011)
ASX200                       19.6%
ANN Long Portfolio           205.5%
ANN Long-Short Portfolio     196.6%

• Raw results:
  • Do not include transaction costs
  • No performance measurement framework

45

Performance Measurement
• The portfolio price returns were then regressed against the Carhart 4-factor model

Regression Results

          Long Portfolio    Long-Short Portfolio
α         0.016**           0.013***
RMRF      0.712***          -0.011
SMB       -0.317            -0.604**
HML       -0.416**          -0.306
WML       -0.011            0.038
R²        0.283***          0.215***

Both portfolios have a significantly positive alpha
Both models are significant
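The Carhart regression has the form $R_{p,t} = \alpha + \beta_1 RMRF_t + \beta_2 SMB_t + \beta_3 HML_t + \beta_4 WML_t + \varepsilon_t$, where the intercept α is the return not explained by the four factors. The ordinary least squares sketch below runs on synthetic, noise-free data, so all numbers are made up for illustration. Whether the study regressed raw or excess returns is not stated in the slides, and significance testing is not shown.

```python
import numpy as np

def carhart_regression(portfolio_returns, rmrf, smb, hml, wml):
    """OLS fit of portfolio returns on the four Carhart factors."""
    X = np.column_stack([np.ones(len(rmrf)), rmrf, smb, hml, wml])
    y = np.asarray(portfolio_returns, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(["alpha", "RMRF", "SMB", "HML", "WML"], coef))

# Synthetic example: 120 months of made-up factor returns and a portfolio
# built from known loadings, so the fit should recover them.
rng = np.random.default_rng(0)
factors = rng.normal(0.0, 0.04, size=(120, 4))
true_betas = np.array([0.7, -0.3, -0.4, 0.0])
returns = 0.01 + factors @ true_betas
fit = carhart_regression(returns, *factors.T)
```

With real data the residual term is non-zero, and a standard error for α is needed before calling it significant.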

46

Optimal Network Portfolios
• This analysis confirms that ANNs can be used for portfolio selection to achieve positive alpha
HOWEVER
• These results are based upon the most accurate network specification, which is only known post hoc
• The problem is whether ANNs can achieve positive alpha without this post hoc knowledge

47

Median Network Portfolios
• For each walk-forward testing window for each stock, 600 different network specifications were trialled
– 6 input options
– 5 lookback windows
– 5 hidden layer sizes
– 4 training period lengths
– 600 network options per stock per testing period
• We have confirmed that the most accurate network achieves positive alpha. What about the median network?

48

Median Network Portfolios

Regression Results

          Long Portfolio    Long-Short Portfolio
α         0.012**           -0.001
RMRF      0.137             0.003
SMB       -0.050            -0.011
HML       -0.310            -0.064
WML       -0.326            -0.115
R²        0.084*            0.012

Only the Long portfolio achieved a significant alpha
Only the Long portfolio model is significant and the Carhart style-factors explain a lower proportion of variation in data

49

Distribution of Alphas
• Network parameter combinations were applied uniformly to all stocks and the long portfolio based on expected returns was formed.
• Mimics a real world trading situation
• This process resulted in 600 measurements of alpha.

50

Distribution of Alphas
[Figure: Histogram for Alpha. y-axis: frequency (0 to 120).]
Median = 0.0061
Only 12% of the networks tested have negative alpha
127 of 529 positive alphas (27%) were statistically significant (p < 0.05)

51

Conclusions
• Testing was undertaken on several network parameters:
– input type
– hidden layer size
– lookback window
– training period length
• No useful guidance on heuristics for the input type, lookback window, or training period length.
• Hidden layer size showed some promise. Almost half of the most accurate network models utilised the smallest hidden layer of 30 nodes.

52

Conclusions
• Price prediction performance was mixed.
• Most networks performed poorly at the beginning of the global financial crisis (beginning of 2008) and at the end (end of 2009). The networks fail to predict major reversals.
• The networks did not predict the portfolio price returns well. The value weighted portfolio error ranged from 9% (best year) to 32% (worst year) and only achieved an RMSE below 10% in 1 of the 10 testing years.

53

Conclusions
• Stock directional movement prediction performance was assessed.
• The ANN models predicted directional movement with better than 50% accuracy for all stocks and portfolios except for BHP.
• Significant results were achieved for 13 of the 22 stocks and portfolios tested.

54

Conclusions
• Price predictions were used for long and long-short portfolio selection
• Both portfolios achieved significantly positive alpha.
• The optimal network specification was selected by comparing the simulated prices to the target prices, thus requiring post hoc knowledge.

55

Conclusions
• A distribution of alpha for all 600 different network specifications was produced.
• Could not reject the hypothesis that the distribution was normal at the 5% significance level.
• 88% of the networks produced positive alpha (529 out of 600), 127 of which were significant (27%).
• 12% of networks produced negative alpha (71 out of 600), none of which were significant.

56

Limitations & Future Research

• Research tested several network parameters.

• Hidden layer size provided promising results.

• Further research is required to develop heuristics.

• Future research should look at individual stocks and industry sectors.

57

Limitations & Future Research

• The analysis has been undertaken on a price return basis, rather than an overall returns basis.

• Study utilised 20 widely held stocks. Needs testing over wider investment universe. Survivorship bias.

• Impact of trading costs not included in analysis

58

Questions and Discussion
