MA and ARCH Time Series model inference using Minimum Message Length

By: Mony Sak - 13080512

Supervisors: Assoc. Prof. David Dowe, Dr Sid Ray

Contents

1. “The Problem”
2. Time Series Concepts
3. Minimum Message Length (MML)
4. MML applied to Time Series
5. My Project
6. Results
7. Conclusion & Future Work

1. “The Problem”

Which model fits the data best?

2. Time Series Concepts

What is a Time Series (TS)?

Observations over time

[Figure: observation value plotted against time]

What is a Time Series (TS)? Some examples:

Light Curve of Beta Persei, also known as Algol, or “demon star” [1]

Closing stock price of Apple Computer Inc. (AAPL) (1984-2005) [2]

Global temperature difference vs. Years [3]

Average monthly bus ridership (weekdays) in Iowa City (1971-1982) [4]


Why study Time Series?

Explanation: A good model = good understanding of the underlying process generating that data

Prediction: Predict future observation values

Control: If we can predict future values, we are able to ‘control’ the time series to our benefit

Description: The best method of conveying information


Some TS models (1 of 3)

Autoregressive, order p = AR(p)

Current observation value is a sum of weighted past observation values + random error [5]
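The AR(p) description above can be turned into a short simulation. This is only a sketch under stated assumptions: the errors are taken to be zero-mean Gaussian, the process has zero mean, and the name `simulate_ar`, the weights `phi`, and the `burn_in` length are illustrative choices, not taken from the slides.

```python
import random


def simulate_ar(phi, n, sigma=1.0, seed=0, burn_in=100):
    """Simulate n observations from a zero-mean AR(p) process.

    phi   : list of p weights; phi[0] multiplies the most recent observation
    sigma : standard deviation of the Gaussian random error (an assumption)
    """
    rng = random.Random(seed)
    p = len(phi)
    y = [0.0] * p  # start-up values, discarded along with the burn-in
    for _ in range(n + burn_in):
        # weighted sum of the p most recent observations
        past = sum(phi[j] * y[-1 - j] for j in range(p))
        y.append(past + rng.gauss(0.0, sigma))
    return y[-n:]


series = simulate_ar([0.5], 50)
```

The burn-in is there so that the arbitrary zero start-up values have little influence on the returned series.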


Some TS models (2 of 3)

Moving Average, order q = MA(q)

Current observation value is a sum of weighted past error values + random error [5]
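A minimal sketch of the MA(q) description above, again assuming zero-mean Gaussian errors. The function name `simulate_ma` and the weight list `theta` are hypothetical names chosen for illustration.

```python
import random


def simulate_ma(theta, n, sigma=1.0, seed=0):
    """Simulate n observations from a zero-mean MA(q) process.

    theta : list of q weights; theta[0] multiplies the most recent past error
    sigma : standard deviation of the Gaussian random error (an assumption)
    """
    rng = random.Random(seed)
    q = len(theta)
    # q extra errors so the first observation already has q past errors
    eps = [rng.gauss(0.0, sigma) for _ in range(n + q)]
    y = []
    for t in range(q, n + q):
        past = sum(theta[j] * eps[t - 1 - j] for j in range(q))
        y.append(eps[t] + past)
    return y


ma_series = simulate_ma([0.6, -0.3], 100)
```

With an empty `theta` the function returns plain white noise, which is the MA(0) case.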


Some TS models (3 of 3)

Autoregressive Conditional Heteroskedastic, order q = ARCH(q)

Current variance value is a sum of weighted past squared error values [5]
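The ARCH(q) description above can be sketched in the same way. The usual formulation adds a positive constant to the weighted sum of past squared errors; that constant (`alpha0` here), the Gaussian shocks, and the function name `simulate_arch` are assumptions made for this illustration.

```python
import math
import random


def simulate_arch(alpha0, alpha, n, seed=0, burn_in=100):
    """Simulate n errors from an ARCH(q) process and their conditional variances.

    alpha0 : positive constant term of the variance equation
    alpha  : list of q non-negative weights on past squared errors
    """
    rng = random.Random(seed)
    q = len(alpha)
    eps = [0.0] * q  # start-up errors, discarded along with the burn-in
    variances = []
    for _ in range(n + burn_in):
        # current variance: constant plus weighted past squared errors
        var_t = alpha0 + sum(alpha[j] * eps[-1 - j] ** 2 for j in range(q))
        variances.append(var_t)
        eps.append(math.sqrt(var_t) * rng.gauss(0.0, 1.0))
    return eps[-n:], variances[-n:]


errors, variances = simulate_arch(0.2, [0.5], 200)
```

Unlike AR and MA, the past values here shape the variance of the next error rather than its mean, which is why large errors tend to cluster together.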


1 set of data… and many, many models


Partial solution to “The Problem”

The Model Selection Criterion (MSC)

Objective scoring of different models

An equation, based on parsimony

[Figure: a criterion assigns each candidate model a score, e.g. 101.21 and 99.90]


Some popular Model Selection Criteria:

Akaike’s Information Criterion (AIC) [6]

Bayesian Information Criterion (BIC) [7]

…many more incl. HQ [8], RCL [9], MML [10]
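To make "an equation, based on parsimony" concrete, here is a small sketch of AIC and BIC in their standard textbook forms, written in terms of the negative log likelihood. The function names and the `select` helper are illustrative only.

```python
import math


def aic(neg_log_lik, k):
    """AIC = 2k - 2 ln L, with k the number of free parameters."""
    return 2.0 * neg_log_lik + 2.0 * k


def bic(neg_log_lik, k, n):
    """BIC = k ln(n) - 2 ln L, with n the number of observations."""
    return 2.0 * neg_log_lik + k * math.log(n)


def select(scores):
    """Return the model name with the smallest criterion score."""
    return min(scores, key=scores.get)
```

Both criteria reward a good fit (small negative log likelihood) and penalise extra parameters; BIC's penalty grows with the sample size, so it tends to prefer smaller models than AIC on long series.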

3. Minimum Message Length

What it is & History

Information-theoretic criterion for model selection and point estimation

Developed here at Monash University by Wallace & Boulton in 1968 [11]

Has been applied to mixture modelling (“snob”), decision tree/graph induction, generalized Bayesian networks, and more…



Theory

A “message” can be encoded in 2 parts:

Part 1: Model

Part 2: Data (given the Model in Part 1)

Combined Message Length = Part 1 + Part 2

We choose the model that yields the smallest Combined Message Length
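The two-part rule can be sketched in a few lines. The part lengths below are made-up numbers chosen purely to illustrate the trade-off; they are not results from the project, and the names `combined_length` and `best_model` are hypothetical.

```python
def combined_length(part1, part2):
    """Combined Message Length = Part 1 (model) + Part 2 (data given model)."""
    return part1 + part2


def best_model(candidates):
    """candidates maps a model name to its (part1, part2) lengths."""
    return min(candidates, key=lambda name: combined_length(*candidates[name]))


# Hypothetical lengths: a more complex model costs more to state (Part 1)
# but usually encodes the data more cheaply (Part 2).
example = {
    "model 1": (5.0, 120.0),
    "model 2": (12.0, 95.0),
    "model 3": (30.0, 90.0),
}
chosen = best_model(example)
```

Here "model 2" wins: "model 3" fits the data slightly better, but the saving in Part 2 does not pay for its longer Part 1.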



Theory (example)

[Figure: four candidate two-part messages, each a model part (model 1 to model 4) followed by its data part (data|model 1 to data|model 4)]



MML87 Approximation:

Developed by Wallace & Freeman in 1987 [13]

Part 1 (model):

Part 2 (data|model):
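The equations for the two parts did not survive in this copy of the slides. For orientation, the Wallace & Freeman (1987) approximation is usually written as below; this is the standard published form rather than a transcription of the slide. The symbols are assumptions of this note: θ is the parameter vector of dimension d, h(θ) the prior density, f(x|θ) the likelihood, F(θ) the expected Fisher Information matrix, and κ_d a lattice quantisation constant.

```latex
\begin{align}
\text{Part 1 (model):} \quad
  & -\log h(\theta) + \tfrac{1}{2}\log\lvert F(\theta)\rvert + \tfrac{d}{2}\log\kappa_d \\
\text{Part 2 (data}\mid\text{model):} \quad
  & -\log f(x \mid \theta) + \tfrac{d}{2} \\
\text{Combined Message Length:} \quad
  & -\log h(\theta) + \tfrac{1}{2}\log\lvert F(\theta)\rvert
    - \log f(x \mid \theta) + \tfrac{d}{2}\left(1 + \log\kappa_d\right)
\end{align}
```

The Fisher Information term is what makes Part 1 depend on how precisely the parameters need to be stated, which is where the derivation for each model class becomes difficult.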

4. MML87-based MSC

Past Research

MML87-based MSC for:

AR model inference [10], Stock market simulation of AR traders [14]

ARMAX models [15]

…Results: MML does very well when compared to the other Model Selection Criteria

4. MML-based MSC

Motivation for my project: How well do MML-based MSCs perform with other models?

Results from Fitzgibbon, Dowe, Vahid (2004) [10]

5. My Project

How well does an MML-based MSC perform with:

Moving Average (MA) models?

Autoregressive Conditional Heteroskedastic (ARCH) models?

We need to derive 2 MSCs, 1 for each model

The mathematics of the Fisher Information matrix is complex, so we resort to approximations

MA is a conditional mean model, whereas ARCH is a conditional variance model - quite different

5. My Project

MML87 equation we will be using

6. Results

Results (simulations)

Moving Average (MA) models

(Results from Sak, Dowe, Ray (2005). Accepted for inclusion in proceedings of Advanced Computing in Financial Markets ‘05. Istanbul, Turkey. Dec 15-17, 2005.) [16]

6. Results (MA simulations)

6. Results (ARCH simulations)

7. Conclusion & Future Work

Future Work

Try other MML approximations such as MMLD [17]

Other Time Series models: Generalized ARCH (GARCH) [18], Generalized/Indexed AR (GAR) [19]

Other parameter estimation methods: Maximum Likelihood Estimation (MLE) is very slow!

Conclusion

MML-based MSC for MA models performs very well

MML-based MSC for ARCH models…

Thanks!

References (1 of 2)

1. J. Stebbins. The measurement of the light of stars with a selenium photometer, with an application to variations of Algol. The Astrophysical Journal, 32(3):185-214, 1910.

2. Data obtained from http://finance.yahoo.com/q?s=aapl

3. Data obtained from http://www.elmhurst.edu/~chm/vchembook/globalwarmA.html

4. Hyndman, R.J. (n.d.) Time Series Data Library, http://www-personal.buseco.monash.edu.au/~hyndman/TSDL/. Accessed on 24 Oct., 2005.

5. J. D. Hamilton. Time Series Analysis. Princeton University Press, 1994.

6. H. Akaike. Information theory and an extension of the Maximum Likelihood principle. In Second International Symposium on Information Theory, pages 267-281, 1973. Petrov, B.N. and Csaki, F. (editors). Akademiai Kiado, Budapest.

7. G. Schwarz. Estimating the dimension of a model. The Annals of Statistics, 6(2):461-464, 1978.

8. E.J. Hannan and B.G. Quinn. The determination of the order of an autoregression. Journal of the Royal Statistical Society, Series B (Methodological), 41(2):190-195, 1979.

9. H. Mitchell and D.M McKenzie. GARCH model selection criteria. Quantitative Finance, 3:262-284, 2003.

10. L.J. Fitzgibbon, D.L. Dowe, and F. Vahid. Minimum Message Length Autoregressive Model Order Selection. In M. Palanaswami, C. Chandra Sekhar, G. Kumar Venayagamoorthy, S. Mohan and M. K. Ghantasala (eds.), International Conference on Intelligent Sensing and Information Processing (ICISIP), pages 439-444, 2004. Chennai, India, 4-7 January 2004, (ISBN: 0-7803-8243-9, IEEE Catalogue Number: 04EX783), www.csse.monash.edu.au/~dld/Publications/2004/Fitzgibbon+Dowe+Vahid2004.ref.

Want a copy of these slides? Send requests to [email protected]

References (2 of 2)

11. C.S. Wallace and D.M. Boulton. An information measure for classification. Computer Journal, 11(2):185-194, 1968.

12. L.J. Fitzgibbon. Message from Monte Carlo: A Framework for Minimum Message Length Inference using Markov Chain Monte Carlo Methods. PhD thesis, Monash University, Clayton Campus. Wellington Rd, Clayton. Victoria 3800, Australia, 2004.

13. C.S. Wallace and P.R. Freeman. Estimation and inference by compact encoding. Journal of the Royal Statistical Society. Series B (Methodological), 49(3):240-265, 1987.

14. M. J. Collie, D. L. Dowe, and L. J. Fitzgibbon. Stock market simulation and inference technique, 2005. Accepted for inclusion in proceedings of the 5th international conference on Hybrid Intelligent Systems (HIS’05), Rio de Janeiro, Brazil, November 6-9, 2005.

15. [ Schmidt ]

16. M. Sak, D.L. Dowe, and S. Ray. Minimum Message Length Moving Average Time Series Data Mining. In Computational Intelligence: Methods and Applications. First International ICSC Symposium on Advanced Computing in Financial Markets (ACFM2005), 2005. Accepted for inclusion in proceedings of Advanced Computing in Financial Markets (ACFM2005), Istanbul, Turkey. Dec. 15-17, 2005.

17. E. Lam. Improved Approximations in MML. Honours Thesis, Monash University, School of Computer Science and Software Engineering (CSSE), Monash University, Clayton 3168, Australia, 2000.

18. T. Bollerslev. Generalized Autoregressive Conditional Heteroskedasticity. Journal of Econometrics, 31:307-27, 1986.

19. M.S. Peiris. Improving the Quality of Forecasting using Generalized AR Models: An Application to Statistical Quality Control. Statistical Methods, 5(2):156-171, 2003.

Negative Log Likelihood

Takes into account the estimated variance
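One common way a negative log likelihood "takes into account the estimated variance" is sketched below. It assumes zero-mean Gaussian errors and plugs in the maximum likelihood variance estimate computed from the residuals; whether this matches the exact expression used in the project cannot be confirmed from the slides. The name `gaussian_nll` is illustrative.

```python
import math


def gaussian_nll(residuals):
    """Negative log likelihood of zero-mean Gaussian residuals,
    evaluated at the maximum likelihood variance estimate."""
    n = len(residuals)
    var_hat = sum(r * r for r in residuals) / n  # estimated variance
    # n/2 * log(2*pi*var_hat) + RSS / (2*var_hat), where RSS / var_hat = n
    return 0.5 * n * (math.log(2.0 * math.pi * var_hat) + 1.0)
```

Because the variance is estimated from the same residuals, a model with smaller residuals gets a smaller negative log likelihood, measured here in nits (natural-log units).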

6. Results

Empirical Comparison

1. Simulate data sets for 200 models for each model order (i.e. MA(1) - MA(8)) for a total of 1,600 MA data sets

2. Estimate model parameters using Maximum Likelihood Estimation (MLE)

3. Pass to each Model Selection Criterion (MSC) the same 1,600 data sets and parameter estimates (for each data set), and let each criterion choose the model it thinks best represents the data

4. Assessment is on correct model order selection accuracy and negative log likelihood

5. Repeat experiment for ARCH models (again 1,600 data sets)
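The first assessment measure, correct model order selection accuracy, can be sketched as a simple proportion. The function name and the example orders below are hypothetical and are not results from the experiment.

```python
def selection_accuracy(true_orders, chosen_orders):
    """Fraction of data sets for which a criterion chose the true model order."""
    if len(true_orders) != len(chosen_orders):
        raise ValueError("need one chosen order per data set")
    hits = sum(1 for t, c in zip(true_orders, chosen_orders) if t == c)
    return hits / len(true_orders)


# Hypothetical example: four data sets, one wrong choice.
accuracy = selection_accuracy([1, 2, 3, 4], [1, 2, 2, 4])
```

Running the same function on the choices of each criterion over the same data sets gives directly comparable accuracy figures.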
