Time Series Prediction: Forecasting the Future and Understanding the Past
Santa Fe Institute Proceedings on the Studies in the Sciences of Complexity
Edited by Andreas Weigend and Neil Gershenfeld

NIST Complex System Program
Perspectives on Standard Benchmark Data in Quantifying Complex Systems
Vincent Stanford
Complex Systems Test Bed project
August 31, 2007

Page 2: Chaos in Nature, Theory, and Technology

[Figures: Rings of Saturn; Lorenz attractor; aircraft dynamics at high angles of attack]

Page 3: Time Series Prediction
A Santa Fe Institute competition using standard data sets

Santa Fe Institute (SFI) founded in 1984 to “… focus the tools of traditional scientific disciplines and emerging computer resources on … the multidisciplinary study of complex systems…”
“This book is the result of an unsuccessful joke. … Out of frustration with the fragmented and anecdotal literature, we made what we thought was a humorous suggestion: run a competition. …no one laughed.”
Time series from physics, biology, economics, …, beg the same questions:
    What happens next?
    What kind of system produced this time series?
    How much can we learn about the producing system?
Quantitative answers can permit direct comparisons
Make some standard data sets in consultation with subject matter experts in a variety of areas.
Very NISTY; but we are in a much better position to do this in the age of Google and the Internet.

Page 4: Selecting benchmark data sets
For inclusion in the book

Subject matter expert advisor group:
    Biology
    Economics
    Astrophysics
    Numerical Analysis
    Statistics
    Dynamical Systems
    Experimental Physics

Page 5: The Data Sets

A. Far-infrared laser excitation
B. Sleep apnea
C. Currency exchange rates
D. Particle driven in nonlinear multiple well potentials
E. Variable star data
F. J. S. Bach fugue notes

Page 6: J. S. Bach benchmark

Dynamic, yes. But is it an iterative map? Is it amenable to time delay embedding?

Page 7: Competition Tasks

Predict the withheld continuations of the data sets provided for training and measure errors
Characterize the systems as to:
    Degrees of freedom
    Predictability
    Noise characteristics
    Nonlinearity of the system
Infer a model for the governing equations
Describe the algorithms employed
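
The first task, measuring errors on a withheld continuation, can be sketched as follows. The slide does not name an error measure, so the normalized mean squared error used here is an assumption, and the function name `nmse` and the sample values are hypothetical. A score of 0 means a perfect forecast; a score of 1 is what you get by always predicting the mean of the withheld data.

```python
def nmse(observed, predicted):
    """Normalized mean squared error: MSE divided by the variance of the observed data.

    Assumed metric for illustration; the slide only says "measure errors".
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and of equal length")
    n = len(observed)
    mean_obs = sum(observed) / n
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    variance = sum((o - mean_obs) ** 2 for o in observed) / n
    return mse / variance

# Hypothetical withheld continuation and two candidate forecasts.
withheld = [1.0, 2.0, 3.0, 4.0]
perfect_forecast = list(withheld)
mean_forecast = [sum(withheld) / len(withheld)] * len(withheld)

score_perfect = nmse(withheld, perfect_forecast)
score_mean = nmse(withheld, mean_forecast)
print(score_perfect, score_mean)
```

Normalizing by the variance makes scores comparable across data sets with different scales, which is the point of a shared benchmark.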

Page 8: Complex Time Series Benchmark Taxonomy

Natural            Synthetic
Stationary         Nonstationary
Low dimensional    Stochastic
Clean              Noisy
Short              Long
Documented         Blind
Linear             Nonlinear
Scalar             Vector
One trial          Many trials
Continuous         Discontinuous (switching, catastrophes, episodes)

Page 9: Time honored linear models

Auto Regressive Moving Average (ARMA)
Many linear estimation techniques based on Least Squares, or Least Mean Squares
Power spectra and autocorrelation characterize such linear systems
Randomness comes only from the forcing function x[t]

$$y[t+1] = \sum_{i=0}^{N_{AR}} a_i \, y[t-i] + \sum_{j=0}^{N_{MA}} b_j \, x[t-j]$$
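
As a rough illustration of the ARMA recursion and of least-squares estimation, the sketch below simulates the model with Gaussian white noise as the forcing function and then recovers a single autoregressive coefficient. The function names, the coefficient 0.8, the series length, and the seed are all illustrative assumptions, not values from the talk.

```python
import random

def simulate_arma(a, b, n, seed=0):
    """Simulate y[t+1] = sum_i a[i]*y[t-i] + sum_j b[j]*x[t-j].

    x is Gaussian white noise (an assumed choice of forcing function).
    Terms that would reach before the start of the series are dropped.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [0.0] * (n + 1)
    for t in range(n):
        ar = sum(a[i] * y[t - i] for i in range(len(a)) if t - i >= 0)
        ma = sum(b[j] * x[t - j] for j in range(len(b)) if t - j >= 0)
        y[t + 1] = ar + ma
    return y

def fit_ar1_least_squares(y):
    """Least-squares estimate of a0 in the one-term model y[t+1] ~ a0 * y[t]."""
    num = sum(y[t] * y[t + 1] for t in range(len(y) - 1))
    den = sum(y[t] ** 2 for t in range(len(y) - 1))
    return num / den

# Pure AR(1) case: a = [0.8], b = [1.0].
series = simulate_arma(a=[0.8], b=[1.0], n=5000, seed=1)
a0_hat = fit_ar1_least_squares(series)
print(f"estimated a0 = {a0_hat:.3f} (true value 0.8)")
```

The closed-form ratio in `fit_ar1_least_squares` is the one-parameter case of the normal equations; with more AR terms the same idea becomes a small linear system.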

Page 10: Simple nonlinear systems can exhibit chaotic behavior

Spectrum, autocorrelation, characterize linear systems, not these
Deterministic chaos looks random to linear analysis methods
Logistic map is an early example (Ulam 1957).

$$x[t+1] = r \, x[t] \, (1 - x[t])$$

[Figure: Logistic map, 2.9 < r < 3.99]
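
A minimal sketch of the logistic map at the two ends of the range shown on the slide. At r = 2.9 the orbit settles onto the fixed point 1 - 1/r; at r = 3.99 two orbits that start 1e-10 apart separate to a macroscopic distance. The starting value 0.2 and the iteration counts are arbitrary choices made for illustration.

```python
def logistic_orbit(r, x0, n):
    """Iterate x[t+1] = r * x[t] * (1 - x[t]) and return [x0, x1, ..., xn]."""
    orbit = [x0]
    for _ in range(n):
        orbit.append(r * orbit[-1] * (1.0 - orbit[-1]))
    return orbit

# r = 2.9: the orbit converges to the stable fixed point 1 - 1/r.
settled = logistic_orbit(2.9, 0.2, 2000)[-1]
fixed_point = 1.0 - 1.0 / 2.9

# r = 3.99: sensitive dependence on initial conditions.
orbit_a = logistic_orbit(3.99, 0.2, 100)
orbit_b = logistic_orbit(3.99, 0.2 + 1e-10, 100)
max_gap = max(abs(u - v) for u, v in zip(orbit_a, orbit_b))

print(f"r=2.9 settles at {settled:.6f}, fixed point {fixed_point:.6f}")
print(f"r=3.99 largest separation over 100 steps: {max_gap:.3f}")
```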

Page 11: Understanding and learning
Comments from SFI

Weak to strong models - many parameters to few
Data poor to data rich
Theory poor to theory rich
Weak models progress to strong, e.g. planetary motion:
    Tycho Brahe: observes and records raw data
    Kepler: equal areas swept in equal time
    Newton: universal gravitation, mechanics, and calculus
    Poincaré: fails to solve three body problem
    Sussman and Wisdom: Chaos ensues with computational solution!
    Is that a simplification?

Page 12: Discovering properties of data and inferring (complex) models

For nonlinear systems, the output can’t be decomposed into the product of input and transfer function, Y(z) = H(z)X(z), by taking a Z, Laplace, or Fourier transform.
Linear perceptrons were shown to have severe limitations by Minsky and Papert
Perceptrons with non-linear threshold logic can solve XOR and many classifications not available with the linear version
But according to SFI: “Learning XOR is as interesting as memorizing the phone book. More interesting - and more realistic - are real-world problems, such as prediction of financial data.”
Many approaches are investigated
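
The XOR remark can be made concrete with a tiny network of threshold units. This is a sketch with hand-picked weights, chosen only for illustration (nothing is learned here): two hidden threshold units compute OR and AND, and the output unit fires for "OR but not AND".

```python
def step(z):
    """Threshold (Heaviside) activation: 1 if z > 0, else 0."""
    return 1 if z > 0 else 0

def xor_network(x1, x2):
    """Two-layer threshold network with hand-chosen, illustrative weights."""
    h_or = step(x1 + x2 - 0.5)        # fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)       # fires only if both inputs are 1
    return step(h_or - h_and - 0.5)   # fires for OR but not AND

table = {(x1, x2): xor_network(x1, x2) for x1 in (0, 1) for x2 in (0, 1)}
print(table)  # {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
```

No single threshold unit acting directly on (x1, x2) can produce this table, because the two classes are not linearly separable; the hidden layer is what makes it possible.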

Page 13: Time delay embedding
Differs from traditional experimental measurements

Provides detailed information about degrees of freedom beyond the scalar measured
Rests on probabilistic assumptions - though not guaranteed to be valid for any particular system
Reconstructed dynamics are seen through an unknown “smooth transformation”
Therefore allows precise questions only about invariants under “smooth transformations”
It can still be used for forecasting a time series and “characterizing essential features of the dynamics that produced it”

Page 14: Time delay embedding theorems
“The most important Phase Space Reconstruction technique is the method of delays”

Assuming the dynamics $\vec{f}(\vec{x})$ on a V dimensional manifold has a strange attractor A with box counting dimension $d_A$
$s(\vec{x})$ is a twice differentiable scalar measurement giving $\{s_n\} = \{s(\vec{x}_n)\}$
M is called the embedding dimension
$\tau$ is generally referred to as the delay, or lag
Embedding theorems: if $\{s_n\}$ consists of scalar measurements of the state of a dynamical system then, under suitable hypotheses, the time delay embedding $\{\vec{S}_n\}$ is a one-to-one transformed image of the $\{\vec{x}_n\}$, provided $M > 2 d_A$. (e.g. Takens 1981, Lecture Notes in Mathematics, Springer-Verlag; or Sauer and Yorke, J. of Statistical Physics, 1991)

Vector sequence:
$$\vec{x}_{n+1} = \vec{f}(\vec{x}_n)$$
Scalar measurement:
$$\{s_n\} = \{s(\vec{x}_n)\}$$
Time delay vectors:
$$\vec{S}_n = (s_{n-(M-1)\tau}, s_{n-(M-2)\tau}, \ldots, s_n)$$
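
The time delay vectors defined above can be built from a scalar series in a few lines. This is a minimal sketch; the function name `delay_embed` and the toy series are hypothetical, with M the embedding dimension and tau the delay measured in samples.

```python
def delay_embed(s, M, tau):
    """Return the delay vectors S_n = (s[n-(M-1)*tau], ..., s[n-tau], s[n]).

    One vector is produced for every index n at which all M entries exist.
    """
    if M < 1 or tau < 1:
        raise ValueError("M and tau must be positive integers")
    first = (M - 1) * tau  # earliest n with a complete delay vector
    return [
        tuple(s[n - (M - 1 - k) * tau] for k in range(M))
        for n in range(first, len(s))
    ]

# Toy scalar series 0, 1, ..., 9 so the indexing is easy to read off.
s = list(range(10))
vectors = delay_embed(s, M=3, tau=2)
print(vectors[0])    # (0, 2, 4)
print(vectors[-1])   # (5, 7, 9)
print(len(vectors))  # 6
```

A series of length N yields N - (M-1)*tau vectors, so larger M or tau costs data at the start of the record.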

Page 15: Time series prediction
Many different techniques thrown at the data to “see if anything sticks”

Examples:
    Delay coordinate embedding - Short term prediction by filtered delay coordinates and reconstruction with local linear models of the attractor (T. Sauer).
    Neural networks with internal delay lines - Performed well on data set A (E. Wan), (M. Mozer)
    Simple architectures for fast machines - “Know the data and your modeling technique” (X. Zhang and J. Hutchinson)
    Forecasting pdf’s using HMMs with mixed states - Capturing “Embedology” (A. Fraser and A. Dimitriadis)
    More…
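
To give a flavor of prediction in delay coordinates, here is a deliberately simplified sketch. It is not Sauer's method: instead of filtered delay coordinates and local linear fits, it averages the successors of the k nearest past delay vectors (a local constant model). The test series is a logistic map orbit, and M, tau, k, r, and the series length are all assumed values chosen for illustration.

```python
import math

def logistic_series(r, x0, n):
    """Return n values of the logistic map x[t+1] = r * x[t] * (1 - x[t])."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

def delay_vector(s, n, M, tau):
    """Delay vector (s[n-(M-1)*tau], ..., s[n-tau], s[n])."""
    return [s[n - (M - 1 - k) * tau] for k in range(M)]

def predict_next(history, M=2, tau=1, k=3):
    """Forecast the value after history[-1] by averaging the successors
    of the k past delay vectors closest to the most recent one."""
    last = len(history) - 1
    query = delay_vector(history, last, M, tau)
    scored = []
    for n in range((M - 1) * tau, last):  # successor history[n + 1] is known
        dist = math.dist(query, delay_vector(history, n, M, tau))
        scored.append((dist, history[n + 1]))
    scored.sort(key=lambda pair: pair[0])
    nearest = scored[:k]
    return sum(successor for _, successor in nearest) / len(nearest)

# One-step forecasts for 50 withheld points, each using only earlier data.
series = logistic_series(r=3.9, x0=0.3, n=2050)
errors = []
for i in range(2000, 2050):
    forecast = predict_next(series[:i])
    errors.append(abs(forecast - series[i]))
mean_abs_error = sum(errors) / len(errors)
print(f"mean absolute one-step error: {mean_abs_error:.4f}")
```

Replacing the average with a least-squares linear fit over the neighbors would move this toward the local linear models the slide mentions.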

Page 16: Time series characterization
Many different techniques thrown at the data to “see if anything sticks”

Examples:
    Stochastic and deterministic modeling - Local linear approximation to attractors (M. Casdagli and A. Weigend)
    Estimating dimension and choosing time delays - Box counting (F. Pineda and J. Sommerer)
    Quantifying chaos using information-theoretic functionals - Mutual information and nonlinearity testing (M. Palus)
    Statistics for detecting deterministic dynamics - Coarse-grained flow averages (D. Kaplan)
    More…
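
One of the characterization tools named above, mutual information between a series and its delayed copy, can be sketched with a plain histogram estimator. This is an assumption-laden illustration, not the estimator used in any of the cited chapters: the bin count, series lengths, and seed are arbitrary. A deterministic chaotic series should score well above uniform noise at lag 1, even though both look random to linear methods.

```python
import math
import random
from collections import Counter

def mutual_information(s, tau, bins=16):
    """Histogram estimate, in bits, of the mutual information between s[n] and s[n+tau]."""
    lo, hi = min(s), max(s)
    width = (hi - lo) / bins
    if width == 0:
        return 0.0  # a constant series carries no information

    def bin_of(v):
        return min(int((v - lo) / width), bins - 1)

    pairs = [(bin_of(s[n]), bin_of(s[n + tau])) for n in range(len(s) - tau)]
    total = len(pairs)
    joint = Counter(pairs)
    px = Counter(i for i, _ in pairs)
    py = Counter(j for _, j in pairs)
    mi = 0.0
    for (i, j), count in joint.items():
        # p_ij * log2(p_ij / (p_i * p_j)), written with raw counts
        mi += (count / total) * math.log2(count * total / (px[i] * py[j]))
    return mi

def logistic_series(r, x0, n):
    """Return n values of the logistic map x[t+1] = r * x[t] * (1 - x[t])."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

chaotic = logistic_series(3.99, 0.2, 5000)
rng = random.Random(0)
noise = [rng.random() for _ in range(5000)]

mi_chaotic = mutual_information(chaotic, tau=1)
mi_noise = mutual_information(noise, tau=1)
print(f"lag-1 mutual information: chaotic {mi_chaotic:.2f} bits, noise {mi_noise:.2f} bits")
```

Scanning `tau` and looking for where the mutual information first dips is one common heuristic for choosing an embedding delay.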

Page 17: What to make of this?
Handbook for the corpus driven study of nonlinear dynamics

Very NISTY:
    Convene a panel of leading researchers
    Identify areas of interest where improved characterization and predictive measurements can be of assistance to the community
    Identify standard reference data sets:
        Development corpora
        Test sets
    Develop metrics for prediction and characterization
    Evaluate participants
Is there a sponsor?
Are there areas of special importance to communities we know? For example: predicting catastrophic failures of machines from sensors.

Page 18: Ideas?
