genetic programming with syntactic restrictions applied … · genetic programming with syntactic...

23
Genetic Programming with Syntactic Restrictions applied to Financial Volatility Forecasting Gilles Zumbach 1 , Olivier V. Pictet 2 , and Oliver Masutti 1 Abstract This study uses Genetic Programming (GP) to discover new types of volatility forecasting mod- els for financial time series. GP is a convenient tool to explore the space of potential forecasting models and to select the more robust solutions. The application to foreign exchange financial prob- lems requires an exact symmetry induced by the interchange of currencies. In GP, this symmetry is enforced by using a strongly typed GP approach and syntactic restrictions on the node set. GP convergence is increased by a few orders of magnitude by optimizing the constants in the GP trees with a local optimization algorithm. The various algorithms are compared on the discovery of the symmetric transcendental function cosine. For the volatility forecast, the optimization is per- formed using return time series sampled hourly, possibly including aggregated returns at longer time horizons. The in-sample optimization and out-of-sample tests are performed on 13 years of high frequency data for two foreign exchange time series. The out-of-sample forecasting perfor- mance of these new models are compared with the corresponding performance of some popular ARCH-types models, and GP consistently outperform the benchmarks. In particular, GP discov- ered that cross products of returns at different time horizons improve substantially the forecasting performance. Key words: Volatility forecasting, Genetic Programming, Syntactic restrictions 1 Olsen & Associates Research Institute for Applied Economics Seefeldstrasse 233, 8008 Z ¨ urich, Switzerland. e-mail: [email protected] phone: +41-1/386 48 48 Fax: +41-1/422 22 82 2 Dynamic Asset Management Chemin des Tulipiers 9, 1208 Gen` eve, Switzerland. e-mail: [email protected]

Upload: vankiet

Post on 07-Jul-2018

223 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Genetic Programming with Syntactic Restrictions applied … · Genetic Programming with Syntactic Restrictions applied to Financial Volatility Forecasting Gilles Zumbach1, Olivier

GeneticProgramming with SyntacticRestrictionsapplied to Financial Volatility Forecasting

Gilles Zumbach1, Olivier V. Pictet2, andOliver Masutti1

Abstract

This studyusesGeneticProgramming(GP) to discover new typesof volatility forecastingmod-els for financialtime series.GPis a convenienttool to explore thespaceof potentialforecastingmodelsandto selectthemorerobustsolutions.Theapplicationto foreignexchangefinancialprob-lemsrequiresanexactsymmetryinducedby theinterchangeof currencies.In GP, this symmetryis enforcedby usinga stronglytypedGPapproachandsyntacticrestrictionson thenodeset.GPconvergenceis increasedby afew ordersof magnitudeby optimizingtheconstantsin theGPtreeswith a local optimizationalgorithm. The variousalgorithmsarecomparedon the discovery ofthesymmetrictranscendentalfunctioncosine.For thevolatility forecast,theoptimizationis per-formedusingreturntime seriessampledhourly, possiblyincluding aggregatedreturnsat longertime horizons.Thein-sampleoptimizationandout-of-sampletestsareperformedon 13 yearsofhigh frequency datafor two foreignexchangetime series.Theout-of-sampleforecastingperfor-manceof thesenew modelsarecomparedwith thecorrespondingperformanceof somepopularARCH-typesmodels,andGPconsistentlyoutperformthebenchmarks.In particular, GPdiscov-eredthatcrossproductsof returnsatdifferenttimehorizonsimprove substantiallytheforecastingperformance.

Keywords: Volatility forecasting, GeneticProgramming, Syntacticrestrictions

1Olsen& AssociatesResearchInstitutefor Applied EconomicsSeefeldstrasse233,8008Zurich,Switzerland.e-mail: [email protected]:+41-1/3864848 Fax: +41-1/42222 82

2DynamicAssetManagementChemindesTulipiers9, 1208Geneve,Switzerland.e-mail: [email protected]

Page 2: Genetic Programming with Syntactic Restrictions applied … · Genetic Programming with Syntactic Restrictions applied to Financial Volatility Forecasting Gilles Zumbach1, Olivier

1 Intr oduction

Onechallengeposedby thefinancialmarketsis to correctlyforecastthedaily volatility of financialassetsin orderto obtainreasonablepredictionsof thepotentialrisks,or to optimallyallocateassetsin a portfolio. Clearly, thevolatility is itself a non trivial processwith interestingdynamics,andfrom theanalysisof empiricaldata,a large numberof stylizedpropertiesrelatedto volatility areknown. Themostimportantof thesepropertiesis thelong memoryof thevolatility, asmeasuredby a laggedautocorrelationfunction that decaysas a power law. This propertyis also calledvolatility clustering,andtheslow decayof theautocorrelationmeansthatthisclusteringis presentat all time horizons. The simplestmodelthat describesvolatility clusteringis the GARCH(1,1)model. Yet, this model hasan exponentialautocorrelationfunction for the volatility, meaningthat it capturesthevolatility clusteringonly at onetime horizon. In orderto remedythedifferentshortcomingsof the simplestGARCH(1,1)model, a large numberof variationsin the popularARCH (Autoregressive ConditionalHeteroskedasticity)classof modelshasbeenstudied,mostlyusingdaily data.Theuseof dataat higherfrequency opensnew avenuesin volatility forecastingasthe statisticaluncertaintyis decreasedandintra-dayeffectsmustbe taken into account. Butconsequently, the complexity of theproblemincreases.Undoubtedly, the useof high frequencydatapermitsbettershort term volatility estimations,but also implies more complexity in datatreatmentandvolatility modeling. Themain problemcanbe summarizedas: what arethemostimportantstylizedpropertiesthatmustbe taken into accountin orderto obtaina goodvolatilityforecast? In this study, we useGeneticProgramming(GP) to discover new typesof volatilityforecastingmodelsfor foreignexchange(FX) ratesat hourly frequency.

GeneticProgrammingis a tool that searchesin the spaceof possibleprograms,representedbytrees,for individualsthatarefit for solving thegivenproblem[Koza,1992]. For theapplicationof GPtechniquesto financialdata,a similar approachwasrecentlyappliedwith somesuccessontradingmodeldiscovery [Bhattacharyyaetal., 1998, Chopardetal., 2000], andwe useit in thispaperto explorethespaceof volatility forecastingmodels.In theapplicationto foreignexchangefinancial time series,thereis an importantexact symmetryinducedby the exchangeof the twocurrencies,andthissymmetrymustberespectedby thesolutions.For this purpose,we have useda stronglytypedGPapproach,wherethetyping systemkeepstrackof theparity of theGPtrees.In thisway, we have reducedoursearchspacefrom all possibleGPtreesdown to thesubspaceoftreesthathave thepropersymmetry.

Froma generalpoint of view, volatility forecastis a functionfitting problem,wheretherealizedvaluefor thevolatility is the function to be discoveredusinga causalinformationset. The timeseriesof therealizedvolatility is dominatedby randomness,andtheactualamountof informationcontainedin the informationset aboutthe future evolution is ratherlow. This is measuredforexampleby thelaggedcorrelationfor thevolatility, which is in theorderof 3 to 15%,dependingon the actualdefinition of volatility. In short, this meansthat the volatility forecastis a verydifficult challengefor GP, andthe algorithmsneedto be very efficient. In orderto develop andtesttools,we have appliedGP to a similar problem,namelythediscovery of the transcendentalfunctioncosine,usingpolynomials.As for thevolatility, thecosinefunctionobeys asymmetryofparity cos

x cos

x . Thestudywith thecosinemakesclearthat theconventionalGPcannot

tacklethevolatility forecastproblem,andthemaindifficulty lies in thediscovery of goodvalues

1

Page 3: Genetic Programming with Syntactic Restrictions applied … · Genetic Programming with Syntactic Restrictions applied to Financial Volatility Forecasting Gilles Zumbach1, Olivier

for theconstantsincludedin theGPtrees.In orderto speedup convergenceto theoptimalvaluesof the constants,we have to usea local searchalgorithm,like a conjugategradient.Only whenusinga mixedalgorithm,arewe ableto obtaingoodsolutionsfor thecosineproblem,andto findvolatility forecastthatcancompetewith thestandardGARCH(1,1)model.

Thisarticleis organizedasfollows. In section2, theGeneticProgrammingapproachis described.A brief introductionto GP is given is the subsection2.1, followed by the GP with typesandthe relatedsyntacticrestrictionsin subsection2.2. The modificationsto the evolution operatorsrequiredby theGPwith typesarediscussedin subsection2.3. Subsection2.4presentshow to useGP with typesto imposea particularsymmetryon thesolutions. In subsection2.5, we give thesetof nodeswe have used,togetherwith their respective syntacticrestrictions.Control over theactualCPU time is critical for theapplicationon volatility forecasts,andthis meanscontrollingthe complexity of the generatedtrees,as discussedin the subsection2.6. The sectionon GPitself concludeswith thecombinationof GPwith with localsearchtechniquesfor theoptimizationof the tree’s constants.The section3 discussesthe applicationof GP to the discovery of thecosinefunctionby polynomialapproximations.This includesadetailedcomparisonof thevariousalgorithms,aswell asoptimizationof theevolution operatorsusedby theGP. Theapplicationtovolatility forecastis presentedin the section4. The peculiaritiesof the financial volatility arepresentedin theintroductorypart4.1, andthedaily volatility forecastin section4.2. Thevolatilityprocessesusedasbenchmarkfor comparisonwith the performanceof GP treesaregiven is thesection4.3. Section4.4 presentsthe resultsof GPsearchfor volatility forecast,andconclusionsaredrawn in section5.

2 GeneticProgramming with SyntacticRestrictions

2.1 Intr oduction

Geneticprogramming[Koza,1992] is a tool to searchthespaceof possibleprogramsfor an in-dividual (computerprogram)that is fit for solvinga giventaskor problem.It operatesthroughasimulatedevolution processon a populationof solutionstructuresthat representcandidatesolu-tionsin thesearchspace.Theevolution occursthrough: aselectionmechanismthatimplementasurvivalof thefitteststrategy geneticcross-over andmutationof theselectedsolutionsto produceoffspring for thenext

generation.

The generatedprogramsarerepresentedas trees,wherenodesdefinefunctionswith argumentsgiven by the valuesof the relatedsub-trees,andwhereleaf nodes,or terminals, representtaskrelatedconstantsor inputvariables.

Theselectionmechanismallows randomselectionof parenttreesfor reproduction,with abiasforthetreesthatrepresentbettersolutions.Selectedparentsareeithermutated,orusedtogeneratetwonew offspringsby a cross-over operator. Cross-oversandmutationsarethe two basicoperators

2

Page 4: Genetic Programming with Syntactic Restrictions applied … · Genetic Programming with Syntactic Restrictions applied to Financial Volatility Forecasting Gilles Zumbach1, Olivier

usedto evolve a populationof trees. The mutationoperatoreffects randomchangesin a treeby randomlyalteringcertainnodesor sub-trees,whereasthecross-over operatoris anexchangeof sub-treesbetweentwo selectedparents.Theevolution of trees’populationscontinuesuntil acertainstoppingcriterion is reached.The initial populationis composedof randomtrees,whicharegeneratedby randomlypicking nodesfrom a given terminal setand functionset. The onlyconstraintis that thegeneratedtreesoughtnot to betoo complex. A restrictionon themaximumalloweddepthor themaximumnumberof nodesis alsofrequentlyimposed.

Sincetheevolution operatorsin GPcanestablisharbitraryfunctionsandterminalsasarguments(descendants)for a function node,the function set is requiredto be closedwith respectto thevariousargumentsthat it canhave. Theclosurehypothesisstatesthat the resultof any nodecanbe usedas argumentof any node,and this permitseasyimplementationof the cross-over andmutationoperators.Yet, this leadsto programswith unnaturalstructures,like booleanoperatorstakingrealarguments,or multiplicationnodesoperatingon theresultsof abooleanoperation.

Nowadays,modernprogramminglanguagessupportthenotionof types,andit is quitenaturaltointroducethisconceptintoGPtrees.ThisstudyfollowsthestronglytypedGPapproach[Montana,1995],wherethebranchesof a treecarryadatatype, andtheoperators(nodes)acceptonly somecombi-nationof types.Yet, the typing systemis usedin this paperto imposeanexactsymmetryon thegeneratedprograms.This symmetryis inducedby a parity transformation,namelyin theforeignexchangemarket, by thepermutationof thecurrency pair. In technicalterms,the typing systemis usedto imposea well definedrepresentationof the solutionin theZ2 groupgeneratedby thegraduationof theproblemby aparity transformation.

2.2 Typing systemand syntactic restrictions

A geneticprogramis a tree, whosenodesrepresentfunctions(with its subtreeas function ar-guments),and whoseleaves areassociatedwith task-relateddatainput or “constants”. As thetree-node-terminalterminologyis not alwaysaccurateandrich enough,we alsousea functionterminology. Then-arity of a functionfixesthenumberof subtreesattachedto thecorrespondingnode,while a functionwithoutargumentis a terminalnode:

f n : n In GP, it is usuallypostulatedthat any treeis a valid program(the closureproperty). The onlyrestrictionis then-arity of thefunctions,whichfixesthetopologyof thetree.In orderto introducetypes,asetof N typeCi is defined.A simpleexamplewith two typesisC1 = booleanandC2 .Then,a functionis amappingfrom typesinto a type,namely

f n : Ci1 Cin

Cin 1 i j 1 N (1)

For example,with thetwo above types,we canhave thefunctionsof two arguments: : boolean:

3

Page 5: Genetic Programming with Syntactic Restrictions applied … · Genetic Programming with Syntactic Restrictions applied to Financial Volatility Forecasting Gilles Zumbach1, Olivier

Thetypingof thefunctionsinducessyntacticrestrictions, namelyvalid programsaretreeswhere,for all the branches,the output type of the argumentfunctionsarevalid argumenttypesfor thefunction node. The setof all programsthat obey the syntacticrestrictionsdefinesthe spaceofprogramswith types,with a possiblefurtherconstrainton thetypeof the root. Notice that thesesyntacticrestrictionscorrespondexactly to the syntacticrestrictionsof any strongly typed lan-guage,like C++.

As soonasthe typesof thevariablesin a problemarenot unique,like for examplewith the twotypesbooleanand real, it is indeedvery naturalto introducesuchsyntacticrestrictions. Thisavoidstheratherunpleasantandunnaturalconditionthata functionmustbeapplicableto all typesof argument.In ‘traditional’ GP, theclosurepropertyimposesanembeddingof thetypesinto .For example,booleansarerepresentedby arealnumber, with thevaluezerofor falseanddifferentfrom zero for true. This embeddingallows free mixing of the types. Yet, if the typesbecomemorecomplex, for examplewith vectors,matricesor lists, this embeddingbecomescompletelyartificial, if not impossible.Therefore,theintroductionof a propertyping systemis a mandatorysteptowardtheconstructionof morecomplex randomprograms.For thepresentpaper, ouruseoftypedGPis somewhatdifferentaswe useit to imposeagivensymmetryof parity to thesolution.Thiskind of useis possibleonly with types.

With the introductionof types,the problemto be solved is to find properevolution operators,namelyto generaterandomtrees,mutationsandcross-over operatorsthat respectthe syntacticrestriction. In otherwords,theevolution operatorsmustbeinvolutionsin thespaceof typedGP.Moreover, they mustbeergodiconthespaceof typedprograms.Beyondergodicity, thecross-overandmutationoperatorsmustalsobe‘efficient’ on thespaceof programs,andtheirefficiency mayberelatedto thenumberof typesandfunctions.

Fromthepointof view of theimplementation,thetypesaresimplyamappingto asetof integers,namely, to eachtypecorrespondsauniquenumber. Thespecificationof thefunctionsmustincludethepossibletypesof their argumentsandthe typeof the results.Then,thesyntacticrestrictionscorrespondsimply to theequalitybetweennumbersacrossconnectednodesandterminals. It isthereforequitesimpleto implementanarbitrarynumberof typesandthecorrespondingsyntacticrestrictions.The implementationof theevolution operatorsis morecomplex, in particularsomeoperationsmight fail anda recovery mechanismmustbeimplemented.

2.3 The evolution operators

Thefollowing sectiondescribestheelementaryoperationsusedto implementtheevolution oper-ators.Becauseof thetypes,thesetof possibleoperationsis largerthanin theusualGPapproach.

2.3.1 Construction of a random tree

Geneticprogramtreesaregrown by recursively appendingnodesor leavesthatmatchthetypere-strictionsof theparentnodesto thedanglingconnections.Thecomplexity of thetreeis controlledby biasingthe probability to returna nodeor a terminal,dependingon the depthof the current

4

Page 6: Genetic Programming with Syntactic Restrictions applied … · Genetic Programming with Syntactic Restrictions applied to Financial Volatility Forecasting Gilles Zumbach1, Olivier

connectioncomparedto a targetdepth.If amaximaldepthis exceeded,theattemptto constructarandomtreefails.

2.3.2 Mutation operators

Themutationoperationswe implementedfor thegeneticprogramarethefollowing. ConstantleafmutationA constantterminalis mutatedby eitherchangingthesignof theconstant(with aprobabilityof 10%),or by multiplying theconstantvalueby ex, wherex is a randomnumberuniformlydistributedbetween

ln

2 andln

2 . NodesubstitutionA nodeis replacedby another, while thetypeconstraintsarepreserved. SubtreemutationA branchis pickedat random,thetreeattachedto this branchis deletedanda new randomtreewith thesameroot typeis attachedto thebranch. BranchtypemutationA nodeis picked at randomandis checked to seeif a branchtype canbe changedwhilekeepingthetypefor theotherbranches.If this typechangeis possible,thesub-treeattachedto this branchis deletedanda new randomtreewith thenew selectedroot type is attached(if several new typesarepossible,one is chosenat random). This particularmutationisimportantasit allows usto replaceaconstantterminalby asubtree. RootsplicingA nodeis insertedasthenew root nodeof the tree(i.e. on top of theexisting root node),andthe possibledanglingbranchesarefilled with new randomsubtrees.Again, the typeconstraintshave to berespected,in particularwith apossiblerestrictionof theroot type. NodeinsertionA nodeis insertedrandomlyby picking a branchat random,insertinga randomnodecom-patiblewith the branchtype, andcompletingthe possiblenew node’s danglingbrancheswith randomtrees. NodedeletionA nodeis selectedrandomlyfor deletion. If an argument’s type matchesthe type of theresults,thenodeis deletedandthesub-treeattachedto theparent,while theotherpossiblesub-treesaredeleted.If no descendantcanbe attachedto the parentof the selectednode(becauseof type incompatibility),anothercandidatenodeis selectedfor deletion. After afixednumberof failedattempt,nodedeletionfails.

This mutationtendstowardsimplertrees,andbalancesthe‘nodeinsertion’,‘root splicing’and‘branchtypemutation’which tendto createmorecomplex trees.

5

Page 7: Genetic Programming with Syntactic Restrictions applied … · Genetic Programming with Syntactic Restrictions applied to Financial Volatility Forecasting Gilles Zumbach1, Olivier

In thesevariousoperators,whenarandomsub-treemustbegenerated,theglobaldepthof thetreeis controlledby thesamemechanismusedto grow randomtrees.

2.3.3 Crossover operator

Thecrossover operatortakestwo geneticprogramsasarguments.It selectsa branchrandomlyinthefathertree,andchecksif thereexistsat leastonebranchwith matchingtypein themothertree.If yes,a branchwith theselectedtypeis chosenrandomlyin themothertreeandthetwo selectedsubtreesareswapped.If not, theabove procedureis repeated.If thecrossover procedurestill failsafteramaximalnumberof attempts,thecrossover operationfails.

2.3.4 Mark ov chainsparametrization

We have defined7 kinds of mutation,and when a given tree is selectedfor mutation,one oftheseoperatorsis randomlychosenaccordingto somespecificpredefinedselectionprobabilities.The probabilitiesof selectingeachparticularmutationparametrizethe mutationoperatoron theGP space.The geneticalgorithmalsoperformscross-overs with a given probability. All theseprobabilitiescanbegivenin avectorthatparametrizestheMarkov chains,namely

pMC pconstantleafmutation

pnodesubstitution

psubtreemutation

pbranchtypemutation

proot splicing

pnodeinsertion

pnodedeletion

pcross-over

with the constraintthat the sumover the first 7 valuesmustbe one. In section3, this vectorofprobabilitiesis optimizedusing a geneticalgorithm in order to have the most efficient geneticevolution.

2.4 Using the typing systemto imposea symmetry on the solution

In someapplications,the generatedsolutionsmust respectsomespecificsymmetryconditionsinherentin the problemto be solved. The main examplepresentedin detail in section4, is thecaseof the volatility forecastmodelsbasedon price changesr. The volatility mustbe identicalfor a specificrateandfor thecorrespondinginvertedrate,i.e. σ

r ! σ

r . A simplerexample,

which is usedas a test casefor this study, is the fit of symmetricfunctions, like cosine,withpolynomialsapproximations(seesection3). In suchan application,the goal is to obtain GPtreeswhich representgoodapproximationsto theoriginal symmetricfunctionsandwith thesame

6

Page 8: Genetic Programming with Syntactic Restrictions applied … · Genetic Programming with Syntactic Restrictions applied to Financial Volatility Forecasting Gilles Zumbach1, Olivier

symmetryproperties.Theoutputof theGPtreesis requiredto besymmetricT

x" Tx and

we usethesyntacticrestrictionsto imposethesymmetryof thesolution.

In orderto obtainmeaningfulsolutionswith geneticprogramming,we needto generateGPtreeswhichreturnsasymmetricoutputat therootnode.Thesymmetrytypeateachnodeof theGPtreesneedto beevaluatedto enforcetheoverall symmetryproperties.This is feasibleif thesymmetryof eachcomponentof the treeis classifiedaccordingto its symmetryproperty. All the functionsor terminalsusedin theconstructionof aGPtreeareclassifiedaccordingto thethreeoutputtypes: antisymmetrictypeA: A

x# A

x (e.g.x or x3), symmetrictypeS: S

x! S

x (e.g. $ x $ or x2), constanttypeC: numericalconstants,notaffectedby thetransformationx

%x.

A simpleexampleshown in figure1 is a GPtreecorrespondingto thepolynomialTx# a

x &

x & b' a

bx2. Eachbranchhasbeenlabeledaccordingto its type(A, S or C). The typesaredeterminedstartingfrom theterminalsandmoving upwardto theroot usingtheparity propertiesof eachnode.Whengrowing randomtreesor modifyingtrees,thetypesof thebrancheshave to becomputed,in orderto checkthatthetreeis valid (i.e. obeys thesyntacticrestrictions)andreturnstherequestedtype.

T1

T2

T3 T4

A

A

S

S

F1

F2

F3

A

+

*

*x

xb

a

C

C

Figure1: GPtreewith symmetricoutput

7

Page 9: Genetic Programming with Syntactic Restrictions applied … · Genetic Programming with Syntactic Restrictions applied to Financial Volatility Forecasting Gilles Zumbach1, Olivier

2.5 The nodes,with their syntactic restrictions with respectto parity

In theconstructionof new randomtrees,andin thecross-overandmutationoperators,thepossiblecombinationof branchtypesallowedfor aspecifiedfunctionaregivenin thesyntacticrestrictiontables. Suchtablesmustbeprovidedfor eachfunction,andaregivenbelow for theoperatorswehave used. Most of the operatorstake two arguments.In the tablesbelow, the row andcolumnlabelscorrespondto the type of the two arguments,with the intersectionsproviding the type ofthe result. The symbol”–” representsa combinationof argumentswhich is not allowed. In thesyntacticrestrictiontables,operationsbetweenconstantsare disallowed, to avoid the wastefulcomputationof constantsthroughtheuseof cross-over andsub-treemutationoperators.Thiswillforbid treeslike

a & b(& x. Yet, this is not enoughto remove completelyredundantconstants,as

for examplein thetreea & b & x .Theelementaryalgebraicfunctionshavestraightforwardpropertieswith respectto theparitytrans-formation.Thesyntacticrestrictiontablefor theadditionandsubtractionoperatorsis

+ – A S CA A – –S – S SC – S –

andfor themultiplication( & ) anddivision ( ) ) operatorsit is

* / A S CA S A AS A S SC A S –

For thevolatility forecastmodels,moreoperatorsareusedin theconstructionof GPtrees.Wehavetakena functionsetcomposedof thesimplearithmeticoperations* ,

, &,+ andsomeoperators

commonlyusedin volatility processessuchasexponentialmoving averages,squareandabsolutevaluefunctions.

The exponentialmoving averageoperatorEMA - x. evaluatesby a simple iterative formula theaveragevalueof x on amoving sample

EMAt / µ EMA

t

δt 1 0 µ x t (2)

with µ exp

δt ) τ and0 0 µ 0 1. Therangeτ 0 of theEMA givesthelengthof themovingsample.A convenientparametrizationof the rangeτ is given by the logarithmof the rangez lnτ . Theuseof logarithmof thetimerangeis naturalin problemswhereτ canvarywidely, from

hoursto months.Moreover, for theGARCH(1,1)volatility model,z hasbeenshown to give anefficient parametrizationof themodelfor theoptimization,with the furtheradvantageof havingno domainrestriction[Zumbach,2000a]. Therefore,ourEMA nodetakesasfirst argumentz, andassecondargumentx. Thesyntactictablewith respectto parity is

8

Page 10: Genetic Programming with Syntactic Restrictions applied … · Genetic Programming with Syntactic Restrictions applied to Financial Volatility Forecasting Gilles Zumbach1, Olivier

EMA A S CA – – –S A S –C A S –

wheretherow labelcorrespondsto thetypeof z, andthecolumnlabel is thetypeof thevariableargumentx to be averaged. As this operatoris an averagingoperator, the type of the result isalwaysthetypeof theargumentx. Theabsolutevalueandthesquareoperatorstake oneargument(A or S) andalwaysreturnasymmetrictype(S)value.

2.6 Taming complexity

Oneimportantproblemin geneticprogrammingis thecontrolof complexity. Complexity is notbadin itself asit canbe a reservoir for mutations.Yet, morecomplex treestake longerto eval-uate,andour Markov chainshave no problemin generatingmorecomplex treeswhenneeded.Therefore,to keeptheCPUtime low, complexity mustbetameddown. For thevolatility forecastinference,atoocomplex treecanalsooverfit thedataset,namelyasolutioncanfit particularlywellthesampleof datausedfor theGP optimization(in sample)but canfail on anothersample(outof sample).This overfitting problemis particularlyacutefor financialapplications,like volatilityforecast,as the dominantcharacterof the target function is random. The numberof constantsin theGPtreeis particularlycritical astoo many constantsimply a strongoverfitting probabilityanda very slow convergenceof thelocal optimizer(seenext section).For thesereasons,we haveadoptedanapproachwherewecorrectthescorefunctionto penalizemorecomplex GPtrees.

Thereis no naturalmeasureof thecomplexity of a GPtree,andseveraldefinitionscanbewrittendown. Wehave usedasimpleapproachusingthenumberof nodes(includingtheterminalnodes)andthenumberof constantsin a GPtree(themaximaldepthcanalsobeused).Thecomplexityof aGPtreeis definedas

complexity ∑k

wk expfk ) rk with ∑

k

wk 1 (3)

wherethesumrunsover thefactorsfk (numberof nodesandnumberof constants),wk is aweight,andrk therangeat which thecomplexity measuredby thefactork startsto increase.For therunsbelow, we have usedtheparametersw 0 2, r 20 for thenumberof nodes,andw 0 8, r 4for thenumberof constants.

For bothapplicationsbelow, thescoreis givenby anL2 distance,which is alwayspositive. Closeto thesolution,we needto enhancesmalldifferencesin thescorein orderto put a strongenoughselectionpressureon thebesttrees.A naturalsolutionis to selecttheGP treesaccordingto thelogarithmof thescore.With thepenalizationof complexity, we have usedafitnessgivenby

fitnessi logscorei λ complexity i (4)

wheretheindex i runsover theGPtreesin thepopulation.Theparameterλ fixestheimportanceof thecomplexity penalty. This parameterhasbeenoptimized(seesection.3), andvaluesin therangeλ 1 1 leadto anefficientalgorithm.

9

Page 11: Genetic Programming with Syntactic Restrictions applied … · Genetic Programming with Syntactic Restrictions applied to Financial Volatility Forecasting Gilles Zumbach1, Olivier

For thegeneticevolution,treesareselectedusingafitnessproportionalalgorithm.Theprobabilityof selectinganindividual dependsthenon anoverall additive constantin thefitness,for exampleon a changeof normalizationin the L2 distanceusedin the score. In order to fix this additiveconstant,we use

pi max

fitnessi

C 0 (5)

wherepi is theprobabilityto selecttheindividual i. Theminussignoriginatesin thatthebestfitshave thelowestvalue,but needto beselectedwith thehighestprobability. TheconstantC is fixedby giving theprobabilityof selectinganindividual amonga givenbestfraction,for example,wewantto selectwith a probabilityof pb 50%anindividual amongthebestfractionbf 25%ofthepopulation.A simplecomputationgives

C ∑i 2 bf nfitnessi

pb∑i 2 n fitnessibf n

pbn(6)

with n thepopulationsize.Thismethodcanleadto negative probabilityfor theworstindividuals,andin suchcase,thenegative probabilitiesarereplacedwith 0. For thecomputationsbelow, wehave usedpb 50%andbf 25%,namelyhalf of thetime,a treein thebest25%is selected.

2.7 Combination with local search techniques

As is usuallydonein GP, thegeneticalgorithmalsooptimizestheconstantsincludedin aGPtreeby randommutations.From the algorithmicpoint of view, this optimizationof the constantsisvery inefficient. Moreover, for thevolatility forecastmodeling,theforecastperformancedependscritically on having goodvaluesfor theconstants.For thesimpleGARCH(1,1)model,thestan-dardoptimization(usingthederivatives)of themodelparametersis alreadya difficult probleminitself [Zumbach,2000a]. Whenexploringvariousmodels,theselectionbetweentwo goodvolatil-ity modelsis in factbasedon very smalldifferencesbetweenthescorevaluesanda smallchangein oneof theconstantsgenerallyaltersanearoptimumsolutionto theaverageones.

In orderto avoid suchproblemsandto retaintheoptimumsolutionscorrespondingto eachpos-sible structureof new volatility forecastmodels,we combinethe geneticprogrammingsearchwith a local optimizer. It must be notedthat an optimizationalgorithm using a gradient(likea simple conjugategradient)is far superiorto a randomsearchwhen optimizing a sufficientlysmoothfunctionin n. Therefore,theconstantsin ageneticprogramcanbeoptimizedby havingthegeneticprogramevaluationfunctioncall a local optimizationalgorithmthat searchesfor thebestconstants,andinsertsthesevaluesbackinto thetree.Thelocal optimizationalgorithmfindsthe bestconstants,startingfrom the valuesgiven in the original tree. The gradientof the costfunctionneedsto beevaluatednumerically, andwe usetheBFGSalgorithmfor the local search[Pressetal., 1986]. In this way, bothgeneticprogrammingandthe local optimizationalgorithmareusedin thedomainwherethey performbest,namelyGPto explore thestructureof thesolu-tion, andthelocaloptimizerto adjusttheparametersto theproblem.Thereadershouldbewarnedthatthefunctionover n generatedby randomtreescanhavequitewild shapes,andthatthelocaloptimizationalgorithmmustbevery robustto beableto gracefullyhandlethousandsof searches.In thenext section,we compareGPwith andwithout local optimizationof theconstants.

10

Page 12: Genetic Programming with Syntactic Restrictions applied … · Genetic Programming with Syntactic Restrictions applied to Financial Volatility Forecasting Gilles Zumbach1, Olivier

3 Function fitting

We have testedour GP algorithm on the discovery of the transcendentalfunction cosx . The

“residualerror” is givenby

d2 43 2n 5 2nπ

0dxcosx gp

x 2 6 (7)

with gpx thevalueof a GPtreewith thesingleargumentx (i.e. theterminalnodescanhave the

valuex or aconstant).Theintegerparametern fixesthenumberof cosineperiodsto includein theintegral, andthenormalizationof theintegral is chosensothat for gp

x" 0, thecostfunctionis

1 (regardlessof n). For thegenomewithout syntacticrestrictions,namelywhennot imposingthesymmetryof parity, we have alsomeasuredtheconvergencepropertieswith the integral definedonthesymmetricinterval - 2nπ 2nπ . (with thenormalizationconstant1) n). Thecostfunctiontooptimizeby GPis the logarithmicresidualerror

score logd (8)

with log thebase10 logarithm(for thecomputationof thefitness,eq.4 includesthescoredirectlyandnot its logarithm). The nodesetincludesthe function+, * andsquare,so that the searchisdonein thesetof polynomialfunctions. Practically, theabove integral is discretizedwith a stepdx 1) 32.

Thediscovery of transcendentalfunctionsby GPis aninterestingtestbedfor severalreasons: It is a nontrivial task,andthedifficulty canbeincreased,in thepresentcaseby increasingthe numberof periodsn. The parity of the function to discover canbe given, in our casecosx is anevenfunctionof x. Therefore,theinfluenceof thesyntacticrestrictionscanbe

measured.Noticethatwith thesyntacticrestrictions,thefit is doneimplicitly on theinter-val

2nπ to 2nπ, whereaswithout restrictionsthefit is doneonly on the interval includedexplicitly in theabove integral,namelyfrom 0 to 2nπ or from

2nπ to 2nπ. Thecomputationaltimefor onesearchis smallenoughto allow usto performstatisticswith

variousalgorithms.Moreover, with goodaveragevalues,thealgorithmcanbe optimized.For example,we can usea geneticalgorithm to optimize the parametersof the Markovchains(mutationandcrossover) usedby thegeneticprogram. Whensearchingwithin the setof polynomialfunctions,the irreduciblepolynomialcorre-spondingto a GP treecanbe computed.This is interestingfor two reasons.First, the ir-reduciblepolynomialsprovide for equivalenceclassesof GPtrees,whereeachequivalenceclassis characterizedby themaximalorderof theTaylorexpansionof thecosfunction.Forthealgorithmwith local optimization,theirreduciblepolynomialcoefficientsareuniqueineachequivalentclasses(up to numericalconvergenceerror)andcorrespondto thebestap-proximationof thecosinefunctionby a polynomialof a givenorder. Second,a polynomialspacehasasimplemetric,whereasGPtreeshave noobviousmetric.As thealgorithmcon-vergesto thesolution,theirreduciblepolynomialmustconvergeto theTaylorexpansionofthechosenfunction,andthisdistancecanbemeasuredeasily.

11

Page 13: Genetic Programming with Syntactic Restrictions applied … · Genetic Programming with Syntactic Restrictions applied to Financial Volatility Forecasting Gilles Zumbach1, Olivier

TheMarkov chainson the treespacehave beenoptimizedon this problem.For this purpose,anauxiliary costfunctionis set,with valuegivenby theaverageof thescoreeq.8 over 20 GPruns.The actualCPU time for the optimizationis alsomeasured,andthe final parameterschosensothat thecomputationaltime stayssmallaswell. A geneticalgorithmis thenusedto optimizetheprobabilitiesfor themutationandcross-over

p, asdefinedis sec.2.3.4, aswell astheweight for

thecomplexity λ in eq.4 Themain algorithmsto consideraretheGP with or without syntacticrestrictions,andwith or without local optimizationof the constants.The optimizationhasbeendoneon the 4 combinationsof syntacticrestrictionandlocal optimization. Essentially, the effi-ciency of theMarkov chainis changeddramaticallyby localoptimization,but notby thesyntacticrestrictions.

With local optimization,theprobabilityto mutatea constantis setto zero,andtheGPterminatesafter50 generations.Theresultsareasfollow: the“nodesubstitution”is anefficient transforma-tion; the“branchtypemutation”is efficient,whereasthe“subtreemutation”is not; the“root splic-ing” is counterproductive; the“nodeinsertion”isefficient; the“nodedeletion”shouldbeusedpar-simoniously;thecross-over probabilityis foundto befairly irrelevant.Quantitatively, agoodvec-torof probabilitiescharacterizingtheMarkov chainis

pMC 0 0 0 4 0 2 0 05 0 0 0 25 0 1 0 5 .

Thesevalueshave beenusedfor the comparisonbelow, aswell as for volatility forecastinginthe next section. Without local optimization,the probability to mutatea constantis alsoopti-mized,andtheGPrun terminatedafter5000generations.Theresultsaresimilar to thoseabove,but with the “constantleaf mutation” beingvery important. The vectorof probability is set to0.5 for the “constantleaf mutation”, and half of the valuesabove for the mutations,namelypMC 0 5 0 2 0 1 0 025 0 0 0 125 0 05 0 5 .Figure2 displaysthemeanlogarithmicresidualerrorasa functionof thenumberof generations.Theadvantageof thealgorithmwith local optimizationis striking (let usemphasizethatthehori-zontalaxis is logarithmic).Yet, this comparisonis unfair asthecomputationtime neededfor theoptimizationof theconstantsis fairly large. A moreobjective comparisonis to displaythemeanlogarithmicresidualerrorversusthemeancomputationaltime, asin figure3. The improvementprovidedby local optimizationis still huge,typically threeordersof magnitudeat constantcom-putationtime. Thesyntacticrestrictionsalsohelpto improve theconvergencespeed,but typicallyonly by half an orderof magnitude.Thereforethe advantageof the typing systemis partly thatit improvesGP convergence,but mainly that only valid solutionsaregenerated.A conservativeextrapolationof the computationaltime neededto reachan accuracy of log

d ' 4 for the al-

gorithmwithout local optimizationleadsto CPUtime of at leastoneyear. Let usemphasizethatonly this kind of qualitative improvementof thealgorithmmakestheproblemof volatility fore-castpossible.Clearly, eachalgorithmshouldbeusedwhereit is efficient: thegeneticprogramtoexplore thestructureof thesolution,andthe local optimizerto adjustthe tree’s constantsto theproblem.

We have alsodonesimilar testsfor n 3, namelythefit of cosx on 3 periods.As this problem

is modedifficult, thecomputationaltimesgrow, but the resultsarevery similar. Onenoticeabledifferenceis a long “latency” time at the beginning of the evolution, wherethe bestsolution isessentiallythe null function. This can be understoodas the numberof generationsneededtodiscover a polynomialwith a degreehigh enough(x6) to replicatetheoscillationof cosineover 3

12

Page 14: Genetic Programming with Syntactic Restrictions applied … · Genetic Programming with Syntactic Restrictions applied to Financial Volatility Forecasting Gilles Zumbach1, Olivier

1 10 100 1000 10000

-4

-3

-2

-1

0

generation

loga

rithm

ic r

esid

ual e

rror

with local optimization with symmetric restriction no restriction, on 0 to 2π

no restriction, on -2π to 2π

no optimization with symmetric restriction no restriction, on 0 to 2π

no restriction, on -2π to 2π

Figure2: The mean logarithmic residual error versus the number of generations for different GPalgorithms. The mean is computed over 200 GP runs, each with a population of 100 genes and anelitism rate of 50%. The runs are terminated after 120 generations with local optimization and after15000 generations without local optimization.

13

Page 15: Genetic Programming with Syntactic Restrictions applied … · Genetic Programming with Syntactic Restrictions applied to Financial Volatility Forecasting Gilles Zumbach1, Olivier

0.1 1 10 100 1000

-4

-3

-2

-1

0

average computational time [second]

loga

rithm

ic r

esid

ual e

rror

with local optimization with symmetric restriction no restriction, on 0 to 2π

no restriction, on -2π to 2π

no optimization with symmetric restriction no restriction, on 0 to 2π no restriction, on -2π to 2π

Figure3: The mean logarithmic residual error versus the mean computational time for different GPalgorithms. The GP parameters are as for fig. 2.

14

Page 16: Genetic Programming with Syntactic Restrictions applied … · Genetic Programming with Syntactic Restrictions applied to Financial Volatility Forecasting Gilles Zumbach1, Olivier

periods.After this latency time, thesolutionsimprove rapidly.

Untill thispoint,we exploredthestatisticalpropertiesof GPalgorithms.Yet, it is alsointerestingto look in moredetailsat thespecificformsof thesolutionsdiscoveredby theGP. For theGPwithsyntacticrestrictionandlocal optimization,we performeda few runsover 200generationswith apopulationsizeof 100.Two examplesof solutionsdiscoveredby theGPare

1

0 5 & x & x x x x & x & 0 018 & x7& x & 0 0164& x& x x & x & 0 0168& x7& 0 0833& xand

0 405

x & 0 196

0 0118& x & x & 0 286

0 00252 & x2 2 & 2 47

x2 Forbothsolutions,thedistanceto thecos

x function(eq.7) isof theorderof 108 4 9 5, with maximal

degreex10 andx12 respectively.

4 Volatility ForecastingModels Infer ence

4.1 Intr oduction

In the literature,thereis a profusionof volatility forecastingmodels,but noneof themis abletoexplain all the known empirical factsobserved in high frequency foreign exchangetime series.In this section,we usethestronglytypedGPapproachto explore thespaceof possiblevolatilityforecastingmodels.With a GPapproachallowing for a broadsearchin thespaceof thepossibleforecasts,we have goodchancesof finding modelswith new structures.Our goalis to discover ifsomespecificstructureappearsin thebestgeneratedmodelsandthento infer whatis theclassoftheoptimumvolatility forecastingmodelsto beusedwith foreignexchangerates.

In thecaseof foreignexchangerates,thebasictime seriesis themiddlelogarithmicpricext - ln pbid

t ln

paskt :.") 2 0. Underthe interchangeof theperandexchangedcurrencies,the

pricesp aremappedinto 1) p, andthe logarithmicpricesinto x;

x. Themain time seriesofinterestis theannualizedpricechanges(or returns)r - δt . measuredon agiventime interval δt

r - δt . t " xt x

t

δt δt ) 1y

(9)

Thedenominatorannualizesthereturn,with 1y = 1 year. With this choiceE - r - δt .(. is essentiallyindependentof δt, with a typical valuearound10%for themainFX rates.

The volatility measuresthe fluctuationof the returnsr, andmustbe identicalfor a specificrateor for thecorrespondinginvertedrate.Accordingly, volatility modelsmustbeindependentof theinterchangeof theperandexchangedcurrencies,namelythey arerequiredto besymmetricwithrespectto thechanger

<r

σ

r ! σr (10)

Moreover, thevolatility is alwayspositive.

15

Page 17: Genetic Programming with Syntactic Restrictions applied … · Genetic Programming with Syntactic Restrictions applied to Financial Volatility Forecasting Gilles Zumbach1, Olivier

4.2 Daily Volatility Forecast

In this study, thequantityto beforecastis the realizedmeandaily volatility σdt , which is con-

structedfrom our timeseriesof hourly returnsr - δt . asfollows

σ2dt ! σ2 - ∆t δt . t ! 1

24

24

∑k = 1

r2 - δt . t kδt (11)

whereδt is onehour and∆t is oneday. Othervaluescanbe chosenfor the parameters∆t andδt, but we have restrictedour empiricalstudy to the above values. For foreign exchangerates,statisticalstudieson observed price changesshow no autocorrelationof the returns(except forvery short time intervals), and no correlationbetweenthe returnand the volatility. Therefore,we restrictedour searchto modelswhich only predictthefuturevolatility, but do not predictanyfuturepricechange,namelywe searchfor forecastof σd, andnot for a joint forecastof σd andr.

ThegeneratedGPtreesdirectly provide at eachtime t theforecastvalueF2GP

t of themeandaily

varianceover the next 24 hoursσ2d

t , and the scorefunction usedto measurethe quality of a

forecastis theroot-meansquarederrors(RMSE)

score2 E - FGPt σd

t 2 .> (12)

Thefitnessiscomputedfrom thelogarithmof thescore,with apenaltyfor complexity, asdescribedin section2.6. The volatility forecastsare built on the information containedin the historicaldata,namelythepreviousreturns.As explainedin theintroduction,FX volatility modelsmustbeindependentof theinterchangeof theperandexchangedcurrencies,leadingto theexactsymmetryσ

r ! σr . In GP, syntacticrestrictionsareusedto imposethis symmetryon thesolution(see

section2.4). Toenforcepositivity, wehaveusedapenaltyapproachin whichall theGPtreeswhichreturnanegative valuefor thevariancearestronglypenalized.In theselectionmechanismfor thegenerationof the next populations,thesetreeshave a much lower probability for reproductionandthentendto disappear. We have useda functionsetrestrictedto simplearithmeticoperationsandoperatorscommonlyusedin volatility processsuchasexponentialmoving averages(EMA),squareandabsolutefunctions.To testtheimpactof heterogeneousmarket components,we haveintroducedterminalscorrespondingto returnmeasuredonvarioustime intervals.

4.3 Benchmark Models

ThegeneratedGPvolatility modelsarecomparedto differenttheoreticalvolatility modelswhicharein theframework of autoregressiveconditionalheteroskedastic(ARCH) models.Thesemodelsareconstructedfrom high frequency price changes,sampledat interval δt, andthey areusedtoforecastmeanvolatility ona longertime interval ∆t. In thisstudy, weconsiderhourly returnsδt =1 hour, andforecastmeanvolatility for daily interval ∆t = 1 day. Thegeneralform of thevolatilitymodelsunderconsiderationis

rt δt ; σm

t δt ε t δt

σ2mt δt ; σ2

m -Ω t θ . (13)

16

Page 18: Genetic Programming with Syntactic Restrictions applied … · Genetic Programming with Syntactic Restrictions applied to Financial Volatility Forecasting Gilles Zumbach1, Olivier

whereε is an unknown randomi.i.d. (independentidentically distributed) process,Ωt is the

informationsetat time t which containsall previous returnandvolatility values,andθ aretheprocessparameters.Thetimescaleatwhich theprocessis evaluatedis δt.

At a giventime t, with theinformationsetΩt , thevolatility forecastfor thevolatility at t

kδt

is givenby theconditionalexpectation?σ2

m - kδt . t ! E - σ2mt

kδt @$ Ω t A.> (14)

Usingtheprocessequationsiteratively, theconditionalexpectationcanbeexpressedasa functionof thereturnandvolatility containedin theinformationsetΩ

t at time t (see[Zumbach,2000b]).

At time t, theforecastfor theaveragevolatility ona time interval from t to t ∆t is thengivenby?

σ2m - ∆t . t ! 1

n

n

∑k = 1

?σ2

m - kδt . t (15)

where∆t nδt. Theparametersof thebenchmarkmodelsareoptimizedby minimizing therootmeansquareerrorbetweentheforecastedandrealizedaveragedaily volatility

RMSE2 E - B ?σ2mt σd

t 2 .> (16)

Thefirst benchmarkis thepermanencehypothesis,namelythe forecastis givenby thehistoricalvolatility measuredonthelastdayσhist - 1d. or thelastweekσhist - 1w. . Thismodelhasnoadjustableparameter. Thesecondbenchmarkmodelweconsideris thewell known GARCH(1,1)model.Thevolatility processis givenby

σ2mt δt C α0

α1r2 t β1σ2mt (17)

To avoid recursionon thevolatility term,this modelcanberewritten usinga moving averageonthereturn,asfollows

σ2m

t δt D σ2 w

σ2

1t σ2 (18)

σ21t ; µσ2

1t

δt 1 µ r2 t with threeparametersσ, w andµ. For this model, the correlationdecaysexponentially, with acharacteristictime of µcorr exp

δt ) τcorr andµcorr α1

β1 µ

w1

µ .The third benchmarkis the FIGARCH model [Baillie et al., 1996], which incorporatethe longmemoryof the volatility througha fractional differenceoperator. The last type of benchmarkmodelcomesfrom thelong memoryHeterogeneousARCH classof models.TheHARCH modelwasdevelopedby [Muller et al., 1997] andis basedon many independentvolatility componentsevaluatedat differenttime horizons.Thevolatilities arecomputedfrom aggregatedreturnsmea-suredat time horizonsrangingfrom 1 hour to a few weeks. The model can be formulatedasfollows

σ2mt δt ; C0

n

∑j = 1

Cjσ2jt (19)

σ2jt E µjσ2

jt

δt 1 µj r2 - k jδt . t 17

Page 19: Genetic Programming with Syntactic Restrictions applied … · Genetic Programming with Syntactic Restrictions applied to Financial Volatility Forecasting Gilles Zumbach1, Olivier

whereCj F 0 for j 0 n, andwith thenecessarystationaritycondition∑nj = 1Cj 0 1. Thek j

aretheaggregationfactorsof thereturnsandarechosenaccordingto ageometricprogression

k j p j 8 1 (20)

Thesamevalue p 4 asin [Muller et al., 1997] is chosenhere.Thevolatility memoryof equa-tion 20 is determinedby theconstantµj exp

δt ) τ j with τ j τ0k j . Thepresentformulation

differs in somedetailsfrom theoneusedin [Muller et al., 1997], in particularwe useannualizedreturns,leadingto simplerequations.

4.4 Experimentson two FX rates

Inferenceof volatility forecastingmodelshave beentestedon two foreign exchangerates,i.e.USD-CHFandUSD-JPY. For theseexperimentswe have used13 yearsof high frequency datafrom 1.1.1987to31.12.1999.In orderto removeseasonality, hourlyreturnsequallyspacedin busi-nesstime [Dacorognaet al., 1993, Breymannetal., 2000] areextractedfrom the high frequencydata.Theavailabledatasetsaredividedin threesubsamples.Datafrom 1.1.1987to 31.12.1989isusedfor theinitializationof themoving averageindicators,but not includedin thescorefunction.Thein-sampledatafrom 1.1.1990to 31.12.1994is usedto computethescorefunctionfor theGP,andto optimizetheparametersof thebenchmarksmodels.Theout-of-sampledatafrom 1.1.1995to 31.12.1999is usedfor comparisonof theperformancesof thevariousmodels.This is donetoobtainmodelsthatcapturethebeststructureof thevolatility forecast,andnot theparticulardatasetusedfor the optimization. The out-of-sampledatasetcontains5 yearsof hourly datawhichrepresentsmorethan31,200priceschanges,or 1300non-overlappingvolatility observations.

Wehave experimentedwith variousterminalsetsandfunctionsets.Oneterminalsetis composedonly of constantsandhourly returns.In orderto testtheimportanceof aggregatedreturns,anotherterminalset is composedof constantsandaggregatedreturnat selectedtime intervals kδt withk 1, 4, 24 (1 day),120(1 week)and480(1 month).Thebasicfunctionsetcontainstheaddition,multiplication and EMA operator. In a slightly extendedfunction set, we have addedthe theabsoluteandsquarevalueoperatorsin orderto generatemodelswhereσ2 cancontaintermsin $ r $ .The experimentreportedbelow is donefor both exchangeratesusingthe four combinationsofthe two terminal setsand two function sets. A populationof 100 individuals is evolved witha steadystategeneticalgorithmover 200 generations.At eachgeneration,the besthalf of thepopulationis kept, while the worst half is replaced(i.e. an elitism rateof 50%). The selectionof the individuals for reproductionis donewith the fitnessproportionalalgorithm,asdescribedin sec.2.6. Optimizationis alwaysdonewith local optimization,andthevectorof probabilitiescharacterizingtheMarkov chain

pMC is asgiven in sec.3. In onecase,thebestmodelgenerated

by theGPis not alwayspositive in theout-of-sample,andwe reporttheperformanceof thebestGPtreewith positive valuein theout-of-sample.

The performancefor the benchmarkmodelsandfor the bestGP treesarereportedin the tables1 and2. For theHARCH model,theconverge of the local searchalgorithmis problematic,andmostof thetimeit convergesto apointat thenecessarystationaritycondition∑n

j = 1Cj 0 1. For thismodel,thevaluesreportedin the tablearethebestresultsfrom 5 searchswith differentstarting

18

Page 20: Genetic Programming with Syntactic Restrictions applied … · Genetic Programming with Syntactic Restrictions applied to Financial Volatility Forecasting Gilles Zumbach1, Olivier

process In-Sample Out-Of-Sample constants

σhist - 1d. 5.00 4.30 0σhist - 1w. 4.38 3.97 0

GARCH(1,1) 4.20 3.84 3FIGARCH 4.13 4.31 3HARCH(6) 4.02 4.02 7

Run1 3.98 3.76 6Run2 4.04 3.87 5Run3 4.00 3.74 9Run4 4.03 3.78 6

Table1: Comparisonof theRMSEvalues,in %, betweenthein-sampleandtheout-of-sampleofthebenchmarkprocesses,andthebestsolutionof eachGPrunsfor theUSD-CHFexchangerate.

points. Overall, the remarkableresult is that the GP solutionsare consistentlybetterthan thebenchmarks,includingout-of-sample.This clearlyshows thatGP, with thesyntacticrestrictionsand the local optimizationof the constants,is an efficient tool for discovering new forecastingmodels,withoutoverfittingthedatasets.Thenumberof constantsof thetreeis asgivenby thebestGPsolution.Someof theseconstantscanberedundant,andthenumberof independentconstantsmightbesmaller. Noticealsothatonly thebesttreefor eachrun is reported.Yet, thenext rankinggenomeshave alsovery goodscores,andcanbe considerablysimpler thanthe bestindividual.Thesesolutionsareindeedbetterat capturingtherelevantstructureof agoodforecastingmodel.

A detailedexaminationof thebestgeneratedtreesshows thefollowing salientfeatures: TheEMA functionis alwaysusedat leastoncein eachtree,andmostof thetime anEMAoperatoris therootnode.Clearly, agoodforecastneedsto haveenoughmemoryof thepast,andthis is achievedthroughtheuseof EMAs. TheEMAs have mostlya constantrangez, andnot a variablerangegiven by thevalueofa subtree.For example,subtreesof the kind EMA - 3 5; . areused,but subtreesof thekind EMA - 3 5 2 4 & r2; . seldomlyappear. Moreover, the typical valuesfor r2 is theannualizedvariance,which for thesedatasetsis of theorderof 0.01(i.e.

10% 2). For the

lastexample,thechangesin theEMA rangearethereforenumericallyvery small,anddonotplaysarolein theefficiency of thissubtree.Only in afew cases,anumericallyimportantvariableEMA rangeappearin thefinal population. All thegoodsolutionscontainproductof returnsatdifferenttimehorizons,for exampleliker - δt .G& r - 24δt . . Usingtherelationr - kδt . t H r -mδt . t r - k m δt . t mδt , thistermcanberewrittenasr - δt . t ,& r - δt . t r - δt . t !& r - 23δt . t δt , namelyas a squareterm, plus a crossterm of non overlappingreturns. Whenthe terminalsetcontainsonly the returnr - δt . at thehourly time horizonδt

19

Page 21: Genetic Programming with Syntactic Restrictions applied … · Genetic Programming with Syntactic Restrictions applied to Financial Volatility Forecasting Gilles Zumbach1, Olivier

process In-Sample Out-Of-Sample constants

σhist - 1d. 4.51 6.19 0σhist - 1w. 3.98 5.96 0

GARCH(1,1) 3.81 5.62 3FIGARCH 3.78 5.51 3HARCH(6) 3.77 5.62 7

Run1 3.74 5.40 2Run2 3.61 5.48 4Run3 3.60 5.44 5Run4 3.59 5.42 7

Table2: Comparisonof theRMSEvalues,in %, betweenthein-sampleandtheout-of-sampleofthebenchmarkprocessesandthebestsolutionof eachGPrunsfor theUSD-JPYexchangerate.

(but no aggregatedreturns),similar termsaregeneratedthroughEMAs, like with r - δt .I&EMA - z; r - δt .. . Thesecrosstermsarethekey differencebetweentheGPsolutionsandall thebenchmarkmodelsthatcontainonly quadraticterms.Thereasonfor whichthesetermshavebeenomiteduntil now in volatility modelingis thatE - r - δt . t J& r -mδt . t mδt !.7 0. Yet,theGPdiscoveredthatthesetermsarecontainingvaluableinformationson theevolutionofthevolatility.

5 Conclusion

In this study, we have usedgeneticprogrammingwith syntacticrestrictionscombinedto localsearchtechniquesto explore the spaceof the volatility forecastingmodelsfor foreign exchangerates.Syntacticrestrictionsareusedto imposea specificsymmetrypropertyon thesolutionsandlocalsearchis usedto adjustthetree’sconstantsto theproblem.Wehavefirst testedthisapproachon thediscovery of thetranscendentalfunctioncos

x . This testshows thattheimprovementpro-

videdby local optimizationof thetree’s constantsis huge,typically threeordersof magnitudeonsuchproblems.Theadvantageof syntacticrestrictionsis partly that it improvesGPconvergence,but mainly thatonly solutionswith thecorrectsymmetrypropertyaregenerated.

The applicationto thediscovery of new typesof volatility forecastingmodelsfor financialtimeseriesis ahardproblemwhich is feasibleonly with localoptimizationof theconstants.To reduceoverfitting problemsandto keepoptimizationtime within reasonablebounds,a controlof theGPtrees’ complexity is needed.We have useda penalizationapproachto promotetreesof lowercomplexity with a reducednumberof constantterminals.

For the volatility forecastproblem,the experimenton two main exchangeratesshows that theselectedGP treesperform better than optimizedbenchmarkprocesses.Moreover, the out-of-sampleresultsindicatethat theseGP solutionsarequite robust. The successof the besttreesis

20

Page 22: Genetic Programming with Syntactic Restrictions applied … · Genetic Programming with Syntactic Restrictions applied to Financial Volatility Forecasting Gilles Zumbach1, Olivier

dueto cross-productof two returnsat differenttimehorizons.Suchcrosstermsarepartly presentin theHARCH process,but with fixedweightsgivenby thecoefficientsCj asthismodelis indeedquadraticin r - kδt . . In GPtrees,thecombinationof thevariouscrosstermsis morecomplex andtheiroptimumweightingcanbedirectlyevaluated.Let usnoticethatthesecrosstermsareformedeitherby usingmoving averageof microscopicreturnsor by directly includingaggregatedreturns.The new crossproductsindicatequalitatively that the evolution of the volatility in the financialmarkets is sensitive to the pasttrend,wherefor exampler - δt . t A& r -mδt . t mδt 0, or to asidewaymarket,wherefor exampler - δt . t I& r -mδt . t mδt !0 0. Thisdependency of therealizedvolatility on the pastprice evolution is indeeda new stylized fact, whosediscovery shouldbecreditedto GP. This new effect is clearly worth a detailedstatisticalanalysis,which will be thesubjectof a forthcomingpublication.

21

Page 23: Genetic Programming with Syntactic Restrictions applied … · Genetic Programming with Syntactic Restrictions applied to Financial Volatility Forecasting Gilles Zumbach1, Olivier

References

[Baillie et al., 1996] Baillie, R. T., Bollerslev, T., andMikkelsen,H.-O. (1996). Fractionallyin-tegratedgeneralizedautoregressive conditionalheteroskedasticity. Journal of Econometrics,74(1):3–30.

[Bhattacharyyaetal., 1998] Bhattacharyya,S.,Pictet,O. V., andZumbach,G. (1998).Represen-tationalsemanticsfor geneticprogrammingbasedlearningin high-frequency financialdata.InKoza,J. R., Banzhaf,W., Chellapilla,K., Deb,K., Dorigo, M., Fogel,D. B., Garzon,M. H.,Goldberg, D. E., Iba,H., andRiolo,R.,editors,GeneticProgramming1998:Proceedingsof theThird AnnualConference, pages11–16,Universityof Wisconsin,Madison,Wisconsin,USA.MorganKaufmann.

[Breymannet al., 2000] Breymann, W., Zumbach,G., Dacorogna,M. M., and Muller, U. A.(2000). Dynamicaldeseasonalizationin otc andlocalizedexchange-tradedmarkets. InternaldocumentWAB.2000-01-31,Olsen& Associates,Seefeldstrasse233, 8008Zurich, Switzer-land.

[Chopardetal., 2000] Chopard,B., Pictet,O. V., andTomassini,M. (2000). Parallel anddis-tributedevolutionarycomputationfor financialapplications.Parallel AlgorithmsandApplica-tions, 15:15–36.

[Dacorognaetal., 1993] Dacorogna,M. M., Muller, U. A., Nagler, R. J.,Olsen,R. B., andPictet,O. V. (1993). A geographicalmodelfor thedaily andweeklyseasonalvolatility in theforeignexchangemarket. Journalof InternationalMoney andFinance, 12(4):413–438.

[Koza,1992] Koza,J. R. (1992). GeneticProgramming:On theProgrammingof Computers byMeansof Natural Selection. MIT Press,Cambridge,MA, USA.

[Montana,1995] Montana,D. J. (1995). Strongly typed geneticprogramming. EvolutionaryComputation, 3(2):199–230.

[Muller et al., 1997] Muller, U. A., Dacorogna,M. M., Dave, R. D., Olsen,R. B., Pictet,O. V.,and von Weizsacker, J. E. (1997). Volatilities of different time resolutions– analyzingthedynamicsof market components.Journalof Empirical Finance, 4(2-3):213–239.

[Pressetal., 1986] Press,W. H., Flannery, B. P., Teukolsky, S. A., andVetterling,W. T. (1986).Numericalrecipes.Theart of scientificcomputing. CambridgeUniversityPress,Cambridge.

[Zumbach,2000a] Zumbach,G. O. (2000a).Thepitfalls in fitting garchprocesses.In Dunis,C.,editor, Advancesin QuantitativeAssetManagement. KluverAcademicPublisher.

[Zumbach,2000b] Zumbach,G. O. (2000b). Volatility processandvolatility forecastwith longmemory. Olsen& Associatedinternalpaper.

22