[quantstart] lesson 3# successful backtesting of algorithmic trading strategies (part i).pdf
TRANSCRIPT
Subject: [QuantStart] Lesson 3: Successful Backtesting of Algorithmic Trading Strategies (Part I)
From: Michael HallsMoore ([email protected])
To: [email protected];
Date: Sunday, 4 January 2015, 6:18
Hi Darek,
In the second lesson of our quantitative finance email course we discussed how to identifyalgorithmic trading strategies and, in particular, create a strategy pipeline.
In today's lesson we're going to dig deeper into backtesting and consider aspects that are oftenignored by much of the algorithmic trading literature.
Algorithmic backtesting requires knowledge of many areas, including psychology, mathematics,statistics, software development and market/exchange microstructure.
I couldn't hope to cover all of those topics in one email, so I'm going to split them into two or threesmaller pieces. What will we discuss in this section? I'll begin by defining backtesting and then I willdescribe the basics of how it is carried out.
Then I will elucidate upon the biases we touched upon in the first email (Beginner's Guide toQuantitative Trading). Next I will present a comparison of the various available backtesting softwareoptions.
In subsequent emails we will look at the details of strategy implementations that are often barelymentioned or ignored.
We will also consider how to make the backtesting process more realistic by including theidiosyncrasies of a trading exchange. Then we will discuss transaction costs and how to correctlymodel them in a backtest setting.
Let's begin by discussing what backtesting is and why we should carry it out in our algorithmictrading.
What is Backtesting?Algorithmic trading stands apart from other types of investment classes because we can more reliablyprovide expectations about future performance from past performance, as a consequence ofabundant data availability. The process by which this is carried out is known as backtesting.
In simple terms, backtesting is carried out by exposing your particular strategy algorithm to a streamof historical financial data, which leads to a set of trading signals.
Each trade, which we will mean here to be a 'roundtrip' of two signals, will have an associated profitor loss. The accumulation of this profit/loss over the duration of your strategy backtest will lead to thetotal profit and loss (also known as the 'P&L' or 'PnL'). That is the essence of the idea, although ofcourse the "devil is always in the details"!
What are key reasons for backtesting an algorithmic strategy?
Filtration If you recall from the previous email Strategy Identification, our goal at the initialresearch stage was to set up a strategy pipeline and then filter out any strategy that did notmeet certain criteria. Backtesting provides us with another filtration mechanism, as we caneliminate strategies that do not meet our performance needs.
Modelling Backtesting allows us to (safely!) test new models of certain market phenomena,such as transaction costs, order routing, latency, liquidity or other market microstructureissues.
Optimisation Although strategy optimisation is fraught with biases, backtesting allows us toincrease the performance of a strategy by modifying the quantity or values of the parametersassociated with that strategy and recalculating its performance.
Verification Our strategies are often sourced externally, via our strategy pipeline.Backtesting a strategy ensures that it has not been incorrectly implemented. Although we willrarely have access to the signals generated by external strategies, we will often have accessto the performance metrics such as the Sharpe Ratio and Drawdown characteristics. Thus wecan compare them with our own implementation.
Backtesting provides a host of advantages for algorithmic trading. However, it is not alwayspossible to straightforwardly backtest a strategy.
In general, as the frequency of the strategy increases, it becomes harder to correctly model themicrostructure effects of the market and exchanges.
This leads to less reliable backtests and thus a trickier evaluation of a chosen strategy. This is aparticular problem where the execution system is the key to the strategy performance, as with ultrahigh frequency algorithms.
Unfortunately, backtesting is fraught with biases of all types. We have touched upon some of theseissues in previous emails, but we will now discuss them in depth.
Biases Affecting Strategy BacktestsThere are many biases that can affect the performance of a backtested strategy.
Unfortunately, these biases have a tendency to inflate the performance rather than detract from it.Thus you should always consider a backtest to be an idealised upper bound on the actualperformance of the strategy. It is almost impossible to eliminate biases from algorithmic trading so it isour job to minimise them as best we can in order to make informed decisions about our algorithmicstrategies.
There are four major biases that I wish to discuss: Optimisation Bias, LookAhead Bias, SurvivorshipBias and Psychological Tolerance Bias.
Optimisation Bias
This is probably the most insidious of all backtest biases.
It involves adjusting or introducing additional trading parameters until the strategy performance onthe backtest data set is very attractive. However, once live the performance of the strategy can bemarkedly different. Another name for this bias is "curve fitting" or "datasnooping bias".
Optimisation bias is hard to eliminate as algorithmic strategies often involve many parameters.
"Parameters" in this instance might be the entry/exit criteria, lookback periods, averaging periods (i.ethe moving average smoothing parameter) or volatility measurement frequency.
Optimisation bias can be minimised by keeping the number of parameters to a minimum andincreasing the quantity of data points in the training set.
In fact, one must also be careful of the latter as older training points can be subject to a prior regime(such as a regulatory environment) and thus may not be relevant to your current strategy.
One method to help mitigate this bias is to perform a sensitivity analysis. This means varying theparameters incrementally and plotting a "surface" of performance.
Sound, fundamental reasoning for parameter choices should, with all other factors considered, leadto a smoother parameter surface.
If you have a very jumpy performance surface, it often means that a parameter is not reflecting aphenomena and is an artefact of the test data.
There is a vast literature on multidimensional optimisation algorithms and it is a highly active area ofresearch. I won't dwell on it here, but keep it in the back of your mind when you find a strategy with afantastic backtest!
LookAhead Bias
Lookahead bias is introduced into a backtesting system when future data is accidentally includedat a point in the simulation where that data would not have actually been available.
If we are running the backtest chronologically and we reach time point N, then lookahead biasoccurs if data is included for any point N+k, where k>0. Lookahead bias errors can be incrediblysubtle. Here are three examples of how lookahead bias can be introduced:
Technical Bugs Arrays/vectors in code often have iterators or index variables. Incorrectoffsets of these indices can lead to a lookahead bias by incorporating data at N+k for nonzero k.
Parameter Calculation Another common example of lookahead bias occurs whencalculating optimal strategy parameters, such as with linear regressions between two timeseries. If the whole data set (including future data) is used to calculate the regressioncoefficients, and thus retroactively applied to a trading strategy for optimisation purposes, thenfuture data is being incorporated and a lookahead bias exists.
Maxima/Minima Certain trading strategies make use of extreme values in any time period,such as incorporating the high or low prices in OHLC data. However, since thesemaximal/minimal values can only be calculated at the end of a time period, a lookahead biasis introduced if these values are used during the current period. It is always necessary to laghigh/low values by at least one period in any trading strategy making use of them.
As with optimisation bias, one must be extremely careful to avoid its introduction. It is often the mainreason why trading strategies underperform their backtests significantly in "live trading".
Survivorship Bias
Survivorship bias is a particularly dangerous phenomenon and can lead to significantly inflatedperformance for certain strategy types.
It occurs when strategies are tested on datasets that do not include the full universe of prior assetsthat may have been chosen at a particular point in time, but only consider those that have "survived"to the current time.
As an example, consider testing a strategy on a random selection of equities before and after the2001 market crash. Some technology stocks went bankrupt, while others managed to stay afloat andeven prospered.
If we had restricted this strategy only to stocks which made it through the market drawdown period,we would be introducing a survivorship bias because they have already demonstrated their successto us.
In fact, this is just another specific case of lookahead bias, as future information is being incorporatedinto past analysis.
There are two main ways to mitigate survivorship bias in your strategy backtests:
Survivorship Bias Free Datasets In the case of equity data it is possible to purchasedatasets that include delisted entities, although they are not cheap and only tend to be utilisedby institutional firms. In particular, Yahoo Finance data is NOT survivorship bias free, and thisis commonly used by many retail algo traders. One can also trade on asset classes that arenot prone to survivorship bias, such as certain commodities (and their future derivatives).
Use More Recent Data In the case of equities, utilising a more recent data set mitigates thepossibility that the stock selection chosen is weighted to "survivors", simply as there is lesslikelihood of overall stock delisting in shorter time periods. One can also start building apersonal survivorshipbias free dataset by collecting data from current point onward. After 34years, you will have a solid survivorshipbias free set of equities data with which to backtestfurther strategies.
We will now consider certain psychological phenomena that can influence your trading performance.
Psychological Tolerance Bias
This particular phenomena is not often discussed in the context of quantitative trading. However, it isdiscussed extensively in regard to more discretionary trading methods.
It has various names, but I've decided to call it "psychological tolerance bias" because it captures theessence of the problem.
When creating backtests over a period of 5 years or more, it is easy to look at an upwardly trendingequity curve, calculate the compounded annual return, Sharpe ratio and even drawdowncharacteristics and be satisfied with the results.
As an example, the strategy might possess a maximum relative drawdown of 25% and a maximumdrawdown duration of 4 months. This would not be atypical for a momentum strategy.
It is straightforward to convince oneself that it is easy to tolerate such periods of losses because theoverall picture is rosy. However, in practice, it is far harder!
If historical drawdowns of 25% or more occur in the backtests, then in all likelihood you will seeperiods of similar drawdown in live trading.
These periods of drawdown are psychologically difficult to endure. I have observed first hand what anextended drawdown can be like, in an institutional setting, and it is not pleasant even if thebacktests suggest such periods will occur.
The reason I have termed it a "bias" is that often a strategy which would otherwise be successful isstopped from trading during times of extended drawdown and thus will lead to significantunderperformance compared to a backtest.
Thus, even though the strategy is algorithmic in nature, psychological factors can still have a heavyinfluence on profitability.
The takeaway is to ensure that if you see drawdowns of a certain percentage and duration in thebacktests, then you should expect them to occur in live trading environments, and will need topersevere in order to reach profitability once more.
Software Packages for BacktestingThe software landscape for strategy backtesting is vast.
Solutions range from fullyintegrated institutional grade sophisticated software through toprogramming languages such as C++, Python and R where nearly everything must be written fromscratch (or suitable 'plugins' obtained).
As quant traders, we are interested in the balance of being able to "own" our trading technology stackversus the speed and reliability of our development methodology. Here are the key considerations forsoftware choice:
Programming Skill The choice of environment will in a large part come down to your abilityto program software. I would argue that being in control of the total stack will have a greatereffect on your long term P&L than outsourcing as much as possible to vendor software. This isdue to the downside risk of having external bugs or idiosyncrasies that you are unable to fix invendor software, which would otherwise be easily remedied if you had more control over your"tech stack". You also want an environment that strikes the right balance between productivity,library availability and speed of execution. I make my own personal recommendation below.
Execution Capability/Broker Interaction Certain backtesting software, such asTradeStation, ties in directly with a brokerage. I am not a fan of this approach as reducingtransaction costs are often a big component of getting a higher Sharpe ratio. If you're tied intoa particular broker (and Tradestation "forces" you to do this), then you will have a harder timetransitioning to new software (or a new broker) if the need arises. Interactive Brokers providean API which is robust, albeit with a slightly obtuse interface.
Customisation An environment like MATLAB or Python gives you a great deal of flexibilitywhen creating algo strategies as they provide fantastic libraries for nearly any mathematicaloperation imaginable, but also allow extensive customisation where necessary.
Strategy Complexity Certain software just isn't cut out for heavy number crunching ormathematical complexity. Excel is one such piece of software. While it is good for simplerstrategies, it cannot really cope with numerous assets or more complicated algorithms, atspeed.
Bias Minimisation Does a particular piece of software or data lend itself more to tradingbiases? You need to make sure that if you want to create all the functionality yourself, that youdon't introduce bugs which can lead to biases.
Speed of Development One shouldn't have to spend months and months implementing abacktest engine. Prototyping should only take a few weeks. Make sure that your software isnot hindering your progress to any great extent, just to grab a few extra percentage points ofexecution speed. C++ is the "elephant in the room" here!
Speed of Execution If your strategy is completely dependent upon execution timeliness (asin HFT/UHFT) then a language such as C or C++ will be necessary. However, you will beverging on Linux kernel optimisation and FPGA usage for these domains, which is outside thescope of this email!
Cost Many of the software environments that you can program algorithmic trading strategieswith are completely free and open source. In fact, many hedge funds make use of opensource software for their entire algo trading stacks. In addition, Excel and MATLAB are bothrelatively cheap and there are even free alternatives to each.
Now that we have listed the criteria with which we need to choose our software infrastructure, I wantto run through some of the more popular packages and how they compare:
Note: I am only going to include software that is available to most retail practitioners and softwaredevelopers, as this is the readership of the site and the email list. While other software is available,such as the more institutional grade tools, I feel these are too expensive to be effectively used in aretail setting and I personally have no extensive experience with them.
Backtesting Software Comparison
MSExcel
Description: WYSIWYG (whatyouseeiswhatyouget)spreadsheet software. Extremely widespread in the financialindustry. Data and algorithm are tightly coupled.Execution: Yes, Excel can be tied into most brokerages.Customisation: VBA macros allow more advanced functionality atthe expense of hiding implementation.Strategy Complexity: More advanced statistical tools are harderto implement as are strategies with many hundreds of assets.Bias Minimisation: Lookahead bias is easy to detect via cellhighlighting functionality (assuming no VBA).Development Speed: Quick to implement basic strategies.Execution Speed: Slow execution speed suitable only for lowerfrequency strategies.Cost: Cheap or free (depending upon license).Alternatives: OpenOffice
MATLAB
Description: Programming environment originally designed forcomputational mathematics, physics and engineering. Very wellsuited to vectorised operations and those involving numerical linearalgebra. Provides a wide array of plugins for quant trading. Inwidespread use in quantitative hedge funds.Execution: No native execution capability, MATLAB requires aseparate execution system.Customisation: Huge array of community plugins for nearly allareas of computational mathematics.Strategy Complexity: Many advanced statistical methods alreadyavailable and welltested.Bias Minimisation: Harder to detect lookahead bias, requiresextensive testing.Development Speed: Short scripts can create sophisticatedbacktests easily.Execution Speed: Assuming a vectorised/parallelised algorithm,MATLAB is highly optimised. Poor for traditional iterated loops.Cost: ~1,000 USD for a license.Alternatives: Octave, SciLab
Description: Highlevel language designed for speed ofdevelopment. Wide array of libraries for nearly any programmatictask imaginable. Gaining wider acceptance in hedge fund andinvestment bank community. Not quite as fast as C/C++ forexecution speed.Execution: Python plugins exist for larger brokers, such asInteractive Brokers. Hence backtest and execution system can allbe part of the same "tech stack".Customisation: Python has a very healthy developmentcommunity and is a mature language. NumPy/SciPy provide fast
Pythonscientific computing and statistical analysis tools relevant for quanttrading.Strategy Complexity: Many plugins exist for the main algorithms,but not quite as big a quant community as exists for MATLAB.Bias Minimisation: Same bias minimisation problems exist as forany high level language. Need to be extremely careful abouttesting.Development Speed: Python's main advantage is developmentspeed, with robust in built in testing capabilities.Execution Speed: Not quite as fast as C++, but scientificcomputing components are optimised and Python can talk to nativeC code with certain plugins.Cost: Free/Open SourceAlternatives: Ruby, Erlang, Haskell
R
Description: Environment designed for advanced statisticalmethods and time series analysis. Wide array of specific statistical,econometric and native graphing toolsets. Large developercommunity.Execution: R possesses plugins to some brokers, in particularInteractive Brokers. Thus an endtoend system can written entirelyin R.Customisation: R can be customised with any package, but itsstrengths lie in statistical/econometric domains.Strategy Complexity: Mostly useful if performing econometric,statistical or machinelearning strategies due to available plugins.Bias Minimisation: Similar level of bias possibility for any highlevel language such as Python or C++. Thus testing must becarried out.Development Speed: R is rapid for writing strategies based onstatistical methods.Execution Speed: R is slower than C++, but remains relativelyoptimised for vectorised operations (as with MATLAB).Cost: Free/Open SourceAlternatives: Stata
C++
Description: Mature, highlevel language designed for speed ofexecution. Wide array of quantitative finance and numericallibraries. Harder to debug and often takes longer to implement thanPython or MATLAB. Extremely prevalent in both the buy and sellside.Execution: Most brokerage APIs are written in C++ and Java.Thus many plugins exist.Customisation: C/C++ allows direct access to underlying memory,hence ultrahigh frequency strategies can be implemented.Strategy Complexity: C++ STL provides wide array of optimisedalgorithms. Nearly any specialised mathematical algorithmpossesses a free, opensource C/C++ implementation on the web.Bias Minimisation: Lookahead bias can be tricky to eliminate, butno harder than other highlevel language. Good debugging tools,but one must be careful when dealing with underlying memory.Development Speed: C++ is quite verbose compared to Python orMATLAB for the same algorithmm. More linesofcode (LOC) often
leads to greater likelihood of bugs.Execution Speed: C/C++ has extremely fast execution speed andcan be well optimised for specific computational architectures. Thisis the main reason to utilise it.Cost: Various compilers: Linux/GCC is free, MS Visual Studio hasdiffering licenses.Alternatives: C#, Java, Scala
Different strategies will require different software packages. HFT and UHFT strategies will bewritten in C/C++ (these days they are often carried out on GPUs and FPGAs), whereas lowfrequencydirectional equity strategies are easy to implement in TradeStation, due to the "all in one" nature ofthe software/brokerage.
My personal preference is for Python as it provides the right degree of customisation, speed ofdevelopment, testing capability and execution speed for my needs and strategies.
If I need anything faster, I can "drop in" to C++ directly from my Python programs. One methodfavoured by many quant traders is to prototype their strategies in Python and then convert the slowerexecution sections to C++ in an iterative manner. Eventually the entire algo is written in C++ and canbe "left alone to trade"!
In the next email we will take an extensive look at transaction cost modelling as well as strategyspecific implementation.
Thanks for reading,
Mike.
QuarkGluon Ltd, Flat 481, Imperial Court, 225 Kennington Lane, London, Greater London SE11 5QN, UNITEDKINGDOMUnsubscribe | Change Subscriber Options