Posts Tagged ‘mining’

How To Clean Price Data for Backtesting

December 16th, 2008 by jackieannpatterson | 2 Comments | Filed in Backtesting Set Up

Cleaning data for backtesting is not easy, but it's necessary to get meaningful results. Mis-adjusted price splits can skew the price data and mislead the unwary backtester into thinking they've found the holy grail when the strategy merely happens to catch the good side of a bad gap.
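A quick first screen for mis-adjusted splits is to scan each series for one-day jumps whose size sits suspiciously close to a common split ratio. Here's a minimal Python sketch. The function names, the list of ratios and the 3% tolerance are my own assumptions, and every flagged day still has to be checked against the split history, because a genuine crash can match a split ratio too.

```python
# Hypothetical list of split ratios to test against; extend it for your universe.
COMMON_SPLIT_RATIOS = (2.0, 3.0, 1.5, 4.0, 5.0, 10.0)


def nearest_split(ratio, tolerance):
    """Return the split (or reverse-split) ratio within `tolerance` of `ratio`, else None."""
    for split in COMMON_SPLIT_RATIOS:
        for candidate in (split, 1.0 / split):  # forward split, then reverse split
            if abs(ratio - candidate) / candidate <= tolerance:
                return candidate
    return None


def suspicious_split_gaps(dates, closes, tolerance=0.03):
    """Flag close-to-close moves that look like an unadjusted or double-adjusted split.

    Returns a list of (date, prev_close / close, matching split ratio) tuples.
    """
    flagged = []
    for i in range(1, len(closes)):
        prev, curr = closes[i - 1], closes[i]
        if prev <= 0 or curr <= 0:
            continue  # zero or negative prints are a different kind of bad data
        ratio = prev / curr  # about 2.0 when a 2-for-1 split was left unadjusted
        match = nearest_split(ratio, tolerance)
        if match is not None:
            flagged.append((dates[i], round(ratio, 4), match))
    return flagged


if __name__ == "__main__":
    dates = ["2008-06-02", "2008-06-03", "2008-06-04", "2008-06-05"]
    closes = [100.00, 101.00, 50.60, 51.00]  # made-up series with an unadjusted 2:1 split
    print(suspicious_split_gaps(dates, closes))
```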

Here are the steps to screen out dirty data and produce a clean dataset:

1. Pick at least 3 candidate data vendors.

2. Format the data for comparison.

3. Write a program to do a smart comparison and run it on the 3 candidate data sets.

4. Analyze the mis-compares to see which set is in error. If 2 of 3 sets agree, assume that's the correct value and the outlier is wrong.

5. Send feedback to the data vendors so they can fix the errors.

6. Select the set of historical price data to use for backtesting and lock it down to prevent changes during the backtesting.

7. Feed the golden price data to the backtesting engine.
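Steps 3 and 4 can be sketched in Python as a majority vote. This assumes each vendor's data has already been formatted (step 2) into a dict mapping a date string to a closing price; the names `compare_vendors` and `agree` and the 0.5% tolerance are hypothetical. Prices within the tolerance count as a match, so rounding differences between vendors don't bury the real errors.

```python
def agree(a, b, tolerance):
    """Treat two prices as equal if they differ by no more than `tolerance` (relative)."""
    return abs(a - b) <= tolerance * max(abs(a), abs(b))


def compare_vendors(vendors, tolerance=0.005):
    """Majority-vote comparison of closing prices from several vendors.

    vendors: dict of vendor name -> {date: close}. Returns (outliers, unresolved):
      outliers   -- (date, consensus_close, [vendors that disagree or lack the bar])
      unresolved -- (date, {vendor: close}) where no majority agrees; check by hand
    """
    names = sorted(vendors)
    needed = len(names) // 2 + 1  # 2 of 3 when there are three vendors
    all_dates = sorted(set().union(*(v.keys() for v in vendors.values())))
    outliers, unresolved = [], []
    for date in all_dates:
        quotes = {n: vendors[n][date] for n in names if date in vendors[n]}
        consensus = None
        for ref in quotes.values():
            backers = [n for n, q in quotes.items() if agree(q, ref, tolerance)]
            if len(backers) >= needed:
                consensus = (ref, backers)
                break
        if consensus is None:
            unresolved.append((date, quotes))
            continue
        value, backers = consensus
        wrong = [n for n in names if n not in backers]
        if wrong:
            outliers.append((date, value, wrong))
    return outliers, unresolved


if __name__ == "__main__":
    vendors = {  # made-up closes for one symbol from three hypothetical vendors
        "vendor_a": {"2008-11-03": 10.00, "2008-11-04": 20.00, "2008-11-05": 30.00},
        "vendor_b": {"2008-11-03": 10.01, "2008-11-04": 40.00, "2008-11-05": 31.00},
        "vendor_c": {"2008-11-03": 10.00, "2008-11-04": 20.02},
    }
    bad, review = compare_vendors(vendors)
    print("outliers:", bad)
    print("review by hand:", review)
```

A vendor that is missing a bar the other two agree on is reported as an outlier; dates with no majority go to the manual-review list. A fuller comparison would also check open, high, low and volume, not just the close.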

This process took me several weeks of work but was worth it to get accurate results. There's little point in backtesting if the underlying data is riddled with errors.
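For step 6, one simple way to lock the data down (an assumption on my part, not necessarily how it was done here) is to make the files read-only and record a SHA-256 fingerprint, then refuse to run a backtest if that fingerprint ever changes. The function names are hypothetical; `hashlib`, `os.chmod` and `stat` are from Python's standard library.

```python
import hashlib
import os
import stat


def fingerprint(paths):
    """SHA-256 over file names and contents, in sorted path order."""
    digest = hashlib.sha256()
    for path in sorted(paths):
        digest.update(os.path.basename(path).encode("utf-8"))
        with open(path, "rb") as fh:
            for chunk in iter(lambda: fh.read(1 << 20), b""):  # 1 MB at a time
                digest.update(chunk)
    return digest.hexdigest()


def lock_down(paths):
    """Make the golden files read-only and return their fingerprint for the run log."""
    for path in paths:
        os.chmod(path, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return fingerprint(paths)


def verify(paths, expected):
    """Call before every backtest run; raises if the golden data has changed."""
    actual = fingerprint(paths)
    if actual != expected:
        raise RuntimeError(f"price data changed: expected {expected}, got {actual}")
```

Storing the fingerprint next to each backtest report also makes it easy to tell later which version of the data produced which results.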

Read on for details if you are going to attempt this on your own or if you just want to see what preparations go into serious backtesting.


Curve-Fitting Definition

November 4th, 2008 by jackieannpatterson | No Comments | Filed in Glossary

Curve-fitting in general is the process of finding the mathematical description that best matches a given set of data. Outside of trading strategies, it can be a very useful way of drawing conclusions from experimental data.

When applied to trading strategies, curve-fitting can produce over-optimized, over-optimistic results. In any set of price data, there is some "magic" combination of indicators and parameters that catches almost every move and shows outstanding results. Unfortunately, that magic formula is the result of chance and is different for every data set. That means future results probably won't come close to the numbers generated with the full benefit of hindsight.
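To see how chance alone produces a "magic" formula, here's a small Python experiment. To keep it short, it swaps real indicators for random long/flat rules (an assumption for illustration), and all names are hypothetical. The returns have zero drift, so no rule has a genuine edge, yet keeping the best of 500 rules typically yields a healthy-looking in-sample number. On a fresh series the winner's expected return is zero, because the rule knows nothing about the new data.

```python
import random


def zero_edge_returns(n, seed):
    """Daily returns with zero drift and no structure: no rule can have a real edge."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 0.01) for _ in range(n)]


def total_return(positions, returns):
    """Compound the returns on days the rule is long (1); flat days (0) earn nothing."""
    equity = 1.0
    for pos, r in zip(positions, returns):
        equity *= 1.0 + pos * r
    return equity - 1.0


def curve_fit(returns, n_rules=500, seed=1):
    """Try n_rules meaningless random long/flat rules and keep the best in-sample one."""
    rng = random.Random(seed)
    best_score, best_rule = float("-inf"), None
    for _ in range(n_rules):
        rule = [rng.randint(0, 1) for _ in returns]
        score = total_return(rule, returns)
        if score > best_score:
            best_score, best_rule = score, rule
    return best_score, best_rule


if __name__ == "__main__":
    history = zero_edge_returns(500, seed=10)
    fitted, rule = curve_fit(history)
    future = zero_edge_returns(500, seed=11)
    print(f"in-sample {fitted:.1%}, on new data {total_return(rule, future):.1%}")
```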

Extra Insight: 

There’s a fine line here.   On the one hand, we want to use backtesting to see how trading strategies performed in the past with an eye to picking the best one to trade.    On the other hand, we don’t want to trade a fantasy strategy that has little chance of working in the future.

I use the term curve-fitting for the negative sense (over-optimization) and data-mining for the positive sense (selecting the best of many strategies via backtesting).

Here are three things I do to help avoid the pitfalls of curve-fitting:

  • Out-of-sample testing: test and compare results across multiple time periods.
  • Select parameters which fall in the middle of a range of good parameters.   Avoid the outlier settings that produce much better results than their neighbors.
  • Forward-test new trading strategies in live trading with small amounts before committing to full-size trades.
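The second point, picking from the middle of a good range, can be made mechanical. One approach (my assumption, not the only one) is to score each parameter value by the worst result among itself and its grid neighbours, then pick the value whose worst neighbour is best. A lone spike can't win that way, because its neighbours drag it down. The function name and the sample scores are hypothetical.

```python
def plateau_pick(scores, radius=1):
    """Pick a parameter from the middle of a good range rather than an isolated peak.

    scores: dict of parameter value -> backtest result, on an evenly spaced 1-D grid.
    Each value is rated by the worst score within `radius` grid steps on either side;
    only values with a full neighbourhood are eligible, so an edge setting can't win
    simply by having fewer neighbours.
    """
    params = sorted(scores)
    if len(params) < 2 * radius + 1:
        raise ValueError("grid too small for the requested radius")
    best_param, best_floor = None, float("-inf")
    for i in range(radius, len(params) - radius):
        window = params[i - radius : i + radius + 1]
        floor = min(scores[q] for q in window)
        if floor > best_floor:
            best_param, best_floor = params[i], floor
    return best_param, best_floor


if __name__ == "__main__":
    # Made-up results by moving-average length: 60 is a lone spike, 20-40 a plateau.
    scores = {10: 4.0, 20: 6.0, 30: 6.5, 40: 6.2, 50: 3.0, 60: 12.0, 70: 2.0}
    print(plateau_pick(scores))  # (30, 6.0)
```

Taking the average of the neighbourhood instead of the minimum sounds similar, but a big enough spike then pulls its neighbours up with it; with the scores above, the average would favour 50.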

See Technical Traders Guide to Computer Analysis of the Futures Markets for more on the case against curve-fitting.

(Backtesting Blog is an Amazon Associate.)

Last updated 11/11/08.


Out-of-Sample Testing Definition

October 22nd, 2008 by jackieannpatterson | No Comments | Filed in Glossary

Out-of-sample testing is a way to guard against curve-fitting. It's a good practice because we don't know how the market will behave in the future. When we ultimately trade our strategy, it will be on live data as it evolves, not on the historical price data used for backtesting.

Here's how out-of-sample testing works: first, a backtest is performed on a given test period. Then the same backtest is run on a new test period, a different sample of data (hence the name). If the parameters or settings were over-optimized in the first backtest, it's unlikely that they will perform well in the second time period.

For example, it's possible to tweak the parameters on just the right indicators to make over 1000% gains in backtesting. But when we run those same settings in another period, the strategy might actually lose money. If it is custom-fit to one set of data, it won't work as well on a different set. Much better to find that out with an additional backtesting run than in live trading!
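The two-pass procedure can be sketched in Python with a deliberately simple rule: long when the close is above its N-day moving average, flat otherwise. The rule, the function names, the lookback grid and the random-walk prices are all assumptions for illustration. The lookback is optimized on the first part of the data only; the winning value is then re-run, unchanged, on the later part.

```python
import random


def sma_rule_return(closes, lookback, start, end):
    """Total return of 'long when close > SMA(lookback), else flat' over bars start..end-1.

    The signal on day t uses closes up to and including t and earns the t -> t+1 move,
    so there is no look-ahead. Closes before `start` may be used to warm up the average.
    """
    equity = 1.0
    for t in range(max(start, lookback - 1), end - 1):
        sma = sum(closes[t - lookback + 1 : t + 1]) / lookback
        if closes[t] > sma:
            equity *= closes[t + 1] / closes[t]
    return equity - 1.0


def out_of_sample_check(closes, lookbacks, split):
    """Optimize the lookback on closes[:split], then rerun the winner on closes[split:]."""
    in_sample = {lb: sma_rule_return(closes, lb, 0, split) for lb in lookbacks}
    best = max(in_sample, key=in_sample.get)
    out_sample = sma_rule_return(closes, best, split, len(closes))
    return best, in_sample[best], out_sample


if __name__ == "__main__":
    rng = random.Random(7)
    closes = [100.0]
    for _ in range(999):  # made-up random-walk prices, for illustration only
        closes.append(closes[-1] * (1.0 + rng.gauss(0.0, 0.01)))
    best, fitted, unseen = out_of_sample_check(closes, range(5, 201, 5), split=500)
    print(f"best lookback {best}: in-sample {fitted:.1%}, out-of-sample {unseen:.1%}")
```

The out-of-sample pass borrows closes from before the split only to warm up the moving average; no out-of-sample bar is ever used to choose the lookback.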

Extra Insight:

With two different time periods, the results are almost always going to be at least a little different.   

The most challenging situation is if the original sample is a bull market and the out-of-sample is a bearish period (or vice versa).

My backtesting reports are broken into distinctly different samples for exactly this reason.

To be completely effective, the out-of-sample data should only be used once. Each backtest should have its own out-of-sample data, because if it is used repeatedly, the out-of-sample data too easily becomes in-sample data. Using the Monte Carlo method is better in this respect. See Evidence-Based Technical Analysis: Applying the Scientific Method and Statistical Inference to Trading Signals for more information.
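I won't claim this is exactly the book's procedure, but one common Monte Carlo technique for this problem is a permutation test: shuffle the rule's daily positions against the market's returns many times and count how often random timing does at least as well as the real rule. Here's a minimal Python sketch under my own assumptions: the function name, detrending by subtracting the average daily return, and the trial count are all hypothetical choices.

```python
import random


def permutation_p_value(positions, returns, trials=2000, seed=0):
    """Monte Carlo permutation test for a single rule.

    positions[t] is the rule's position (1 long, 0 flat, -1 short) earning returns[t].
    Returns (observed mean daily return, p-value).
    """
    if len(positions) != len(returns):
        raise ValueError("positions and returns must be the same length")
    n = len(returns)
    mean_r = sum(returns) / n
    detrended = [r - mean_r for r in returns]  # assumption: strip the market's drift
    observed = sum(p * r for p, r in zip(positions, detrended)) / n
    rng = random.Random(seed)
    shuffled = list(positions)
    at_least_as_good = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # same amount of exposure, random timing
        score = sum(p * r for p, r in zip(shuffled, detrended)) / n
        if score >= observed:
            at_least_as_good += 1
    # The +1 terms count the real ordering as one of the possible orderings.
    return observed, (at_least_as_good + 1) / (trials + 1)


if __name__ == "__main__":
    rng = random.Random(42)
    returns = [rng.gauss(0.0003, 0.01) for _ in range(750)]  # made-up daily returns
    # Hypothetical rule: be long today if yesterday's return was positive.
    positions = [0] + [1 if r > 0 else 0 for r in returns[:-1]]
    observed, p = permutation_p_value(positions, returns)
    print(f"mean daily edge {observed:.5f}, p-value {p:.3f}")
```

Each shuffle is a fresh simulated sample, so nothing gets used up the way a single out-of-sample period does. Note that this checks one rule in isolation; if the rule was the best of many, the test also has to account for that search.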


Updated 11/12/08.
