Cleaning data for backtesting is not easy, but it’s necessary to get meaningful results. Mis-adjusted stock splits can skew the price data and mislead the unwary backtester into thinking they’ve found the holy grail when the strategy merely happens to catch the good side of a bad gap.
Here are the steps to screen out dirty data and produce a clean dataset:
1. Pick at least 3 candidate data vendors.
2. Format the data for comparison.
3. Write a program to do a smart comparison and run it on the 3 candidate data sets.
4. Analyze the mismatches to see which set is in error. If 2 of 3 sets agree, assume that’s the correct value and the outlier is wrong.
5. Send feedback to the data vendors so they can fix the errors.
6. Select the set of historical price data to use for backtesting and lock it down to prevent changes during the backtesting.
7. Feed the golden price data to the backtesting engine.
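Steps 3 and 4 can be sketched roughly as follows. This is only a minimal illustration of the 2-of-3 voting idea, not the program I actually used: the function name `vote`, the vendor names, the sample prices, and the 0.1% tolerance are all made up for the example, and it assumes each vendor’s data has already been reduced to a date-to-closing-price mapping.

```python
from statistics import median

def vote(series, rel_tol=1e-3):
    """Majority-vote comparison of closing prices from several vendors.

    series: dict of vendor name -> dict of ISO date string -> close price.
    rel_tol: hypothetical tolerance (0.1%) so rounding differences
             between vendors are not reported as errors.
    Returns (golden, mismatches, unresolved).
    """
    golden, mismatches, unresolved = {}, [], []
    dates = sorted(set().union(*(s.keys() for s in series.values())))
    for date in dates:
        # None marks a vendor that has no bar for this date.
        quotes = {vendor: s.get(date) for vendor, s in series.items()}
        present = [p for p in quotes.values() if p is not None]
        mid = median(present)
        agreeing = [p for p in present if abs(p - mid) <= rel_tol * abs(mid)]
        # Require a strict majority of all vendors (e.g. 2 of 3).
        if 2 * len(agreeing) <= len(series):
            unresolved.append(date)
            continue
        consensus = median(agreeing)
        golden[date] = consensus
        # Anything missing or outside the tolerance is the outlier.
        for vendor, price in quotes.items():
            if price is None or abs(price - mid) > rel_tol * abs(mid):
                mismatches.append((date, vendor, price, consensus))
    return golden, mismatches, unresolved

# Made-up sample: vendor_c missed a 2-for-1 split adjustment on early bars.
sample = {
    "vendor_a": {"2020-01-02": 50.00, "2020-01-03": 50.50, "2020-01-06": 51.00},
    "vendor_b": {"2020-01-02": 50.00, "2020-01-03": 50.51, "2020-01-06": 51.00},
    "vendor_c": {"2020-01-02": 100.00, "2020-01-03": 101.00, "2020-01-06": 51.00},
}
golden, mismatches, unresolved = vote(sample)
for row in mismatches:
    print(row)
```

The mismatch list is what would go back to the vendors in step 5. Dates where no majority exists land in `unresolved` and still need a manual look.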
This process took me several weeks of work but was worth it to get accurate results. There’s little point in going to the work of backtesting if the underlying data is riddled with errors.
Read on for details if you are going to attempt this on your own or if you just want to see what preparations go into serious backtesting.