To make apples-to-apples comparisons between runs, the test period needs to be exactly the same.
Most backtesting tools foster a lax attitude toward test periods. They bury the date controls, encouraging you to accept whatever comes up. By default, tests run from however far back the data goes up to today, which means the same test run today and yesterday will produce slightly different results. The error from this shifting window compounds over time, especially in extensive market testing that takes months to get through all the different runs.
I did a big round of testing 18 months ago, and if I had not taken steps to control my time period, I would now be running on only 25% of the original data! Fortunately, I took the trouble to stake out specific test periods and essentially quarantine the data, so I can make valid comparisons from run to run.
Another reason to set specific test periods is to avoid curve fitting and over-optimization through out-of-sample testing. That means running on two different time periods to confirm that good results in the first period hold up in the second, and hence have a chance of persisting into the future.
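A minimal sketch of that discipline in Python. Everything here is illustrative rather than any particular tool's API: `run_strategy` is a stand-in buy-and-hold backtest, the window dates echo the ones I use below, and the acceptance rule (profitable in both periods) is just one reasonable choice.

```python
from datetime import date

def run_strategy(prices, start, end):
    """Stand-in backtest: annualized buy-and-hold return over [start, end),
    given a list of (date, price) points.  A real backtest goes here."""
    window = sorted((d, p) for d, p in prices if start <= d < end)
    years = (window[-1][0] - window[0][0]).days / 365.25
    return (window[-1][1] / window[0][1]) ** (1 / years) - 1

# Pin the windows as constants so they never drift from run to run.
IN_SAMPLE = (date(1994, 5, 1), date(2004, 5, 1))   # fit parameters here
OUT_SAMPLE = (date(2004, 5, 1), date(2007, 5, 1))  # confirm them here

def survives_out_of_sample(prices, floor=0.0):
    """Accept a parameter set only if it clears the floor in BOTH periods."""
    return (run_strategy(prices, *IN_SAMPLE) > floor
            and run_strategy(prices, *OUT_SAMPLE) > floor)
```

The point is that the fitting window and the confirmation window are fixed, named, and disjoint; a result that only looks good in-sample gets rejected.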
The choice of test period strongly influences results, because market behavior differs from one period to the next. One approach is to isolate periods of rising, falling, and sideways markets. Another is to choose periods that include all three behaviors.
I started out with a ten-year window that includes all three behaviors: May 1994 – Apr 2004. I coupled that with a three-year anti-curve-fitting window of May 2004 – Apr 2007. Lately, I’ve added a third window of May 2007 – May 2008.
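These three windows can be quarantined in code. A sketch, assuming the data is a list of (date, value) points; the window names and the `quarantine` helper are my own labels, not anything from a backtesting package:

```python
from datetime import date

# The three fixed windows, pinned as constants (inclusive on both ends)
# so a run today and a run next year test exactly the same span.
WINDOWS = {
    "all-behaviors":  (date(1994, 5, 1), date(2004, 4, 30)),
    "anti-curve-fit": (date(2004, 5, 1), date(2007, 4, 30)),
    "recent":         (date(2007, 5, 1), date(2008, 5, 31)),
}

def quarantine(series, name):
    """Return only the (date, value) points inside the named window."""
    start, end = WINDOWS[name]
    return [(d, v) for d, v in series if start <= d <= end]
```

Filtering through a named constant, rather than typing dates into a tool's date box each run, is what keeps the comparisons valid from run to run.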