Data Mining Definition

Data mining is the process of selecting the best of many candidate strategies via backtesting. Each strategy is tested on the same stocks and time periods, and the results are compared. The strategy that performs best in a properly designed backtest is the most likely candidate to perform best in live trading.
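As a rough illustration of that selection loop, here is a minimal Python sketch. Everything in it is a made-up assumption for the example: the function names (backtest, data_mine, make_momentum), the toy long-or-flat strategies, the random-walk prices, and the choice of average per-period return as the score. It ignores transaction costs and is not the backtesting engine used on this blog.

```python
import random


def backtest(strategy, prices):
    """Average per-period return of `strategy` on one price series.

    `strategy(history)` sees prices up to the current period and returns
    the position to hold over the next period: 1 (long) or 0 (flat).
    """
    period_returns = []
    for t in range(1, len(prices)):
        position = strategy(prices[:t])  # only past prices are visible
        period_returns.append(position * (prices[t] / prices[t - 1] - 1.0))
    return sum(period_returns) / len(period_returns)


def buy_and_hold(history):
    """Hypothetical baseline: always long."""
    return 1


def make_momentum(lookback):
    """Hypothetical rule: long if price is above its level `lookback` periods ago."""
    def strategy(history):
        if len(history) <= lookback:
            return 0  # not enough history yet, stay flat
        return 1 if history[-1] > history[-1 - lookback] else 0
    return strategy


def data_mine(strategies, price_series):
    """Score every strategy on the SAME price series and return the best one."""
    scores = {}
    for name, strategy in strategies.items():
        per_stock = [backtest(strategy, prices) for prices in price_series]
        scores[name] = sum(per_stock) / len(per_stock)
    best = max(scores, key=scores.get)
    return best, scores


def random_walk(seed, n_periods=250, start=100.0):
    """Synthetic prices, used only so the example runs without market data."""
    rng = random.Random(seed)
    prices = [start]
    for _ in range(n_periods):
        prices.append(prices[-1] * (1.0 + rng.gauss(0.0003, 0.01)))
    return prices


if __name__ == "__main__":
    candidates = {
        "buy_and_hold": buy_and_hold,
        "momentum_5": make_momentum(5),
        "momentum_20": make_momentum(20),
    }
    stocks = [random_walk(seed) for seed in range(10)]
    best, scores = data_mine(candidates, stocks)
    print("best strategy:", best)
```

The important design point is that every candidate sees exactly the same stocks and periods, so the comparison between them is fair even if the absolute numbers are not trustworthy.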

Live trading performance will differ from the backtest numbers, both because the market will not behave exactly like the historical price data used in the backtest and because of other factors.

Extra Insight:

Care must be taken to design the selection process so that it picks a robust strategy, one that will work under real-life market conditions going forward. Limiting the number of degrees of freedom, e.g. the number of parameters, is one helpful tactic. Another is to use statistical methods to gauge the significance of the results.
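One way to gauge significance when many strategies have been tried is to ask how often the best of N strategies would look this good by luck alone. The Python sketch below is a simplified bootstrap loosely in the spirit of White's Reality Check. It is an assumption-laden illustration: the function name best_of_n_p_value is hypothetical, it resamples periods independently (a plain bootstrap, which ignores serial correlation in returns), and it expects one list of per-period returns for each strategy, all of the same length.

```python
import random


def best_of_n_p_value(returns, n_boot=500, seed=0):
    """Approximate p-value for the best mean return among several strategies.

    `returns` is a list of equal-length lists, one per strategy.
    Null hypothesis: no strategy has a true mean return above zero.
    """
    rng = random.Random(seed)
    n_periods = len(returns[0])

    # Observed statistic: mean return of the best-looking strategy.
    observed = max(sum(r) / n_periods for r in returns)

    # Impose the null by shifting every strategy to a zero mean.
    centered = []
    for r in returns:
        m = sum(r) / n_periods
        centered.append([x - m for x in r])

    exceed = 0
    for _ in range(n_boot):
        # Resample the same periods for all strategies, which keeps
        # the correlation between strategies intact.
        idx = [rng.randrange(n_periods) for _ in range(n_periods)]
        boot_best = max(sum(c[i] for i in idx) / n_periods for c in centered)
        if boot_best >= observed:
            exceed += 1

    # Add-one correction so the estimate is never exactly zero.
    return (exceed + 1) / (n_boot + 1)
```

Because the test statistic is the maximum over all strategies tried, adding more candidates (more degrees of freedom) raises the bar that the winner has to clear.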

David Aronson’s Evidence-Based Technical Analysis is an excellent reference for taking a scientific approach to data mining. It also contains a chapter on estimating the statistical significance of trading strategies selected by data mining. Aronson says that data mining can identify the best strategy but that, because the winner's backtest results carry an upward “data-mining bias”, they should not be used to estimate the future performance of that strategy. Thus only relative comparisons between strategies are meaningful.
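The bias is easy to reproduce in a small simulation. The Python sketch below is my own illustration, not code from Aronson's book. It assumes every candidate strategy is pure noise (true mean return of zero), picks the one with the best in-sample average, and then checks that same strategy on fresh out-of-sample data. The function name simulate_bias and all parameter values are hypothetical.

```python
import random


def simulate_bias(n_strategies=20, n_periods=100, n_trials=100, seed=0):
    """Average in-sample and out-of-sample mean return of the selected strategy.

    Every strategy's returns are drawn with a true mean of zero, so any
    positive in-sample result for the winner is selection luck.
    """
    rng = random.Random(seed)

    def noise_mean():
        # Mean return of a zero-edge strategy over n_periods.
        return sum(rng.gauss(0.0, 0.01) for _ in range(n_periods)) / n_periods

    winner_in, winner_out = [], []
    for _ in range(n_trials):
        # (in-sample mean, out-of-sample mean) for each candidate.
        results = [(noise_mean(), noise_mean()) for _ in range(n_strategies)]
        # Data mining step: choose by in-sample performance only.
        best_in, best_out = max(results, key=lambda pair: pair[0])
        winner_in.append(best_in)
        winner_out.append(best_out)

    return sum(winner_in) / n_trials, sum(winner_out) / n_trials


if __name__ == "__main__":
    avg_in, avg_out = simulate_bias()
    print("winner, in-sample average return:    ", avg_in)
    print("winner, out-of-sample average return:", avg_out)
```

The winner's in-sample average comes out clearly positive even though no strategy has any edge, while its out-of-sample average stays close to zero. That gap is the data-mining bias, and it is why the winner's backtest number should not be read as a performance forecast.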

I have backtested a baseline strategy and use it as a reference for comparison.

(Backtesting Blog is an Amazon Associate.)

Last updated 11/11/08.

November 4th, 2008 Filed under Glossary


