where 1(.) is an indicator function that takes the value one if and only if the expression within brackets is true. White (2000) applies the Reality Check to a specification search directed toward forecasting the daily returns of the S&P 500 one day in advance in the period May 29, 1988 through May 31, 1994 (the period May 29, 1988 through June 3, 1991 is used as the initialization period). In the specification search, linear forecasting models that make use of technical indicators, such as momentum, local trend, relative strength indexes and moving averages, are applied to the data set. The mean squared prediction error and directional accuracy are used as prediction measures. White (2000) shows that the Reality Check does not reject the null hypothesis that the best technical indicator model cannot beat the buy-and-hold benchmark. However, if one looks at the p-value of the best strategy not corrected for the specification search, the so-called data-mined p-value, the null is only marginally not rejected in the case of the mean squared prediction error, and is rejected in the case of directional accuracy.
Sullivan, Timmermann and White (1999, 2001) utilize the RC to evaluate simple technical trading strategies and calendar effects applied to the Dow-Jones Industrial Average (DJIA) in the period 1897-1996. The mean return and the Sharpe ratio are chosen as performance measures, and the buy-and-hold strategy serves as the benchmark. Sullivan et al. (1999) find for both performance measures that the best technical trading rule has superior forecasting power over the buy-and-hold benchmark in the period 1897-1986 and for several subperiods, even after accounting for the effects of data snooping. Thus the earlier results of Brock et al. (1992) survive the danger of data snooping. However, for the period 1986-1996 this result is not repeated: the individual data-mined p-values still reject the null hypothesis, but the RC p-values no longer do. For the calendar effects (Sullivan et al., 2001) it is found that the individual data-mined p-values do reject the null hypothesis in the period 1897-1996, while the RC, which corrects for the search for the best model, does not reject the null hypothesis of no superior forecasting power of the best model over the buy-and-hold benchmark. Hence Sullivan et al. (1999, 2001) show that if one does not correct for data snooping, one can draw wrong inferences about the significance of the forecasting power of the best model.
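The difference between a data-mined p-value and an RC p-value can be illustrated with a minimal sketch. It assumes that the performance of each of K strategies relative to the benchmark is available as a column of an n-by-K array, and it uses the stationary bootstrap of Politis and Romano to resample the observations. The function names, the smoothing parameter q = 0.1, the number of bootstrap replications and the simulated data are illustrative assumptions and are not taken from the studies discussed above.

```python
import numpy as np


def stationary_bootstrap_indices(n, q, rng):
    """Resampled time indices from blocks of geometric length (mean 1/q)."""
    idx = np.empty(n, dtype=int)
    idx[0] = rng.integers(n)
    for t in range(1, n):
        if rng.random() < q:
            idx[t] = rng.integers(n)        # start a new block at a random date
        else:
            idx[t] = (idx[t - 1] + 1) % n   # continue the block, wrapping around
    return idx


def reality_check(f, n_boot=500, q=0.1, seed=0):
    """Return (rc_pvalue, data_mined_pvalue).

    f is an (n, K) array; column k holds the performance of strategy k
    in excess of the benchmark at each date (hypothetical input layout).
    """
    rng = np.random.default_rng(seed)
    n, _ = f.shape
    f_bar = f.mean(axis=0)
    best = int(np.argmax(f_bar))
    v = np.sqrt(n) * f_bar[best]            # statistic of the best strategy
    exceed_rc = 0
    exceed_naive = 0
    for _ in range(n_boot):
        idx = stationary_bootstrap_indices(n, q, rng)
        dev = np.sqrt(n) * (f[idx].mean(axis=0) - f_bar)
        exceed_rc += int(dev.max() > v)      # maximum over all K strategies
        exceed_naive += int(dev[best] > v)   # best strategy treated in isolation
    return exceed_rc / n_boot, exceed_naive / n_boot


# Simulated example: 200 strategies, none of which truly beats the benchmark.
sim_rng = np.random.default_rng(42)
excess = sim_rng.normal(loc=0.0, scale=1.0, size=(500, 200))
rc_p, mined_p = reality_check(excess)
print(f"RC p-value: {rc_p:.3f}, data-mined p-value: {mined_p:.3f}")
```

Because the maximum over all strategies is never smaller than the bootstrap deviation of the best strategy alone, the RC p-value in this sketch can never fall below the data-mined p-value. The gap between the two reflects the correction for the specification search.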
Hansen (2001) identifies a similarity condition for asymptotic tests of composite hypotheses and shows that this condition is necessary for a test to be unbiased. The similarity condition used is called ``asymptotic similarity on the boundary of a null hypothesis'' and Hansen (2001) shows that White's RC does not satisfy this condition. This causes the RC to be a biased test, which yields inconsistent p-values. Further, the RC is sensitive to the inclusion of poor and irrelevant models, because the p-value can be increased by including poor models. The RC is therefore a subjective test, because the