We have seen previously how learning the parameters of a prediction function and testing it on the same data is a methodological mistake: the model may achieve a perfect score yet fail to predict anything useful on yet-unseen data. This situation is called overfitting. To avoid it, it is common practice to hold out part of the available data as a test set.
The test error is then the average error that results from using the learning method to predict the response on a new observation, one that was not used to train the method.
In contrast, the training error is calculated by applying the learning method to the observations used in its training.
But the training error rate is often quite different from the test error rate, and in particular it can dramatically underestimate the generalization error.
The best solution would be to use a large designated test set, but one is often not available.
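To make that gap concrete, here is a minimal sketch using scikit-learn; the diabetes dataset and the flexible tree model are illustrative choices, not part of the original discussion. The model fits the training observations almost perfectly, while the held-out test error tells a very different story.

```python
# A minimal sketch, assuming scikit-learn is available; the dataset and
# estimator are illustrative choices, not taken from the original text.
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

X, y = load_diabetes(return_X_y=True)

# Hold out part of the available data as a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# A flexible model can fit the training data almost perfectly...
model = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

train_error = mean_squared_error(y_train, model.predict(X_train))
test_error = mean_squared_error(y_test, model.predict(X_test))

# ...so the training error dramatically underestimates the test error.
print(f"training MSE: {train_error:.1f}")
print(f"test MSE:     {test_error:.1f}")
```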
Here we consider a class of methods that estimate the test error by holding out a subset of the training observations from the fitting process, and then applying the learning method to those held-out observations.
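As a rough sketch of the idea, K-fold cross-validation repeatedly holds out one fold, fits the method on the remaining folds, and averages the errors made on the held-out observations. The scikit-learn call below is one way to do this; the estimator and data are assumed here for illustration.

```python
# A minimal sketch of K-fold cross-validation, again assuming scikit-learn;
# the linear model and dataset are illustrative.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)

# Each fold is held out once: the model is fit on the remaining folds
# and evaluated on the held-out observations.
scores = cross_val_score(
    LinearRegression(), X, y,
    cv=5, scoring="neg_mean_squared_error"
)

# Averaging the per-fold errors gives an estimate of the test error.
cv_estimate = -np.mean(scores)
print(f"5-fold CV estimate of test MSE: {cv_estimate:.1f}")
```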