Exercise 1 (7.5 points)
Review of k-fold cross-validation.
- Explain how k-fold cross-validation is implemented. (A sketch is given after this list for reference.)
- What are the advantages and disadvantages of k-fold cross-validation relative to:
  - The validation set approach
  - LOOCV
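
For reference, here is a minimal base-R sketch of the mechanics. The `Weekly` data (used in Exercise 3) and the particular predictor set are assumptions chosen purely for illustration; any learner and error metric could stand in for the logistic regression and misclassification rate shown here.

```r
library(ISLR)  # Weekly data, reused here purely for illustration

k <- 5
n <- nrow(Weekly)
set.seed(1)
# 1. Randomly assign each observation to one of k roughly equal folds
fold_id <- sample(rep(1:k, length.out = n))

errors <- numeric(k)
for (i in 1:k) {
  train <- Weekly[fold_id != i, ]  # k - 1 folds for fitting
  test  <- Weekly[fold_id == i, ]  # held-out fold for evaluation
  # 2. Fit on the k - 1 folds (logistic regression as a stand-in learner)
  fit  <- glm(Direction ~ Lag1 + Lag2 + Volume, data = train, family = binomial)
  # 3. Evaluate on the held-out fold
  prob <- predict(fit, newdata = test, type = "response")
  pred <- ifelse(prob > 0.5, "Up", "Down")
  errors[i] <- mean(pred != test$Direction)
}
# 4. Average the k held-out error rates to get the CV estimate
cv_error <- mean(errors)
cv_error
```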
Exercise 2 (7.5 points)
State whether the following statements are true or false. Explain your reasoning.
- When \(k = n\), the cross-validation estimator is approximately unbiased for the true prediction error.
- When \(k = n\), the cross-validation estimator will always have a low variance.
- Statistical transformations on the predictors, such as scaling and centering, must be done inside each fold. (A sketch illustrating fold-wise preprocessing is given after this list.)
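
As a point of reference for the last statement, in tidymodels a `recipe` attached to a `workflow` is re-estimated by `fit_resamples()` on each fold's analysis set, so the centering and scaling statistics never see the assessment set. A minimal sketch follows; the formula and data (`Direction`, `Weekly`) are reused from Exercise 3 purely for illustration.

```r
library(tidymodels)
library(ISLR)

# Center and scale the predictors *inside* the resampling loop:
# the recipe is part of the workflow, so its statistics are
# re-estimated from each fold's analysis set only.
rec <- recipe(Direction ~ Lag1 + Lag2 + Volume, data = Weekly) %>%
  step_center(all_numeric_predictors()) %>%
  step_scale(all_numeric_predictors())

wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(logistic_reg() %>% set_engine("glm"))

set.seed(1)
folds <- vfold_cv(Weekly, v = 5)

res <- fit_resamples(wf, resamples = folds)
collect_metrics(res)
```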
Exercise 3 (15 points)
This exercise should be answered using the `Weekly` data set, which is part of the `ISLR` package. If you don't have it installed already, you can install it with:
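```r
install.packages("ISLR")
```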
To load the data set, run the following code:
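```r
library(ISLR)
data("Weekly")
```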
a. Create a test and training set using `initial_split()`. The split proportion is up to you. Remember to set a seed! (A sketch of the full workflow for steps a.–e. is given after this list.)
b. Create a logistic regression specification using `logistic_reg()`. Set the engine to `"glm"`.
c. Create a 5-fold cross-validation object using the training data, and fit the resampled folds with `fit_resamples()`, with `Direction` as the response and the five lag variables plus `Volume` as predictors. Remember to set a seed before creating the k-fold object.
d. Collect the performance metrics using `collect_metrics()`. Interpret.
e. Fit the model on the whole training data set and calculate the accuracy on the test set. How does this result compare to the results in d.? Interpret.
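
For reference, a minimal sketch of steps a.–e. follows. The seed values, the 0.75 split proportion, and object names such as `weekly_split` and `lr_spec` are illustrative choices, not requirements of the exercise.

```r
library(tidymodels)
library(ISLR)

# a. Train/test split (proportion is an illustrative choice)
set.seed(123)
weekly_split <- initial_split(Weekly, prop = 0.75)
weekly_train <- training(weekly_split)
weekly_test  <- testing(weekly_split)

# b. Logistic regression specification with the glm engine
lr_spec <- logistic_reg() %>%
  set_engine("glm") %>%
  set_mode("classification")

# c. 5-fold CV on the training data, fitted with fit_resamples()
set.seed(234)
weekly_folds <- vfold_cv(weekly_train, v = 5)

lr_res <- fit_resamples(
  lr_spec,
  Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume,
  resamples = weekly_folds
)

# d. Cross-validated performance metrics
collect_metrics(lr_res)

# e. Fit on the full training set, then compute accuracy on the test set
lr_fit <- fit(
  lr_spec,
  Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume,
  data = weekly_train
)

augment(lr_fit, new_data = weekly_test) %>%
  accuracy(truth = Direction, estimate = .pred_class)
```

Note that `fit_resamples()` only estimates performance; step e. refits the specification on the full training set before scoring the held-out test set.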