Assignment 4

Exercise 1 (7.5 points)

Review of k-fold cross-validation.

  1. Explain how k-fold cross-validation is implemented.
  2. What are the advantages and disadvantages of k-fold cross-validation relative to
    • The validation set approach
    • LOOCV
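For reference, the fold-assignment mechanics behind your written answer can be sketched in base R. This is an illustrative sketch only (the variable names and the placeholder error are my own, not part of the assignment): each of the n observations is randomly assigned to one of k folds, and each fold serves once as the held-out set.

```r
# Sketch of k-fold cross-validation mechanics (illustrative only):
# randomly assign each of n observations to one of k folds, then use
# each fold once as the held-out set while fitting on the rest.
set.seed(1)
n <- 20
k <- 5
folds <- sample(rep(1:k, length.out = n))  # fold label for each observation

errors <- numeric(k)
for (j in 1:k) {
  held_out <- which(folds == j)   # validation indices for fold j
  train    <- which(folds != j)   # remaining k-1 folds used for fitting
  # ... fit on `train`, predict on `held_out`, store the error ...
  errors[j] <- length(held_out) / n  # placeholder for a real error metric
}
cv_estimate <- mean(errors)  # average the k held-out error estimates
```

Note that with `length.out = n` each fold receives (almost) the same number of observations, which is the usual convention.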

Exercise 2 (7.5 points)

State whether each of the following statements is true or false, and explain your reasoning.

  1. When \(k = n\) the cross-validation estimator is approximately unbiased for the true prediction error.
  2. When \(k = n\) the cross-validation estimator will always have a low variance.
  3. Statistical transformations on the predictors, such as scaling and centering, must be done inside each fold.

Exercise 3 (15 points)

This exercise should be answered using the Weekly data set, which is part of the ISLR package. If you don’t have it installed already, you can install it with

install.packages("ISLR")

To load the data set, run the following code:

library(ISLR)
data("Weekly")
  1. Create a training and test set using initial_split(). The split proportion is up to you. Remember to set a seed!
  2. Create a logistic regression specification using logistic_reg(). Set the engine to glm.
  3. Create a 5-fold cross-validation object using vfold_cv() on the training data, and fit the resampled folds with fit_resamples(), using Direction as the response and the five lag variables plus Volume as predictors. Remember to set a seed before creating the k-fold object.
  4. Collect the performance metrics using collect_metrics(). Interpret.
  5. Fit the model on the whole training data set and calculate the accuracy on the test set. How does this result compare to the results in part 4? Interpret.
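One possible sketch of these steps, assuming the tidymodels and ISLR packages are installed. Object names such as weekly_split, lr_spec, and the seed values are my own choices, not requirements of the assignment; the split proportion and formula are likewise illustrative.

```r
# Hedged sketch of parts 1-5 (assumes tidymodels and ISLR are installed).
library(tidymodels)
library(ISLR)
data("Weekly")

set.seed(123)                                    # part 1: train/test split
weekly_split <- initial_split(Weekly, prop = 0.75)
weekly_train <- training(weekly_split)
weekly_test  <- testing(weekly_split)

lr_spec <- logistic_reg() %>%                    # part 2: model specification
  set_engine("glm") %>%
  set_mode("classification")

set.seed(234)                                    # part 3: 5-fold CV
weekly_folds <- vfold_cv(weekly_train, v = 5)
cv_res <- fit_resamples(
  lr_spec,
  Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume,
  resamples = weekly_folds
)

collect_metrics(cv_res)                          # part 4: CV performance

lr_fit <- fit(lr_spec,                           # part 5: fit on full training set
              Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume,
              data = weekly_train)
augment(lr_fit, new_data = weekly_test) %>%      # test-set accuracy
  accuracy(truth = Direction, estimate = .pred_class)
```

By default collect_metrics() reports accuracy and ROC AUC averaged across the five folds, which is what you compare against the single test-set accuracy in part 5.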