Review of k-fold cross-validation.
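As a reminder of the mechanics, k-fold cross-validation can be sketched in base R. This is a minimal illustration on synthetic data, not the tidymodels workflow used in the exercises below; the data frame dat, the linear model, and the choice of k = 5 are assumptions made only for the example.

```r
set.seed(1)

n <- 100  # number of observations (arbitrary)
k <- 5    # number of folds (arbitrary)

# Synthetic data: y depends linearly on x, with unit-variance noise
dat <- data.frame(x = rnorm(n))
dat$y <- 2 * dat$x + rnorm(n)

# Randomly assign each observation to one of k equally sized folds
fold_id <- sample(rep(seq_len(k), length.out = n))

# For each fold: fit on the other k - 1 folds, evaluate on the held-out fold
fold_mse <- sapply(seq_len(k), function(i) {
  train <- dat[fold_id != i, ]
  held_out <- dat[fold_id == i, ]
  fit <- lm(y ~ x, data = train)
  mean((held_out$y - predict(fit, newdata = held_out))^2)
})

# The cross-validation estimate is the average of the k held-out errors
cv_mse <- mean(fold_mse)
cv_mse
```

Every observation is held out exactly once, so the averaged error uses all of the data for both fitting and evaluation.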
Denote whether the following statements are true or false. Explain your reasoning.
This exercise should be answered using the Weekly data set, which is part of the ISLR package. If you don’t have it installed already, you can install it with
install.packages("ISLR")
To load the data set, run the following code:
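Split the data into training and test sets using initial_split(). The split proportion is up to you. Remember to set a seed!
Specify a logistic regression model using logistic_reg(). Set the engine to glm.
Create a k-fold cross-validation object from the training set and fit the model to the resamples using fit_resamples(), with Direction as the response and the five lag variables plus Volume as predictors. Remember to set a seed before creating the k-fold object.
Collect the performance metrics using collect_metrics(). Interpret.
We will now derive the probability that a given observation is part of a bootstrap sample. Suppose that we obtain a bootstrap sample from a set of \(n\) observations.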
Comment on the results obtained.
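A derived probability of this kind can be checked numerically. The sketch below, in base R, estimates how often a fixed observation appears in a bootstrap sample and compares the estimate with the closed-form value \(1 - (1 - 1/n)^n\). The values n = 100, j = 4, and the 10,000 repetitions are arbitrary assumptions chosen for illustration.

```r
set.seed(1)

n <- 100         # size of the original data set (arbitrary)
n_reps <- 10000  # number of simulated bootstrap samples (arbitrary)
j <- 4           # index of the observation being tracked (arbitrary)

# For each repetition, draw n indices with replacement and record
# whether observation j appears at least once
contains_j <- replicate(n_reps, j %in% sample(n, size = n, replace = TRUE))

# Proportion of bootstrap samples that contain observation j
simulated <- mean(contains_j)

# Closed-form probability for comparison
closed_form <- 1 - (1 - 1 / n)^n

c(simulated = simulated, closed_form = closed_form)
```

With enough repetitions the two numbers should agree closely, and rerunning with larger n shows how little the probability depends on the size of the data set.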
Suppose that we use some statistical learning method to make a prediction for the response \(Y\) for a particular value of the predictor \(X\).
This exercise should be answered using the Default data set, which is part of the ISLR package. If you don’t have it installed already, you can install it with
install.packages("ISLR")
To load the data set, run the following code:
Use the parsnip package to fit a logistic regression on the Default data set. default is the response and income and balance are the predictors. Then use summary() on the fitted model to determine the estimated standard errors for the coefficients associated with income and balance. Comment on the estimated standard errors.
Use the bootstraps() function from the rsample package to generate 25 bootstraps of Default.
Run the following code, changing boots to the name of the bootstrapping object created in the previous question. This will take a minute or two to run. Comment on the results obtained.

# Attaches parsnip, rsample, dplyr, purrr, tidyr and broom, which are used below
library(tidymodels)

# This function takes a bootstrapped split object and fits a logistic model
# to the analysis (resampled) portion of that split
fit_lr_on_bootstrap <- function(split) {
  logistic_reg() %>%
    set_engine("glm") %>%
    set_mode("classification") %>%
    fit(default ~ income + balance, analysis(split))
}

# This code uses `map()` to fit the model on each split. Then it uses
# `tidy()` to extract the parameter estimates of each fitted model
boot_models <-
  boots %>%
  mutate(
    model = map(splits, fit_lr_on_bootstrap),
    coef_info = map(model, tidy)
  )

# This code extracts the estimates for each model that was fit,
# giving one row per term per bootstrap sample
boot_coefs <-
  boot_models %>%
  unnest(coef_info)

# This code calculates the standard deviation of the estimates for each term
# across the bootstrap samples
sd_estimates <- boot_coefs %>%
  group_by(term) %>%
  summarise(std.error_est = sd(estimate))

sd_estimates
Comment on the estimated standard errors obtained using the summary() function on the first model and the estimated standard errors you found using the above code.
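The comparison between model-based and bootstrap standard errors can also be illustrated on synthetic data. The sketch below uses base R only, not the tidymodels functions above. The simulated data frame sim, its true coefficients, and the choice of 200 resamples are all assumptions made for illustration and are not part of the exercise.

```r
set.seed(42)

n <- 500  # number of synthetic observations (arbitrary)

# Synthetic binary outcome generated from a logistic model with assumed
# intercept -0.5 and slope 1.2
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))
sim <- data.frame(x = x, y = y)

# Model-based standard errors, as reported by summary()
fit <- glm(y ~ x, data = sim, family = binomial)
model_se <- summary(fit)$coefficients[, "Std. Error"]

# Bootstrap: resample rows with replacement, refit, and keep the coefficients
n_boot <- 200
boot_coefs <- t(replicate(n_boot, {
  idx <- sample(n, replace = TRUE)
  coef(glm(y ~ x, data = sim[idx, ], family = binomial))
}))

# Bootstrap standard error = standard deviation of each coefficient
# across the resamples
boot_se <- apply(boot_coefs, 2, sd)

rbind(model_se, boot_se)
```

When the model's assumptions hold, as they do by construction here, the two rows should be of similar magnitude. Larger discrepancies on real data can hint that the model-based formula is relying on assumptions the data do not satisfy.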