Suppose we estimate the regression coefficients in a linear regression model by minimizing
\[ \sum_{i=1}^n \left( y_i - \beta_0 - \sum^p_{j=1}\beta_j x_{ij} \right)^2 + \lambda \sum_{j=1}^p \beta_j^2 \]
for a particular value of \(\lambda\). For parts (a) through (c), indicate which of the statements are correct. Justify your answers.
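As a reading aid, the objective can be split into its two terms. The labels below (RSS, ridge penalty) and the limiting cases are standard facts about ridge regression and are not part of the original exercise statement:
\[ \underbrace{\sum_{i=1}^n \left( y_i - \beta_0 - \sum_{j=1}^p \beta_j x_{ij} \right)^2}_{\text{RSS}} + \underbrace{\lambda \sum_{j=1}^p \beta_j^2}_{\text{ridge penalty}} \]
With \(\lambda = 0\) the minimizer is the ordinary least squares fit. As \(\lambda \to \infty\), the coefficients \(\beta_1, \dots, \beta_p\) are shrunk toward zero, while the intercept \(\beta_0\) is not penalized.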
In this exercise, you are tasked with predicting the weight of an animal in a zoo based on the words used to describe it. The animals data set can be downloaded here.
This data set contains 1001 variables. The first variable, weight, is the natural log of the mean weight of the animal. The remaining variables are named tf_*, where each one records how many times the word * appears in the description of the animal.
Fit a lasso regression model to predict weight based on all the other variables.
Use the tune package to perform hyperparameter tuning to select the best value of \(\lambda\). Use 10 bootstraps of the training data as the resamples.
How well does this model perform on the testing data set?
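One possible way to set up the steps above with tidymodels is sketched below. It is a starting point under stated assumptions, not a reference solution: the file name animals.csv, the default train/test split, the preprocessing steps, the seed, and the penalty grid are all choices made for illustration and do not come from the exercise. The glmnet package must be installed for the engine to work.

```r
library(tidymodels)  # attaches rsample, recipes, parsnip, workflows, tune, yardstick

# ASSUMPTION: the animals data has already been downloaded to a local CSV file.
# The file name "animals.csv" is hypothetical.
animals <- read.csv("animals.csv")

set.seed(1234)  # arbitrary seed, only for reproducibility

# Hold out a testing set (rsample's default proportion is 3/4 for training).
animals_split <- initial_split(animals)
animals_train <- training(animals_split)

# mixture = 1 requests the lasso penalty from glmnet; penalty is marked for tuning.
lasso_spec <- linear_reg(penalty = tune(), mixture = 1) %>%
  set_engine("glmnet")

# ASSUMPTION: drop zero-variance word counts and normalize the predictors,
# since the lasso penalty is sensitive to predictor scale.
lasso_rec <- recipe(weight ~ ., data = animals_train) %>%
  step_zv(all_predictors()) %>%
  step_normalize(all_predictors())

lasso_wf <- workflow() %>%
  add_recipe(lasso_rec) %>%
  add_model(lasso_spec)

# 10 bootstrap resamples of the training data, as the exercise asks.
animals_boots <- bootstraps(animals_train, times = 10)

# Candidate penalties on the log10 scale; the range and levels are arbitrary.
penalty_grid <- grid_regular(penalty(range = c(-5, 0)), levels = 20)

tune_res <- tune_grid(lasso_wf, resamples = animals_boots, grid = penalty_grid)

# Pick the penalty with the lowest resampled RMSE.
best_penalty <- select_best(tune_res, metric = "rmse")

# Refit on the full training set and evaluate once on the testing set.
final_fit <- lasso_wf %>%
  finalize_workflow(best_penalty) %>%
  last_fit(animals_split)

collect_metrics(final_fit)
```

The call to collect_metrics() reports the testing-set RMSE and R-squared, which address the last question. Because weight is on the natural log scale, the RMSE is in log units as well.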