Assignment 5

Exercise 1 (10 points)

Suppose we fit a curve with basis functions $b_1(X) = X$, $b_2(X) = (X - 1)^2 I(X \geq 1)$. Note that $I(X \geq 1)$ equals 1 for $X \geq 1$ and 0 otherwise. We fit the linear regression model

$Y = \beta_0 + \beta_1b_1(X) + \beta_2b_2(X) + \varepsilon$

and obtain the coefficient estimates $\hat \beta_0 = 1$, $\hat \beta_1 = 1$, $\hat \beta_2 = -2$. Sketch the estimated curve between $X = -2$ and $X = 2$. Note the intercepts, slopes, and other relevant information.
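One way to check a hand sketch is to evaluate the fitted curve numerically. The snippet below is a minimal sketch and not part of the assignment: the helper names `indicator`, `b1`, `b2`, and `fitted_curve` are illustrative choices, and the default coefficients are the estimates given above.

```python
def indicator(condition: bool) -> float:
    """Return 1.0 when the condition holds and 0.0 otherwise."""
    return 1.0 if condition else 0.0


def b1(x: float) -> float:
    """First basis function: b_1(X) = X."""
    return x


def b2(x: float) -> float:
    """Second basis function: b_2(X) = (X - 1)^2 I(X >= 1)."""
    return (x - 1) ** 2 * indicator(x >= 1)


def fitted_curve(x: float, beta0: float = 1.0, beta1: float = 1.0,
                 beta2: float = -2.0) -> float:
    """Evaluate beta0 + beta1 * b1(x) + beta2 * b2(x)."""
    return beta0 + beta1 * b1(x) + beta2 * b2(x)


# Tabulate the curve on a coarse grid from X = -2 to X = 2.
for i in range(9):
    x = -2 + 0.5 * i
    print(f"{x:5.1f}  {fitted_curve(x):6.2f}")
```

Comparing the tabulated values on either side of $X = 1$ can help confirm where the shape of the curve changes.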

Exercise 2 (10 points)

Suppose we fit a curve with basis functions $b_1(X) = I(0 \leq X \leq 2) - (X-1)I(1 \leq X \leq 2)$, $b_2(X) = (X - 3) I(3 \leq X \leq 4) + I(4 < X \leq 5)$. We fit the linear regression model

$Y = \beta_0 + \beta_1b_1(X) + \beta_2b_2(X) + \varepsilon$

and obtain the coefficient estimates $\hat \beta_0 = 1$, $\hat \beta_1 = 1$, $\hat \beta_2 = 3$. Sketch the estimated curve between $X = -2$ and $X = 2$. Note the intercepts, slopes, and other relevant information.
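As with the previous exercise, a numerical evaluation can serve as a check on the sketch. This is a minimal illustration only: the function names are hypothetical, and the indicator functions are written as Python comparisons that mirror the definitions above.

```python
def indicator(condition: bool) -> float:
    """Return 1.0 when the condition holds and 0.0 otherwise."""
    return 1.0 if condition else 0.0


def b1(x: float) -> float:
    """b_1(X) = I(0 <= X <= 2) - (X - 1) I(1 <= X <= 2)."""
    return indicator(0 <= x <= 2) - (x - 1) * indicator(1 <= x <= 2)


def b2(x: float) -> float:
    """b_2(X) = (X - 3) I(3 <= X <= 4) + I(4 < X <= 5)."""
    return (x - 3) * indicator(3 <= x <= 4) + indicator(4 < x <= 5)


def fitted_curve(x: float, beta0: float = 1.0, beta1: float = 1.0,
                 beta2: float = 3.0) -> float:
    """Evaluate beta0 + beta1 * b1(x) + beta2 * b2(x)."""
    return beta0 + beta1 * b1(x) + beta2 * b2(x)


# Tabulate the curve on a coarse grid from X = -2 to X = 2.
for i in range(9):
    x = -2 + 0.5 * i
    print(f"{x:5.1f}  {fitted_curve(x):6.2f}")
```

Because the basis functions are defined piecewise, it is worth evaluating points just below and just above each breakpoint to see whether the curve jumps there.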

Exercise 3 (10 points)

Explain what happens to the bias/variance trade-off of our model estimates when we use regression splines.

Exercise 4 (10 points)

Draw an example (of your own invention) of a partition of two-dimensional feature space that could result from recursive binary splitting. Your example should contain at least six regions. Draw a decision tree corresponding to this partition. Be sure to label all aspects of your figures, including the regions $R_1, R_2, \ldots$, the cut points $t_1, t_2, \ldots$, and so forth.

Exercise 5 (10 points)

Provide a detailed explanation of the algorithm that is used to fit a regression tree.
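For orientation, the core of recursive binary splitting is a greedy search for the cut point that most reduces the residual sum of squares (RSS). The sketch below illustrates a single such search on one predictor. It is a deliberately partial illustration with hypothetical helper names (`rss`, `best_split`), and it omits the loop over predictors, the recursion, the stopping rule, and pruning that a full explanation should cover.

```python
def rss(values):
    """Residual sum of squares of values around their own mean."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y):
    """Find the cut point s minimising RSS(y | x < s) + RSS(y | x >= s).

    Candidate cut points are midpoints between consecutive distinct
    values of x. Returns (s, total_rss), or None if x has fewer than
    two distinct values and so cannot be split.
    """
    distinct = sorted(set(x))
    best = None
    for lo, hi in zip(distinct, distinct[1:]):
        s = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi < s]
        right = [yi for xi, yi in zip(x, y) if xi >= s]
        total = rss(left) + rss(right)
        if best is None or total < best[1]:
            best = (s, total)
    return best


# Toy data (made up for illustration): two clearly separated groups.
x = [1, 2, 3, 10, 11, 12]
y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
print(best_split(x, y))
```

In a full tree-fitting procedure this search would be repeated over every predictor, and then again within each resulting region.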

Exercise 6 (10 points)

Explain the difference between bagging, boosting, and random forests.

Exercise 7 (20 points)

You will be using the Boston data found here. The response is medv and the remaining variables are predictors.

Perform a training/test split as usual, then fit a random forest or boosted tree model (your choice) and a linear regression model.

The random forest or boosted tree model has several hyperparameters that you can tune to improve performance. Perform hyperparameter tuning using k-fold cross-validation to find a model with good predictive power. How does this model compare to the linear regression model?
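The tuning step can be organised as a loop over candidate hyperparameter settings, scoring each by its average validation error over k folds. The following is a minimal standard-library sketch under stated assumptions: `k_fold_indices`, `cv_mse`, and `grid_search` are hypothetical helpers, and `fit` and `predict` are placeholders for whichever random forest or boosting implementation you choose. The toy shrunken-mean model and the made-up data exist only to make the example runnable; they are not a suggested model or a stand-in for the Boston data.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_mse(fit, predict, X, y, params, k=5, seed=0):
    """Average validation MSE of one hyperparameter setting over k folds."""
    scores = []
    for held_out in k_fold_indices(len(y), k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train], **params)
        scores.append(mean((predict(model, X[i]) - y[i]) ** 2 for i in held_out))
    return mean(scores)


def grid_search(fit, predict, X, y, grid, k=5):
    """Return (best_params, best_score) over a list of parameter dicts."""
    scored = [(cv_mse(fit, predict, X, y, p, k=k), p) for p in grid]
    best_score, best_params = min(scored, key=lambda t: t[0])
    return best_params, best_score


# Toy model (assumption, illustration only): predict shrink * mean(y_train).
def fit_shrunken_mean(X_train, y_train, shrink):
    return shrink * mean(y_train)


def predict_constant(model, x_row):
    return model


# Made-up data: a constant response, so shrink = 1.0 should win.
X = [[float(i)] for i in range(20)]
y = [10.0] * 20
grid = [{"shrink": s} for s in (0.0, 0.5, 1.0)]
best_params, best_score = grid_search(fit_shrunken_mean, predict_constant, X, y, grid, k=5)
print(best_params, best_score)
```

Whatever tooling you use, run the cross-validation on the training portion only and keep the test set aside for the final comparison with the linear regression model.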
