Suppose we fit a curve with basis functions $b_1(X) = X$ and $b_2(X) = (X-1)^2\, I(X \geq 1)$, where $I(X \geq 1)$ equals 1 for $X \geq 1$ and 0 otherwise. We fit the linear regression model
$$Y = \beta_0 + \beta_1 b_1(X) + \beta_2 b_2(X) + \varepsilon$$
and obtain the coefficient estimates $\hat\beta_0 = 1$, $\hat\beta_1 = 1$, $\hat\beta_2 = -2$. Sketch the estimated curve between $X = -2$ and $X = 2$. Note the intercepts, slopes, and other relevant information.
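If it helps to check your sketch, here is a minimal Python sketch (not part of the exercise; the library choice and plotting details are mine) that evaluates the fitted curve $\hat Y = 1 + b_1(X) - 2\, b_2(X)$ on $[-2, 2]$:

```python
import numpy as np
import matplotlib.pyplot as plt

# Basis functions as defined in the exercise
def b1(x):
    return x

def b2(x):
    return (x - 1) ** 2 * (x >= 1)  # (X - 1)^2 * I(X >= 1)

# Fitted coefficients: beta0_hat = 1, beta1_hat = 1, beta2_hat = -2
x = np.linspace(-2, 2, 401)
y_hat = 1 + b1(x) - 2 * b2(x)

plt.plot(x, y_hat)
plt.xlabel("X")
plt.ylabel("Estimated Y")
plt.show()
```

The resulting plot makes the knot at $X = 1$ easy to see.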
Suppose we fit a curve with basis functions $b_1(X) = I(0 \leq X \leq 2) - (X-1)\, I(1 \leq X \leq 2)$ and $b_2(X) = (X-3)\, I(3 \leq X \leq 4) + I(4 < X \leq 5)$. We fit the linear regression model
$$Y = \beta_0 + \beta_1 b_1(X) + \beta_2 b_2(X) + \varepsilon$$
and obtain the coefficient estimates $\hat\beta_0 = 1$, $\hat\beta_1 = 1$, $\hat\beta_2 = 3$. Sketch the estimated curve between $X = -2$ and $X = 2$. Note the intercepts, slopes, and other relevant information.
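The plotting code from the previous exercise can be reused by swapping in the new basis functions; a sketch of just those definitions (NumPy-style indicators, my own phrasing):

```python
import numpy as np

# b1(X) = I(0 <= X <= 2) - (X - 1) * I(1 <= X <= 2)
def b1(x):
    x = np.asarray(x, dtype=float)
    return ((0 <= x) & (x <= 2)).astype(float) - (x - 1) * ((1 <= x) & (x <= 2))

# b2(X) = (X - 3) * I(3 <= X <= 4) + I(4 < X <= 5)
def b2(x):
    x = np.asarray(x, dtype=float)
    return (x - 3) * ((3 <= x) & (x <= 4)) + ((4 < x) & (x <= 5)).astype(float)

# Fitted curve with beta0_hat = 1, beta1_hat = 1, beta2_hat = 3
x = np.linspace(-2, 2, 401)
y_hat = 1 + b1(x) + 3 * b2(x)
```

Note that every indicator in $b_2$ is zero on the sketching interval $[-2, 2]$, so only $\hat\beta_0$ and $\hat\beta_1 b_1(X)$ contribute there.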
Explain what happens to the bias/variance trade-off of our model estimates when we use regression splines.
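For intuition (not required by the question), a short scikit-learn sketch that makes the trade-off visible: as the spline is given more knots, training error typically keeps falling while test error eventually turns back up. The simulated data and the knot counts are my own choices.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Simulated data: a smooth nonlinear signal plus noise
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=300)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# More knots -> more flexible spline -> lower bias but higher variance
for n_knots in (4, 8, 16, 32):
    model = make_pipeline(SplineTransformer(n_knots=n_knots, degree=3), LinearRegression())
    model.fit(X_tr, y_tr)
    print(n_knots,
          mean_squared_error(y_tr, model.predict(X_tr)),
          mean_squared_error(y_te, model.predict(X_te)))
```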
Draw an example (of your own invention) of a partition of two-dimensional feature space that could result from recursive binary splitting. Your example should contain at least six regions. Draw a decision tree corresponding to this partition. Be sure to label all aspects of your figures, including the regions $R_1, R_2, \ldots$, the cut points $t_1, t_2, \ldots$, and so forth.
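Purely as an illustration of what such a partition looks like (the cut points below are invented; the question asks for an example of your own), a six-region partition produced by recursive binary splitting can be written as the nested if/else structure of its tree:

```python
# Illustrative only: six regions R1..R6 of (X1, X2) space with made-up cut points t1..t5
def region(x1, x2, t1=3.0, t2=5.0, t3=2.0, t4=7.0, t5=4.0):
    if x1 < t1:                     # first split on X1
        return "R1" if x2 < t2 else "R2"
    else:
        if x2 < t3:
            return "R3"
        else:
            if x1 < t4:
                return "R4" if x2 < t5 else "R5"
            else:
                return "R6"
```

Each test compares a single feature against a single cut point, which is exactly the structure recursive binary splitting produces.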
Provide a detailed explanation of the algorithm that is used to fit a regression tree.
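As a companion to the written explanation, here is a bare-bones Python sketch (my own simplification; it omits stopping rules beyond depth/size and omits cost-complexity pruning) of the greedy procedure: at every node, scan all predictors and candidate cut points, pick the split that minimizes the summed RSS of the two children, and recurse.

```python
import numpy as np

def rss(y):
    # Residual sum of squares around the node mean
    return np.sum((y - y.mean()) ** 2)

def best_split(X, y):
    best = None  # (total RSS, feature index, cut point)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left = X[:, j] < t
            if not left.any() or left.all():
                continue  # skip splits that leave one side empty
            total = rss(y[left]) + rss(y[~left])
            if best is None or total < best[0]:
                best = (total, j, t)
    return best

def grow(X, y, depth=0, max_depth=3, min_size=5):
    split = best_split(X, y) if depth < max_depth and len(y) >= min_size else None
    if split is None:
        return {"predict": y.mean()}          # leaf: predict the mean response
    _, j, t = split
    left = X[:, j] < t
    return {"feature": j, "cut": t,
            "left":  grow(X[left],  y[left],  depth + 1, max_depth, min_size),
            "right": grow(X[~left], y[~left], depth + 1, max_depth, min_size)}
```

Calling `grow(np.asarray(X), np.asarray(y))` returns a nested dict representing the fitted tree.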
Explain the difference between bagging, boosting, and random forests.
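A small scikit-learn sketch (the library and the particular constructor arguments are my choices, not the assignment's) that puts the three methods side by side; the arguments mirror the conceptual differences worth explaining:

```python
from sklearn.ensemble import (BaggingRegressor, RandomForestRegressor,
                              GradientBoostingRegressor)

# Bagging: many deep trees fit independently on bootstrap samples; predictions are averaged.
bag = BaggingRegressor(n_estimators=500)

# Random forest: bagging plus a random subset of predictors considered at each split,
# which decorrelates the trees and usually reduces variance further.
rf = RandomForestRegressor(n_estimators=500, max_features="sqrt")

# Boosting: shallow trees fit sequentially, each to the residuals of the current ensemble,
# and added in slowly via a small learning rate.
boost = GradientBoostingRegressor(n_estimators=500, learning_rate=0.01, max_depth=2)
```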
You will be using the Boston data found here. The response is medv
and the remaining variables are predictors.
Do a test-training split as usual, then fit a random forest or boosted tree model (your choice) and a linear regression model.
The random forest or boosted tree model has a selection of hyperparameters that you can tune to improve performance. Perform hyperparameter tuning using k-fold cross-validation to find a model with good predictive power. How does this model compare to the linear regression model?
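One possible way to set this up in Python with scikit-learn; this is a sketch only, and the file name "Boston.csv", the choice of a random forest over a boosted tree, and the grid values are assumptions rather than requirements:

```python
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# "Boston.csv" is a placeholder for however you load the linked data set
boston = pd.read_csv("Boston.csv")
X = boston.drop(columns="medv")
y = boston["medv"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)

# k-fold cross-validation over a small random forest grid (example values only)
grid = GridSearchCV(
    RandomForestRegressor(random_state=1),
    param_grid={"n_estimators": [200, 500],
                "max_features": [2, 4, 6],
                "min_samples_leaf": [1, 5]},
    cv=5,
    scoring="neg_mean_squared_error",
)
grid.fit(X_tr, y_tr)

lm = LinearRegression().fit(X_tr, y_tr)

print("RF test MSE:", mean_squared_error(y_te, grid.best_estimator_.predict(X_te)))
print("LM test MSE:", mean_squared_error(y_te, lm.predict(X_te)))
```

Comparing the two test MSEs gives a basis for the comparison the question asks for.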