
Tree-Based Methods

AU STAT627

Emil Hvitfeldt

2021-06-16

1 / 35

Overview

We will cover 4 new methods today

  • Decision Trees
  • Bagging
  • Random Forests
  • Boosting
2 / 35

Overview

We will cover 4 new methods today

  • Decision Trees
  • Bagging
  • Random Forests
  • Boosting

Decision trees act as the building block for this chapter

3 / 35

Decision Trees

Given a problem, give me a flowchart of if-else statements that finds the answer

4 / 35

Penguins

5 / 35

Penguins

6 / 35

The flowchart

7 / 35

The rules

## ..y Ade Chi Gen
## Adelie [.97 .03 .00] when flipper_length_mm < 207 & bill_length_mm < 43
## Chinstrap [.06 .92 .02] when flipper_length_mm < 207 & bill_length_mm >= 43
## Gentoo [.02 .04 .95] when flipper_length_mm >= 207
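
The slide does not show the code behind these rules; here is a minimal sketch that produces output in this format, assuming the palmerpenguins data and the rpart / rpart.plot packages:

# Fit a classification tree and print it as a set of if-else rules
# (assumes the palmerpenguins data and the rpart / rpart.plot packages)
library(palmerpenguins)
library(rpart)
library(rpart.plot)

tree_fit <- rpart(
  species ~ flipper_length_mm + bill_length_mm,
  data = penguins
)

rpart.rules(tree_fit)   # prints rules like the ones above
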
8 / 35

General setup

  • We divide the predictor space into multiple non-overlapping regions ( R_1, R_2, ..., R_J ).
  • Every observation that falls into a region will have the same prediction, and that prediction will be based on the observations in that region
    • Regression: the mean value in the region
    • Classification: the most common class in the region (see the sketch below)
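
A small illustrative sketch (not from the slides) of these two prediction rules, assuming the palmerpenguins data and a single split at flipper_length_mm >= 207:

library(palmerpenguins)

# Assign every observation to one of two regions
region <- ifelse(penguins$flipper_length_mm >= 207, "R1", "R2")

# Regression: predict the mean response within each region
tapply(penguins$body_mass_g, region, mean, na.rm = TRUE)

# Classification: predict the most common class within each region
tapply(penguins$species, region, function(x) names(which.max(table(x))))
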
9 / 35

General setup

The regions could in theory be any shape, but for simplicity we use rectangles/boxes to partition the space

The main goal is to build a partition that minimizes some loss such as RSS

\sum_{j=1}^J \sum_{i \in R_j} \left(y_i - \hat y_{R_j} \right)^2

10 / 35

General setup

It is generally computationally infeasible to consider all possible partitions

We use a recursive binary splitting procedure to grow the tree

This is a top-down approach since we start at the top and split our way down

It is greedy because we select the best possible split each time
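
An illustrative sketch (not the slide's code) of one greedy step of recursive binary splitting for a single numeric predictor, choosing the split point that minimizes RSS:

# For one predictor x and response y, try every candidate split point
# and return the one with the smallest RSS over the two resulting regions
best_split <- function(x, y) {
  candidates <- sort(unique(x))[-1]   # drop the smallest so both sides are non-empty
  rss <- sapply(candidates, function(s) {
    left  <- y[x <  s]
    right <- y[x >= s]
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  })
  candidates[which.min(rss)]
}

# Example: best single split of body mass on flipper length (palmerpenguins data)
library(palmerpenguins)
dat <- na.omit(penguins[, c("flipper_length_mm", "body_mass_g")])
best_split(dat$flipper_length_mm, dat$body_mass_g)
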

11 / 35

Details

How many times should we split?

If we keep splitting, we end up with each observation in its own region, giving us a wildly flexible model

We can control a number of different things; two simple ones are (see the sketch after this list)

  • Tree depth: the maximum depth of the tree
  • Minimum node size: the minimum number of data points in a node required for the node to be split further
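
A hedged sketch of setting these two controls with rpart (maxdepth and minsplit are rpart's names for them):

library(rpart)
library(palmerpenguins)

tree_fit <- rpart(
  species ~ .,
  data = penguins,
  control = rpart.control(
    maxdepth = 3,    # maximum depth of the tree
    minsplit = 20    # minimum observations in a node before it can be split further
  )
)
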
12 / 35

Tree Pruning

Due to the way decision trees are grown, it can be beneficial to grow a larger tree first and then go back and prune it to reduce its complexity afterwards
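
A hedged sketch of this grow-then-prune idea using rpart's cost-complexity pruning (the exact workflow from the lecture is not shown on the slide):

library(rpart)
library(palmerpenguins)

# Grow a deliberately large tree by turning off the complexity penalty
big_tree <- rpart(
  body_mass_g ~ .,
  data = penguins,
  control = rpart.control(cp = 0, minsplit = 2)
)

printcp(big_tree)                           # cross-validated error for each subtree size

pruned_tree <- prune(big_tree, cp = 0.01)   # prune back at a chosen complexity value
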

13 / 35

Regression "curves"

14 / 35

Regression "curves"

15 / 35

Regression "curves"

16 / 35

Regression "curves"

17 / 35

Regression "curves"

18 / 35

Regression "curves"

19 / 35

Decision boundary

20 / 35

Decision boundary

21 / 35

Decision boundary

22 / 35

Decision boundary

23 / 35

Pros and Cons

Pros

  • Very easy to explain and reason about
  • Can handle qualitative predictors without the need for dummy variables

Cons

  • Don't have great predictive power
  • Non-robust, small changes in the data can give wildly different models
24 / 35

Next Steps

Individual decision trees don't offer great predictive performance due to their simple nature

Bagging, Random Forests, and Boosting use multiple decision trees together to get better performance, with a trade-off of more complexity

25 / 35

Bagging

Decision trees suffer from high variance

We saw in week 3 how bootstrapping could be used to reduce the variance of a statistical learning method

We will use bootstrapping again, this time with decision trees, to reduce the variance. This is feasible since individual decision trees are fast to train

26 / 35

Bagging

"Algorithm"

  • Generate B different bootstrapped training data sets
  • Fit a decision tree on each of the bootstraps to get \hat {f^{*b}}(x)
  • Take the average of all the estimates to get your final estimate (see the sketch after the formula)

\hat{f}_{\text{bag}}(x) = \dfrac{1}{B} \sum^B_{b=1} \hat {f^{*b}}(x)
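
An illustrative sketch (not the slide's code) of this algorithm as an explicit loop over B bootstrap samples, using rpart and the palmerpenguins data:

library(rpart)
library(palmerpenguins)

dat <- na.omit(penguins[, c("body_mass_g", "flipper_length_mm", "bill_length_mm")])
B   <- 100

# Each column holds the predictions of one bootstrapped tree, \hat{f}^{*b}(x)
preds <- sapply(seq_len(B), function(b) {
  boot <- dat[sample(nrow(dat), replace = TRUE), ]   # bootstrapped training set
  fit  <- rpart(body_mass_g ~ ., data = boot)        # tree fit on this bootstrap
  predict(fit, newdata = dat)
})

bagged_pred <- rowMeans(preds)   # average the B estimates
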

27 / 35

Bagging

28 / 35

Bagging Notes

The number of bootstraps is not very important here; you just need a value of B large enough for the error to settle down, and ~100 seems to work well

You do not overfit by increasing B, you just increase the run-time

Bagged trees offer quite low interpretability since the result is a mixture of multiple models

We can obtain a summary of the variable importance of our model by looking at the average amount the RSS decreases due to splits on a given predictor
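
A hedged sketch of such a variable-importance summary using the randomForest package (bagging corresponds to a random forest where every predictor is available at each split):

library(randomForest)
library(palmerpenguins)

dat <- na.omit(penguins)

bag_fit <- randomForest(
  body_mass_g ~ .,
  data = dat,
  mtry = ncol(dat) - 1,   # consider every predictor at each split = bagging
  importance = TRUE
)

importance(bag_fit)   # decrease in node impurity (RSS) per predictor, averaged over trees
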

29 / 35

Random Forest

The random forest method offers an improvement over bagged trees

One of the main downsides to bagged trees is that the trees become quite correlated with each other

When fitting a random forest, we start the same way as with bagged trees, with multiple bootstrapped data sets,

but each time a split in a tree is considered, only a random sample of the predictors can be chosen

30 / 35

Random Forest

The sample is typically m = \sqrt{p} with p predictors

But this value is tunable as well, along with everything that is tunable in a single decision tree
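
A hedged sketch of fitting a random forest with m = sqrt(p), again using the randomForest package and the palmerpenguins data:

library(randomForest)
library(palmerpenguins)

dat <- na.omit(penguins)
p   <- ncol(dat) - 1   # number of predictors

rf_fit <- randomForest(
  species ~ .,
  data = dat,
  mtry  = floor(sqrt(p)),   # random sample of m = sqrt(p) predictors per split
  ntree = 500
)
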

31 / 35

Random Forest

32 / 35

Boosting

Boosting is a general approach that can be used with many statistical machine learning methods

In bagging we fit multiple decision trees side by side

In boosting we fit multiple decision trees back to back

33 / 35

Boosting

Algorithm

  • Fit a tree \hat {f^b} to the current residuals
  • Update the final fit using a shrunken version of the tree
  • Update the residuals using a shrunken version of the tree
  • Repeat B times (see the sketch after the final model)

Final model

\hat f(x)= \sum_{b=1}^B \lambda \hat {f^b}(x)
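
An illustrative sketch (not the slide's code) of this algorithm for regression, fitting B small trees to the residuals with shrinkage lambda, using rpart and the palmerpenguins data:

library(rpart)
library(palmerpenguins)

dat <- na.omit(penguins[, c("body_mass_g", "flipper_length_mm", "bill_length_mm")])
B      <- 1000
lambda <- 0.01

f_hat <- rep(0, nrow(dat))   # current fit, starts at 0
dat$r <- dat$body_mass_g     # current residuals, start as the response

for (b in seq_len(B)) {
  tree  <- rpart(r ~ flipper_length_mm + bill_length_mm, data = dat,
                 control = rpart.control(maxdepth = 1))   # small tree \hat{f}^b
  step  <- predict(tree, newdata = dat)
  f_hat <- f_hat + lambda * step   # update the fit with a shrunken version of the tree
  dat$r <- dat$r - lambda * step   # update the residuals the same way
}
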

34 / 35

Boosting

Large values of B can result in overfitting

The shrinkage parameter \lambda typically takes a small value but will need to be tuned

The number of splits d will need to be tuned as well, typically very small trees are fit during boosting
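
A hedged sketch of how these tuning parameters map onto the gbm package (n.trees = B, shrinkage = lambda, interaction.depth = d):

library(gbm)
library(palmerpenguins)

dat <- as.data.frame(na.omit(penguins[, c("body_mass_g", "flipper_length_mm", "bill_length_mm")]))

boost_fit <- gbm(
  body_mass_g ~ .,
  data = dat,
  distribution = "gaussian",
  n.trees = 1000,           # B: too large a value can overfit
  shrinkage = 0.01,         # lambda: typically a small value
  interaction.depth = 1     # d: number of splits per tree, typically very small
)
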

35 / 35