Resampling Methods

class: center, middle, title-slide

# Resampling Methods
## AU STAT-427/627
### Emil Hvitfeldt
### 2021-10-04

---

<div style = "position:fixed; visibility: hidden">
`$$\require{color}\definecolor{orange}{rgb}{1, 0.603921568627451, 0.301960784313725}$$`
`$$\require{color}\definecolor{blue}{rgb}{0.301960784313725, 0.580392156862745, 1}$$`
`$$\require{color}\definecolor{pink}{rgb}{0.976470588235294, 0.301960784313725, 1}$$`
</div>

# Motivation

We are already familiar with train-test splits

The main downside to train-test splits so far is that we can only use them once

This means we effectively can't make any decisions about the models we are using

---

# Resampling

Resampling estimates of performance can generalize to new data

---

# Resampling Workflow

.center[
![:scale 70%](images/resampling.svg)
]

---

# Resampling Workflow

The resampling is only conducted on the training set

We are still keeping the test set. The test set is not involved.

For each iteration of resampling, the data are partitioned into two subsamples:

- The model is fitted with the .orange[analysis set]
- The model is evaluated with the .blue[assessment set]

---

# Resampling Workflow

We have effectively created many train-test split out of our training data set.

The .blue[challange] here now becomes how we are creating these resample sets

---

# Resampling Workflow

Suppose we generate 10 different resamples

This means that we will be:

- Fitting 10 different models
- Perform predictions 10 times
- Produce 10 sets of performance statistics

The final estimate of the .blue[performance] of the model will be the average of these 10 models

---

# Resampling Workflow

If the resampling is done in an appropriate way then this average has very good generalization properties

---

# Leave-One-Out Cross-Validation

- 1 observation is used as the .blue[assessment set]
- The remaining observations make up the .orange[analysis set]

Notes:

We are fitting the model on `$n-1$` observations and a prediction `$\hat y_1$` is made on the .blue[assessment set] using the value `$x_1$`

---

# Leave-One-Out Cross-Validation

Since `$(x_1, y_1)$` is not used in the fitting process, then `$MSE_1 = (y_1 - \hat y_1)^2$` provides an approximately unbiased estimate for the test error.

While this estimate is approximately unbiased, it is quite poor since it is highly variable

---

# Leave-One-Out Cross-Validation

We can repeat this for

- `$MSE_2 = (y_2 - \hat y_2)^2$`
- `$MSE_3 = (y_3 - \hat y_3)^2$`
- ...
- `$MSE_n = (y_n - \hat y_n)^2$`

to get `$n$` estimates of the test error
 
---

# Leave-One-Out Cross-Validation

The LOOCV estimate of the test MSE is

`$$CV_{(n)} = \dfrac{1}{n} \sum^n_{i=1}MSE_i$$`

---

# Leave-One-Out Cross-Validation

## Pros

The LOOCV estimate of the test MSE is going to have a low bias

There is no randomness in the LOOCV estimate

## Cons

You need a lot of computational power even for modest data sets

(Some models don't need to be repeatedly refit)

---

# K-Fold Cross-Validation

Could we think of a compromise between fitting 1 model and `$n$` models?

.pink[K-Fold Cross Validation] has an answer:

Randomly divide the observations into `$k$` groups (or .blue[folds]) or approximately equal size

---

# K-Fold Cross-Validation

Randomly divide the observations into `$k$` groups (or .blue[folds]) or approximately equal size

- 1 .blue[fold] is used as the .blue[assessment set]
- The remaining .blue[folds] make up the .orange[analysis set]

Everything else happens as before.

We now get fewer performance metrics, BUT they are each less variable

---
background-image: url(images/cross-validation/Slide2.png)
background-size: contain