class: center, middle, title-slide # Diagnostics and Remedial Measures ## AU STAT-615 ### Emil Hvitfeldt ### 2021-02-17 --- `$$\require{color}\definecolor{orange}{rgb}{1, 0.603921568627451, 0.301960784313725}$$` `$$\require{color}\definecolor{blue}{rgb}{0.301960784313725, 0.580392156862745, 1}$$` `$$\require{color}\definecolor{pink}{rgb}{0.976470588235294, 0.301960784313725, 1}$$` <script type="text/x-mathjax-config"> MathJax.Hub.Config({ TeX: { Macros: { orange: ["{\\color{orange}{#1}}", 1], blue: ["{\\color{blue}{#1}}", 1], pink: ["{\\color{pink}{#1}}", 1] }, loader: {load: ['[tex]/color']}, tex: {packages: {'[+]': ['color']}} } }); </script> <style> .orange {color: #FF9A4D;} .blue {color: #4D94FF;} .pink {color: #F94DFF;} </style> # Residuals Diagnostics for predictor variables will be covered in Chapter 10 Diagnostics for the response variable are usually carried out indirectly through an examination of the residuals `$$e_i = Y_i - \hat Y_i$$` For the unknown true error `\(\varepsilon_i = Y_i - E\{Y_i\}\)` we know that `\(E\{\varepsilon_i\} = 0\)` and `\(V\{\varepsilon_i\} = \sigma^2\)` --- # Idea If the fitted model is appropriate for the data at hand, the observed residuals `\(e_i\)` should reflect the properties assumed for `\(\varepsilon_i\)` --- # Properties of Residuals ## Mean The mean is given by `$$\bar e = \dfrac{\sum e_i}{n} = 0$$` Since this is always true, it provides no information as to whether the true errors `\(\varepsilon_i\)` have expected value `\(E\{\varepsilon_i\} = 0\)` 😥 --- # Properties of Residuals ## Variance The variance is given by `$$s^2 = \dfrac{\sum(e_i - \bar e)^2}{n - 2} = \dfrac{\sum e_i^2}{n - 2}= \dfrac{SSE}{n - 2} = MSE$$` It can be shown that `\(E\{MSE\} = \sigma^2\)` So if the model is appropriate, MSE is an unbiased estimator of the variance of the error terms `\(\sigma^2\)` --- # Non-independence The residuals `\(e_i\)` are .blue[not] 
independent random variables because they involve the fitted values `\(\hat Y_i\)`, which are based on the same fitted regression function --- # Departures from model to be studied by residuals 1. The regression function is not linear 1. The error terms do not have constant variance 1. The error terms are not independent 1. The model fits all but one or a few outlier observations 1. The error terms are not normally distributed 1. One or several important predictor variables have been omitted from the model --- # Departures from model to be studied by residuals 1. .blue[The regression function is not linear] 1. .blue[The error terms do not have constant variance] 1. .blue[The error terms are not independent] 1. The model fits all but one or a few outlier observations 1. .blue[The error terms are not normally distributed] 1. One or several important predictor variables have been omitted from the model .blue[L]inearity, .blue[I]ndependence, .blue[N]ormality, .blue[E]qual Variance --- # Non-linearity of Regression Function The residual plot against the predictor variable <img src="index_files/figure-html/unnamed-chunk-3-1.png" width="700px" style="display: block; margin: auto;" /> --- # Non-linearity of Regression Function Suppose we have the following data <img src="index_files/figure-html/unnamed-chunk-4-1.png" width="700px" style="display: block; margin: auto;" /> --- # Non-linearity of Regression Function The fitted regression line looks like this <img src="index_files/figure-html/unnamed-chunk-5-1.png" width="700px" style="display: block; margin: auto;" /> --- # Non-linearity of Regression Function And the residuals against the predictor variable look like this: no systematic pattern, since the relationship is perfectly linear <img src="index_files/figure-html/unnamed-chunk-6-1.png" width="700px" style="display: block; margin: auto;" /> --- # Non-linearity of Regression Function Now suppose we have the following new data <img src="index_files/figure-html/unnamed-chunk-7-1.png" width="700px" style="display: block; margin: 
auto;" /> --- # Non-linearity of Regression Function The fitted line would be <img src="index_files/figure-html/unnamed-chunk-8-1.png" width="700px" style="display: block; margin: auto;" /> --- # Non-linearity of Regression Function And the residuals look like this: a curved pattern, so the relationship is clearly not linear <img src="index_files/figure-html/unnamed-chunk-9-1.png" width="700px" style="display: block; margin: auto;" /> --- # Non-Constancy of the Error Variance Residual plots also let us examine whether the variance of the error terms is constant --- # Non-Constancy of the Error Variance If we look at this residual plot from earlier, we notice that the spread (variance) of the residuals stays constant across the values of X <img src="index_files/figure-html/unnamed-chunk-10-1.png" width="700px" style="display: block; margin: auto;" /> --- # Non-Constancy of the Error Variance If we take this data right here <img src="index_files/figure-html/unnamed-chunk-11-1.png" width="700px" style="display: block; margin: auto;" /> --- # Non-Constancy of the Error Variance And fit a linear regression line through it <img src="index_files/figure-html/unnamed-chunk-12-1.png" width="700px" style="display: block; margin: auto;" /> --- # Non-Constancy of the Error Variance We can take a look at the residuals <img src="index_files/figure-html/unnamed-chunk-13-1.png" width="700px" style="display: block; margin: auto;" /> --- # Non-Constancy of the Error Variance And we notice the spread of the residuals widens as the values of X get larger <img src="index_files/figure-html/unnamed-chunk-14-1.png" width="700px" style="display: block; margin: auto;" /> --- # Non-Constancy of the Error Variance The spread of the residuals can be uneven in many different ways; there is no single pattern to look for <img src="index_files/figure-html/unnamed-chunk-15-1.png" width="700px" style="display: block; margin: auto;" /> --- # Presence of outliers We will discuss this more in Chapter 10 A general strategy is to standardize each residual by dividing by the square root 
of the MSE $$\text{Standardized residuals} = \dfrac{e_i}{\sqrt{MSE}}$$ --- # Presence of outliers If we plot the standardized residuals against the predictor, we see that most of the residuals for this data fall close to zero, with none standing out as unusual <img src="index_files/figure-html/unnamed-chunk-16-1.png" width="700px" style="display: block; margin: auto;" /> --- # Presence of outliers Here it appears that some of the points could be considered outliers <img src="index_files/figure-html/unnamed-chunk-17-1.png" width="700px" style="display: block; margin: auto;" /> --- # Presence of outliers Note: Outliers can affect the fit heavily, because least squares minimizes the sum of squared deviations, so an extreme point pulls the fitted line toward itself --- # Presence of outliers You cannot willy-nilly remove outliers Each outlier removed will by definition improve the fit of the model, but only on the reduced data set The outliers may carry important information, for example about an interaction between predictor variables, or that some important predictors have been excluded from the model --- # Presence of outliers A safe rule frequently suggested is to discard an outlier only if there is direct evidence that it represents an error in recording, a miscalculation, etc. --- # Non-independence of Error Terms Whenever data are obtained in a time sequence or some other type of sequence, it is good to prepare a sequence plot of the residuals The purpose of this is to see whether there is a correlation between error terms that are near each other in the sequence --- # Non-independence of Error Terms It is important that the errors don't follow a trend along the time axis <img src="index_files/figure-html/unnamed-chunk-18-1.png" width="700px" style="display: block; margin: auto;" /> --- # Non-independence of Error Terms For time variables it is common to see cyclical non-independence <img src="index_files/figure-html/unnamed-chunk-19-1.png" width="700px" style="display: block; margin: auto;" /> --- # Non-independence of Error Terms In general, you can see quite big fluctuations in the data when 
we talk about time-dependent data --- # Non-Normality Of Error Terms The normality of the error terms can be studied with general graphical methods for 1-dimensional data. Histograms are a good first choice --- # Non-Normality Of Error Terms Another way is to use Q-Q plots (normal quantile-quantile plots) For this type of chart, we plot the sample quantiles against the theoretical quantiles under the assumption of a normal distribution --- # Non-Normality Of Error Terms .pull-left[ A normal Q-Q plot should look something like this The points follow the diagonal fairly closely, with perhaps a couple of deviations at the ends ] .pull-right[ <img src="index_files/figure-html/unnamed-chunk-21-1.png" width="700px" style="display: block; margin: auto;" /> ] --- # Non-Normality Of Error Terms .pull-left[ Symptom: the ends dip under the line on the left and rise over it on the right Cause: a symmetrical, heavy-tailed distribution ] .pull-right[ <img src="index_files/figure-html/unnamed-chunk-22-1.png" width="700px" style="display: block; margin: auto;" /> ] --- # Non-Normality Of Error Terms .pull-left[ Symptom: a curved pattern Cause: a right-skewed distribution ] .pull-right[ <img src="index_files/figure-html/unnamed-chunk-23-1.png" width="700px" style="display: block; margin: auto;" /> ]
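---

# Putting It Together

The whole diagnostic recipe from these slides (fit the line, form the residuals `\(e_i\)`, compute MSE, standardize) fits in a few lines of code. The course materials were built with R, so treat this NumPy version as an illustrative sketch on simulated data, not the course code; the seed, coefficients, and the cutoff of 3 are all made up for the example:

```python
import numpy as np

# Simulate data from a simple linear model (hypothetical values)
rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0, 10, size=n)
y = 2 + 3 * x + rng.normal(scale=2, size=n)

# Least-squares fit of Y = b0 + b1 * X
X = np.column_stack([np.ones(n), x])
b0, b1 = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = b0 + b1 * x

# Residuals e_i = Y_i - Yhat_i; with an intercept they average to zero
e = y - y_hat

# MSE = SSE / (n - 2) estimates the error variance sigma^2
mse = np.sum(e**2) / (n - 2)

# Standardized residuals e_i / sqrt(MSE); large values flag candidates
std_resid = e / np.sqrt(mse)
suspects = np.where(np.abs(std_resid) > 3)[0]
```

With real data you would follow this with the plots shown earlier: residuals against X, a sequence plot, and a Q-Q plot of the standardized residuals.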