Adding a regression line to a ggplot2 chart

Setup

This short post shows how to add a regression line to a ggplot2 chart. We start by loading the USpop data from the second lab.

USpop <- read.csv("USpop.csv")

Next, we create a simple linear regression model

my_fit <- lm(Population ~ Year, data = USpop)

my_fit

Call:
lm(formula = Population ~ Year, data = USpop)

Coefficients:
(Intercept)         Year  
   -2480.70         1.36  

Let us start with a simple scatter chart using ggplot2

library(ggplot2)

ggplot(USpop, aes(Year, Population)) +
  geom_point()

Adding a regression line to this chart can be done in a couple of different ways.

Using geom_smooth()

The first way is by far the simplest. geom_smooth() fits a line directly to the data inside ggplot2 itself. If we specify method = "lm" and formula = y ~ x, the line is forced to be a simple linear regression. Setting se = FALSE hides the confidence band.

ggplot(USpop, aes(Year, Population)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

This method, while pleasing, doesn’t answer the question at hand: it doesn’t let us add our already fitted model, but instead fits a model of its own.
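As a quick check that the line geom_smooth() fits internally agrees with what lm() gives, the sketch below compares the two. It is self-contained, so it uses only the ten rows of USpop printed in the augment() output further down in this post; the names us_sub, sub_fit, smooth_df and lm_pred are made up for illustration, and the coefficients will differ from those of my_fit, which is fitted on all 23 rows.

```r
library(ggplot2)

# First ten rows of USpop as printed in this post (assumption: the full
# data set has 23 rows, so this fit is not identical to my_fit)
us_sub <- data.frame(
  Year = seq(1790, 1880, by = 10),
  Population = c(3.9, 5.3, 7.2, 9.6, 12.9, 17.1, 23.2, 31.4, 38.6, 50.2)
)

p <- ggplot(us_sub, aes(Year, Population)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# layer_data() returns the data computed for a layer; layer 2 is geom_smooth()
smooth_df <- layer_data(p, 2)

# Predictions from a separately fitted lm() at the same x positions
sub_fit <- lm(Population ~ Year, data = us_sub)
lm_pred <- predict(sub_fit, newdata = data.frame(Year = smooth_df$x))

# Compare the two sets of fitted values
all.equal(smooth_df$y, unname(lm_pred))
```

If the comparison holds, the smoothed line is the ordinary least-squares line, just refitted by ggplot2 rather than taken from an existing model object.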

Using broom::augment()

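You can use augment() from the broom package to get the original data alongside per-observation quantities from the model fit, such as the fitted values (.fitted) and residuals (.resid).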

library(broom)
augment(my_fit)
# A tibble: 23 x 8
   Population  Year .fitted .resid .std.resid   .hat .sigma .cooksd
        <dbl> <int>   <dbl>  <dbl>      <dbl>  <dbl>  <dbl>   <dbl>
 1        3.9  1790  -45.7   49.6       1.94  0.163    26.0 0.365  
 2        5.3  1800  -32.1   37.4       1.44  0.142    27.2 0.173  
 3        7.2  1810  -18.5   25.7       0.980 0.124    28.0 0.0676 
 4        9.6  1820   -4.85  14.5       0.547 0.107    28.5 0.0179 
 5       12.9  1830    8.75   4.15      0.156 0.0919   28.6 0.00123
 6       17.1  1840   22.4   -5.25     -0.196 0.0791   28.6 0.00164
 7       23.2  1850   36.0  -12.8      -0.472 0.0682   28.5 0.00816
 8       31.4  1860   49.6  -18.2      -0.669 0.0593   28.4 0.0141 
 9       38.6  1870   63.2  -24.6      -0.902 0.0524   28.1 0.0225 
10       50.2  1880   76.8  -26.6      -0.973 0.0474   28.0 0.0236 
# … with 13 more rows

Using this data along with geom_line() allows us to add a fitted line on top of our ggplot.

ggplot(USpop, aes(Year, Population)) +
  geom_point() +
  geom_line(data = augment(my_fit), 
            aes(x = Year, y = .fitted))

The main downside of this approach is that it does not easily extrapolate outside the range of the data: if we zoom out, the line doesn’t extend.

ggplot(USpop, aes(Year, Population)) +
  geom_point() +
  geom_line(data = augment(my_fit), 
            aes(x = Year, y = .fitted)) +
  lims(x = c(1700, 2100), y = c(-100, 500))
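One possible workaround, sketched here rather than taken from the original analysis, is to pass a grid of years to augment() through its newdata argument, so that fitted values are computed across the whole zoomed-out range. The example is self-contained and again uses only the ten USpop rows printed above; us_sub, sub_fit, year_grid and line_df are illustrative names, and the fit differs from my_fit.

```r
library(ggplot2)
library(broom)

# First ten rows of USpop as printed in this post (assumption: a stand-in
# for the full 23-row data set)
us_sub <- data.frame(
  Year = seq(1790, 1880, by = 10),
  Population = c(3.9, 5.3, 7.2, 9.6, 12.9, 17.1, 23.2, 31.4, 38.6, 50.2)
)
sub_fit <- lm(Population ~ Year, data = us_sub)

# Hypothetical grid covering the zoomed-out x range; a dense grid is used
# because lims() drops any point outside the y limits, and with only two
# end points the whole line could disappear
year_grid <- data.frame(Year = seq(1700, 2100, by = 10))

# With newdata, augment() returns the grid plus a .fitted column
line_df <- augment(sub_fit, newdata = year_grid)

ggplot(us_sub, aes(Year, Population)) +
  geom_point() +
  geom_line(data = line_df, aes(x = Year, y = .fitted)) +
  lims(x = c(1700, 2100), y = c(-100, 500))
```

This keeps the augment() workflow but requires choosing the prediction range by hand, which is the extra work the next approach avoids.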

Using coef() and geom_abline()

Lastly, we can extract the parameter estimates directly with coef() and use them to add a straight line with geom_abline().

my_coef <- coef(my_fit)
my_coef
 (Intercept)         Year 
-2480.701976     1.360356 

ggplot(USpop, aes(Year, Population)) +
  geom_point() +
  geom_abline(intercept = my_coef[1], slope = my_coef[2])

Because geom_abline() draws a line that spans the whole panel, you can zoom out and still see it.

ggplot(USpop, aes(Year, Population)) +
  geom_point() +
  geom_abline(intercept = my_coef[1], slope = my_coef[2]) +
  lims(x = c(1700, 2100), y = c(-100, 500))
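As a small sanity check, the line drawn by geom_abline() is just intercept + slope * Year, so plugging a year into the printed coefficients should reproduce the .fitted values that augment() reported. The snippet below hard-codes the coefficients as printed by coef(my_fit) above; line_at() is a hypothetical helper introduced only for this illustration.

```r
# Coefficients copied from the coef(my_fit) output printed in this post
my_coef <- c("(Intercept)" = -2480.701976, Year = 1.360356)

# Hypothetical helper: evaluate the fitted line at one or more years
line_at <- function(year) {
  my_coef[["(Intercept)"]] + my_coef[["Year"]] * year
}

# Rows 1 and 10 of the augment() output list .fitted as -45.7 and 76.8
round(line_at(c(1790, 1880)), 1)
# -45.7 and 76.8
```

The agreement with the augment() table confirms that both approaches draw the same line; they differ only in how far the line extends.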