This short post will show how to add a regression line to a ggplot2 chart. We start by loading in the USpop
data found in the second lab.
USpop <- read.csv("USpop.csv")
Next, we fit a simple linear regression model of Population on Year:
my_fit <- lm(Population ~ Year, data = USpop)
my_fit
Call:
lm(formula = Population ~ Year, data = USpop)
Coefficients:
(Intercept) Year
-2480.70 1.36
Let us start with a simple scatter chart using ggplot2:
library(ggplot2)
ggplot(USpop, aes(Year, Population)) +
  geom_point()
Adding a regression line to this chart can be done in a couple of different ways.
geom_smooth()
The first way is by far the simplest. geom_smooth() fits a line directly to the data inside ggplot2 itself. If we specify method = "lm" and formula = y ~ x, we force that line to be a simple linear regression. Setting se = FALSE hides the confidence band.
ggplot(USpop, aes(Year, Population)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
This method, while pleasing, doesn't answer the question at hand: it doesn't let us add our already fitted model, but instead fits a new model itself.
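One way to see that geom_smooth() refits an equivalent model is to pull the computed line out of the plot with layer_data() and compare it with predictions from our own lm() fit. The sketch below is not from the original post. It uses a stand-in data frame built from the ten rows printed in the augment() output further down, and the names us_pop_head, head_fit, and smooth_line are hypothetical.

```r
library(ggplot2)

# Stand-in data: the ten rows visible in the augment() output,
# not the full 23-row USpop data set.
us_pop_head <- data.frame(
  Year = seq(1790, 1880, by = 10),
  Population = c(3.9, 5.3, 7.2, 9.6, 12.9, 17.1, 23.2, 31.4, 38.6, 50.2)
)
head_fit <- lm(Population ~ Year, data = us_pop_head)

p <- ggplot(us_pop_head, aes(Year, Population)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# layer_data() returns the data ggplot2 computed for a layer;
# layer 2 is the geom_smooth() layer.
smooth_line <- layer_data(p, 2)

# Predict from our own fit at the same x positions and compare.
own_fitted <- predict(head_fit, newdata = data.frame(Year = smooth_line$x))
all.equal(smooth_line$y, unname(own_fitted))
```

The two sets of values should agree up to numerical tolerance, so the line is the same. The limitation is only that the model object inside the plot is not the one we fitted ourselves.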
broom::augment()
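You can use the broom package to extract various information from the model fit. After loading broom, calling augment(my_fit) returns the original data together with the fitted values, residuals, and per-observation diagnostics: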
# A tibble: 23 x 8
Population Year .fitted .resid .std.resid .hat .sigma .cooksd
<dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 3.9 1790 -45.7 49.6 1.94 0.163 26.0 0.365
2 5.3 1800 -32.1 37.4 1.44 0.142 27.2 0.173
3 7.2 1810 -18.5 25.7 0.980 0.124 28.0 0.0676
4 9.6 1820 -4.85 14.5 0.547 0.107 28.5 0.0179
5 12.9 1830 8.75 4.15 0.156 0.0919 28.6 0.00123
6 17.1 1840 22.4 -5.25 -0.196 0.0791 28.6 0.00164
7 23.2 1850 36.0 -12.8 -0.472 0.0682 28.5 0.00816
8 31.4 1860 49.6 -18.2 -0.669 0.0593 28.4 0.0141
9 38.6 1870 63.2 -24.6 -0.902 0.0524 28.1 0.0225
10 50.2 1880 76.8 -26.6 -0.973 0.0474 28.0 0.0236
# … with 13 more rows
Using this data along with geom_line() allows us to add a fitted line on top of our ggplot:
# augment() comes from the broom package
library(broom)
ggplot(USpop, aes(Year, Population)) +
  geom_point() +
  geom_line(data = augment(my_fit),
            aes(x = Year, y = .fitted))
The main downside of this approach is that it does not easily extrapolate outside the range of the data, so when we zoom out, the line doesn't extend with the axes:
ggplot(USpop, aes(Year, Population)) +
  geom_point() +
  geom_line(data = augment(my_fit),
            aes(x = Year, y = .fitted)) +
  lims(x = c(1700, 2100), y = c(-100, 500))
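One possible workaround is to ask augment() for fitted values on a wider grid of years through its newdata argument. The sketch below is not from the original post. It uses a stand-in data frame built from the ten rows printed in the augment() output above, and the names us_pop_head, head_fit, year_grid, and extended_line are hypothetical.

```r
library(ggplot2)
library(broom)

# Stand-in data: the ten rows visible in the augment() output above,
# not the full 23-row USpop data set.
us_pop_head <- data.frame(
  Year = seq(1790, 1880, by = 10),
  Population = c(3.9, 5.3, 7.2, 9.6, 12.9, 17.1, 23.2, 31.4, 38.6, 50.2)
)
head_fit <- lm(Population ~ Year, data = us_pop_head)

# Hypothetical grid of years spanning the zoomed-out x-axis.
year_grid <- data.frame(Year = seq(1700, 2100, by = 10))

# With newdata, augment() returns the grid plus a .fitted column.
extended_line <- augment(head_fit, newdata = year_grid)

p_extended <- ggplot(us_pop_head, aes(Year, Population)) +
  geom_point() +
  geom_line(data = extended_line, aes(x = Year, y = .fitted)) +
  lims(x = c(1700, 2100), y = c(-100, 500))
p_extended
```

A dense grid is used because lims() drops points that fall outside the limits. With only two endpoints, a single out-of-range value would remove the whole line.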
coef() and geom_abline()
Lastly, we can extract the parameter estimates directly with coef() and use these to add a single line with geom_abline():
my_coef <- coef(my_fit)
my_coef
(Intercept) Year
-2480.701976 1.360356
ggplot(USpop, aes(Year, Population)) +
  geom_point() +
  geom_abline(intercept = my_coef[1], slope = my_coef[2])
Since geom_abline() draws a line defined only by its intercept and slope, it spans the whole panel, so you can zoom out and still see the line:
ggplot(USpop, aes(Year, Population)) +
  geom_point() +
  geom_abline(intercept = my_coef[1], slope = my_coef[2]) +
  lims(x = c(1700, 2100), y = c(-100, 500))
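As a quick sanity check, the line drawn by geom_abline() can be evaluated by hand from the printed coefficients and compared with the .fitted column in the augment() output. This small sketch is not part of the original post; it hard-codes the coefficient values shown above rather than reading them from my_fit.

```r
# Coefficients as printed by coef(my_fit) above.
intercept <- -2480.701976
slope <- 1.360356

# Evaluate the line at the first and tenth years in the augment() table.
years <- c(1790, 1880)
line_values <- intercept + slope * years
round(line_values, 1)
```

Rounded to one decimal place this gives -45.7 and 76.8, the same as the first and tenth .fitted values in the table, so all three approaches draw the same line.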