This short post will show how to add a regression line to a ggplot2 chart. We start by loading in the USpop
data found in the second lab.
USpop <- read.csv("USpop.csv")
Next, we fit a simple linear regression model and print it.
my_fit <- lm(Population ~ Year, data = USpop)
my_fit
Call:
lm(formula = Population ~ Year, data = USpop)
Coefficients:
(Intercept) Year
-2480.70 1.36
Let us start with a simple scatter chart using ggplot2.
library(ggplot2)
ggplot(USpop, aes(Year, Population)) +
  geom_point()
Adding a regression line to this chart can be done in a couple of different ways.
geom_smooth()
The first way is by far the simplest. geom_smooth() fits a line directly to the data inside ggplot2 itself. If we specify method = "lm" and formula = y ~ x, we force that line to be a simple linear regression. Setting se = FALSE hides the confidence interval.
ggplot(USpop, aes(Year, Population)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
This method, while pleasing, doesn’t answer the question at hand. It doesn’t let us add our already fitted model; instead, ggplot2 fits a new model of its own.
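To see that geom_smooth() really does its own fitting, here is a small sketch that pulls the computed line back out of the plot with layer_data() and compares it with predictions from a separately fitted lm(). Since USpop.csv is not included here, the data frame USpop_head is a stand-in built from only the ten rows printed in this post, so its coefficients will differ from the full 23-row fit.

```r
library(ggplot2)

# Stand-in data: only the ten rows printed in this post, not the full USpop.csv.
USpop_head <- data.frame(
  Year = seq(1790, 1880, by = 10),
  Population = c(3.9, 5.3, 7.2, 9.6, 12.9, 17.1, 23.2, 31.4, 38.6, 50.2)
)

my_fit <- lm(Population ~ Year, data = USpop_head)

p <- ggplot(USpop_head, aes(Year, Population)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# layer_data() returns the data ggplot2 computed for a layer;
# layer 2 is the geom_smooth() layer.
smooth_df <- layer_data(p, 2)

# Largest gap between the line ggplot2 drew and our own model's predictions.
ours <- predict(my_fit, newdata = data.frame(Year = smooth_df$x))
max_diff <- max(abs(smooth_df$y - ours))
max_diff
```

The two lines agree here only because both fits use the same data and the same formula. Nothing ties the plot to my_fit, so changing the model would not change the line.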
broom::augment()
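You can use the broom package to extract various information from the model fit. Calling augment() on the fitted model returns the original data alongside per-observation quantities such as fitted values and residuals.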
# A tibble: 23 x 8
Population Year .fitted .resid .std.resid .hat .sigma .cooksd
<dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 3.9 1790 -45.7 49.6 1.94 0.163 26.0 0.365
2 5.3 1800 -32.1 37.4 1.44 0.142 27.2 0.173
3 7.2 1810 -18.5 25.7 0.980 0.124 28.0 0.0676
4 9.6 1820 -4.85 14.5 0.547 0.107 28.5 0.0179
5 12.9 1830 8.75 4.15 0.156 0.0919 28.6 0.00123
6 17.1 1840 22.4 -5.25 -0.196 0.0791 28.6 0.00164
7 23.2 1850 36.0 -12.8 -0.472 0.0682 28.5 0.00816
8 31.4 1860 49.6 -18.2 -0.669 0.0593 28.4 0.0141
9 38.6 1870 63.2 -24.6 -0.902 0.0524 28.1 0.0225
10 50.2 1880 76.8 -26.6 -0.973 0.0474 28.0 0.0236
# … with 13 more rows
Using this data along with geom_line() allows us to add a fitted line on top of our ggplot.
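A minimal sketch of this approach is shown here. It assumes the same stand-in data as the printed output (the first ten rows only, stored in a hypothetical USpop_head), so the fitted values will not match the full-data table above.

```r
library(ggplot2)
library(broom)

# Stand-in data: only the ten rows printed in this post, not the full USpop.csv.
USpop_head <- data.frame(
  Year = seq(1790, 1880, by = 10),
  Population = c(3.9, 5.3, 7.2, 9.6, 12.9, 17.1, 23.2, 31.4, 38.6, 50.2)
)

my_fit <- lm(Population ~ Year, data = USpop_head)

# augment() returns the original columns plus .fitted, .resid, and other diagnostics.
my_aug <- augment(my_fit)

# Points use the observed Population; the line uses the model's .fitted values.
ggplot(my_aug, aes(Year, Population)) +
  geom_point() +
  geom_line(aes(y = .fitted))
```

Because the line is drawn from my_aug, it reflects exactly the model stored in my_fit rather than one refitted by ggplot2.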
The main downside of this approach is that it does not easily extrapolate outside the range of the data. The line is drawn only between the observed years, so if we zoom out, the line doesn’t extend.
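One possible workaround, sketched here under the same stand-in data assumption, is to pass augment() a newdata grid that covers a wider range of years. The grid year_grid and its 1700 to 2100 range are illustrative choices, not part of the original lab.

```r
library(ggplot2)
library(broom)

# Stand-in data: only the ten rows printed in this post, not the full USpop.csv.
USpop_head <- data.frame(
  Year = seq(1790, 1880, by = 10),
  Population = c(3.9, 5.3, 7.2, 9.6, 12.9, 17.1, 23.2, 31.4, 38.6, 50.2)
)

my_fit <- lm(Population ~ Year, data = USpop_head)

# Hypothetical grid of years, wider than the observed range.
year_grid <- data.frame(Year = seq(1700, 2100, by = 10))

# With newdata, augment() returns the grid plus a .fitted column of predictions.
my_pred <- augment(my_fit, newdata = year_grid)

# The points come from the observed data; the line comes from the wider grid.
ggplot(USpop_head, aes(Year, Population)) +
  geom_point() +
  geom_line(data = my_pred, aes(y = .fitted)) +
  lims(x = c(1700, 2100))
```

This works, but the range has to be chosen by hand and kept in sync with the axis limits.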
coef() and geom_abline()
Lastly, we can extract the parameter estimates directly with coef() and use them to add a single line with geom_abline().
my_coef <- coef(my_fit)
my_coef
(Intercept) Year
-2480.701976 1.360356
ggplot(USpop, aes(Year, Population)) +
  geom_point() +
  geom_abline(intercept = my_coef[1], slope = my_coef[2])
Since geom_abline() adds a line defined only by its intercept and slope, the line spans the whole panel, so you can zoom out and still see it.
ggplot(USpop, aes(Year, Population)) +
  geom_point() +
  geom_abline(intercept = my_coef[1], slope = my_coef[2]) +
  lims(x = c(1700, 2100), y = c(-100, 500))
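As a quick sanity check, the sketch here confirms that the intercept and slope passed to geom_abline() reproduce the model's own predictions, even at years far outside the data. It uses the same ten-row stand-in data (hypothetical name USpop_head), and it extracts the coefficients by name rather than by position, which is a stylistic choice and not something the code above requires.

```r
# Stand-in data: only the ten rows printed in this post, not the full USpop.csv.
USpop_head <- data.frame(
  Year = seq(1790, 1880, by = 10),
  Population = c(3.9, 5.3, 7.2, 9.6, 12.9, 17.1, 23.2, 31.4, 38.6, 50.2)
)

my_fit <- lm(Population ~ Year, data = USpop_head)
my_coef <- coef(my_fit)

# Extract by name instead of position; [[ ]] also drops the name attribute.
intercept <- my_coef[["(Intercept)"]]
slope <- my_coef[["Year"]]

# The line geom_abline() draws is intercept + slope * Year, for any Year.
years <- c(1700, 2100)
by_hand <- intercept + slope * years
by_predict <- unname(predict(my_fit, newdata = data.frame(Year = years)))

# TRUE when the hand-computed line matches predict() at both years.
all.equal(by_hand, by_predict)
```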