This set of labs will introduce you to logistic regression. It is also your first introduction to the rsample package, which we will use to perform train-test splits.
In this exercise we will explore the mlc_churn data set included in tidymodels.
library(tidymodels)
data("mlc_churn")
The data set contains a variable called churn.
Create a test-train rsplit object of mlc_churn using initial_split(). Use the arguments to set the proportion of the training data to 80%. Stratify the sampling according to the churn variable. How many observations are in the testing and training sets?
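As a minimal sketch of this step, the split can be created as shown here. The object name churn_split and the seed value are illustrative choices, not part of the exercise.

```r
library(tidymodels)
data("mlc_churn")

# The seed is an arbitrary choice so that the random split is reproducible
set.seed(1234)

# prop sets the share of rows that go to the training set;
# strata keeps the churn proportions similar in both sets
churn_split <- initial_split(mlc_churn, prop = 0.8, strata = churn)

# Printing the rsplit object reports the size of each set
churn_split
```

Because the sampling is stratified, the training share may differ very slightly from exactly 80%.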
Create the training and testing data sets with training() and testing() respectively. Do the observation counts match what you found in the last question?
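One way to extract the two data sets and check their sizes is sketched here. The names churn_split, churn_train and churn_test are illustrative, and the split is recreated so that the example runs on its own.

```r
library(tidymodels)
data("mlc_churn")

set.seed(1234)  # arbitrary seed, for reproducibility only
churn_split <- initial_split(mlc_churn, prop = 0.8, strata = churn)

# Pull the two data frames out of the rsplit object
churn_train <- training(churn_split)
churn_test  <- testing(churn_split)

# Compare these counts with what the printed rsplit object reported
nrow(churn_train)
nrow(churn_test)
```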
Fit a logistic regression model using logistic_reg(). Use number_vmail_messages, total_intl_minutes, total_intl_calls, total_intl_charge, and number_customer_service_calls as predictors. Remember to fit the model only using the training data set.
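The fitting step might look like the following sketch. It assumes the "glm" engine, which the exercise does not specify, and the names lr_spec and lr_fit are illustrative.

```r
library(tidymodels)
data("mlc_churn")

set.seed(1234)  # arbitrary seed, for reproducibility only
churn_split <- initial_split(mlc_churn, prop = 0.8, strata = churn)
churn_train <- training(churn_split)

# Model specification: logistic regression fitted with stats::glm()
# (the engine is an assumption; the exercise does not name one)
lr_spec <- logistic_reg() %>%
  set_engine("glm") %>%
  set_mode("classification")

# Fit on the training data only, using the five listed predictors
lr_fit <- lr_spec %>%
  fit(
    churn ~ number_vmail_messages + total_intl_minutes + total_intl_calls +
      total_intl_charge + number_customer_service_calls,
    data = churn_train
  )
```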
Inspect the model with summary() and tidy(). How good are the variables we have chosen?
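A possible way to inspect the fit is sketched here. It assumes a parsnip fit named lr_fit (an illustrative name) built with the "glm" engine. The underlying glm object is extracted before calling summary(), since that gives the familiar coefficient table.

```r
library(tidymodels)
data("mlc_churn")

set.seed(1234)  # arbitrary seed, for reproducibility only
churn_split <- initial_split(mlc_churn, prop = 0.8, strata = churn)
churn_train <- training(churn_split)

lr_fit <- logistic_reg() %>%
  set_engine("glm") %>%
  fit(
    churn ~ number_vmail_messages + total_intl_minutes + total_intl_calls +
      total_intl_charge + number_customer_service_calls,
    data = churn_train
  )

# summary() of the underlying glm object
lr_fit %>%
  extract_fit_engine() %>%
  summary()

# tidy() returns the coefficients as a tibble, one row per term
lr_coefs <- tidy(lr_fit)
lr_coefs
```

The p.value column of the tidy() output is a natural place to start when judging the predictors.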
Predict values for the testing data set. Use the type argument to also get probability predictions.
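The prediction step could be sketched as follows, again with illustrative object names. Class predictions and probability predictions are computed separately and bound together with the true churn values.

```r
library(tidymodels)
data("mlc_churn")

set.seed(1234)  # arbitrary seed, for reproducibility only
churn_split <- initial_split(mlc_churn, prop = 0.8, strata = churn)
churn_train <- training(churn_split)
churn_test  <- testing(churn_split)

lr_fit <- logistic_reg() %>%
  set_engine("glm") %>%
  fit(
    churn ~ number_vmail_messages + total_intl_minutes + total_intl_calls +
      total_intl_charge + number_customer_service_calls,
    data = churn_train
  )

churn_preds <- bind_cols(
  predict(lr_fit, new_data = churn_test),                 # .pred_class
  predict(lr_fit, new_data = churn_test, type = "prob"),  # one .pred_* column per class
  churn_test %>% select(churn)                            # true response
)
churn_preds
```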
Use conf_mat() to construct a confusion matrix. Does the confusion matrix look good?
conf_mat() is used as follows, where truth is the name of the true response variable and estimate is the name of the predicted response.
data %>%
  conf_mat(truth, estimate)
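Applied to this exercise, the call might look like the sketch below. It assumes predictions were made with predict(), so the predicted class is stored in a column called .pred_class. All object names are illustrative.

```r
library(tidymodels)
data("mlc_churn")

set.seed(1234)  # arbitrary seed, for reproducibility only
churn_split <- initial_split(mlc_churn, prop = 0.8, strata = churn)
churn_train <- training(churn_split)
churn_test  <- testing(churn_split)

lr_fit <- logistic_reg() %>%
  set_engine("glm") %>%
  fit(
    churn ~ number_vmail_messages + total_intl_minutes + total_intl_calls +
      total_intl_charge + number_customer_service_calls,
    data = churn_train
  )

# Class predictions next to the true response
churn_preds <- bind_cols(
  predict(lr_fit, new_data = churn_test),
  churn_test %>% select(churn)
)

# truth is the observed churn, estimate is the predicted class
churn_cm <- churn_preds %>%
  conf_mat(truth = churn, estimate = .pred_class)
churn_cm
```

When reading the result, it helps to compare the off-diagonal counts with how many customers of each class are in the testing set.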