Week 2 - Monday

This set of labs introduces you to logistic regression. It is also your first introduction to the rsample package, which we will use to perform train-test splits.

Exercise 1

In this exercise we will explore the mlc_churn data set, which comes from the modeldata package and is available once tidymodels is loaded.

library(tidymodels)
data("mlc_churn")

The data set contains a variable called churn, which we will use as the response.
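
Before splitting the data, it can help to look at how the churn classes are distributed, since that is the reason for stratifying later on. A minimal sketch, assuming mlc_churn has been loaded as above; the name churn_counts is only used for illustration:

```r
library(tidymodels)
data("mlc_churn")

# Count the observations in each churn class and add the class shares
churn_counts <- mlc_churn %>%
  count(churn) %>%
  mutate(prop = n / sum(n))

churn_counts
```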

  1. Create a train-test rsplit object of mlc_churn using initial_split(). Use the prop argument to put 80% of the observations in the training set, and use the strata argument to stratify the sampling by the churn variable. How many observations are in the training and testing sets?

  2. Create the training and testing data sets with training() and testing() respectively. Do the observation counts match what you found in the last question?

  3. Fit a logistic regression model for churn using logistic_reg(). Use number_vmail_messages, total_intl_minutes, total_intl_calls, total_intl_charge, and number_customer_service_calls as predictors. Remember to fit the model using only the training data set.

  4. Inspect the model with summary() and tidy(). Judging by the estimates and p-values, how useful are the variables we have chosen?

  5. Predict values for the testing data set. Use the type argument to also get probability predictions.

  6. Use conf_mat() to construct a confusion matrix. Does the confusion matrix look good?
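
The steps above follow a common rsample and parsnip pattern. The sketch below walks through the same pattern on a different data set, so it is not a solution to the exercise. It assumes the two_class_dat data from modeldata (response Class, predictors A and B), an arbitrary seed, and a 75% training proportion. All object names are made up for illustration.

```r
library(tidymodels)

# Illustration data (an assumption, not the lab data)
data("two_class_dat")

set.seed(1234)  # arbitrary seed so the split is reproducible

# 75/25 split, stratified on the response so both sets keep similar class shares
dat_split <- initial_split(two_class_dat, prop = 0.75, strata = Class)
dat_train <- training(dat_split)
dat_test  <- testing(dat_split)

# Logistic regression specification using the "glm" engine
lr_spec <- logistic_reg() %>%
  set_engine("glm") %>%
  set_mode("classification")

# Fit on the training data only
lr_fit <- lr_spec %>%
  fit(Class ~ A + B, data = dat_train)

# Coefficient table as a tibble
tidy(lr_fit)

# summary() of the underlying glm object
lr_fit %>%
  extract_fit_engine() %>%
  summary()

# Class predictions and class probabilities for the held-out data
class_preds <- predict(lr_fit, new_data = dat_test)
prob_preds  <- predict(lr_fit, new_data = dat_test, type = "prob")

# Combine the true response and the predictions in one tibble
results <- bind_cols(dat_test %>% select(Class), class_preds, prob_preds)
results
```

Calling summary() on the parsnip fit itself does not give the usual glm summary, which is why the sketch first pulls out the engine fit with extract_fit_engine().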

conf_mat() is used as follows, where data is a data frame, truth is the name of the column holding the true response, and estimate is the name of the column holding the predicted class.

data %>%
  conf_mat(truth, estimate)
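
To see what the output looks like, here is a small sketch on hand-made data. The tibble toy_results and its values are hypothetical and only serve to show the call. Both columns need to be factors with the same levels.

```r
library(tidymodels)

# Hypothetical toy results, not taken from the lab data
toy_results <- tibble(
  truth    = factor(c("yes", "yes", "no", "no", "no",  "yes"), levels = c("yes", "no")),
  estimate = factor(c("yes", "no",  "no", "no", "yes", "yes"), levels = c("yes", "no"))
)

# Confusion matrix: rows are predictions, columns are the true classes
toy_cm <- toy_results %>%
  conf_mat(truth = truth, estimate = estimate)

toy_cm

# accuracy() from yardstick gives the share of observations on the diagonal
toy_acc <- toy_results %>%
  accuracy(truth = truth, estimate = estimate)

toy_acc
```

The diagonal of the matrix holds the correctly classified observations, so a good classifier has large counts there and small counts elsewhere.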