We will be using tidymodels and the flights data set from {nycflights13}.
We will apply the same transformation as before.
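library(tidymodels)
library(nycflights13)

# Label each flight as "Delayed" or "On time", keeping January flights only
flights1 <- flights %>%
  mutate(delay = factor(arr_delay > 0, c(TRUE, FALSE),
                        c("Delayed", "On time"))) %>%
  filter(month == 1, !is.na(delay)) %>%
  select(delay, hour, minute, dep_delay, carrier, distance)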
set.seed(1234)
flights_split <- initial_split(flights1)
flights_train <- training(flights_split)
flights_test <- testing(flights_split)
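To see what the factor() call in the transformation produces, here is a minimal sketch on a toy vector. The values in arr_delay below are made up for illustration and are not taken from flights. It shows that a delay of exactly 0 counts as "On time", that a missing arrival delay stays missing (which is why the filter() on !is.na(delay) is needed), and that "Delayed" ends up as the first factor level.

```r
# Toy arrival delays in minutes (hypothetical values, not from flights)
arr_delay <- c(-5, 12, 0, NA)

# Second argument = levels, third = labels:
# TRUE is labelled "Delayed", FALSE is labelled "On time"
delay <- factor(arr_delay > 0, c(TRUE, FALSE), c("Delayed", "On time"))

levels(delay)        # "Delayed" "On time"
as.character(delay)  # "On time" "Delayed" "On time" NA
is.na(delay)         # FALSE FALSE FALSE TRUE
```

The level order matters for class-specific metrics, since yardstick treats the first level as the event of interest by default.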
We will start with an svm_linear() model. These models can be used for both regression and classification, so we need to set the mode explicitly. We will be using the kernlab package as the engine.
svm_lin_spec <- svm_linear() %>%
  set_mode("classification") %>%
  set_engine("kernlab")
We will then fit it right away. Fitting might take a minute or two, which is nothing to worry about.
svm_lin_fit <- fit(svm_lin_spec, delay ~ ., data = flights_train)
svm_lin_fit
We can get the confusion matrix
svm_lin_fit %>%
  augment(new_data = flights_train) %>%
  conf_mat(delay, .pred_class) %>%
  autoplot(type = "heatmap")
and calculate the accuracy.
svm_lin_fit %>%
  augment(new_data = flights_train) %>%
  accuracy(delay, .pred_class)
The linear SVM is not doing well.
Let us try a polynomial SVM model to see if that helps at all.
svm_poly_spec <- svm_poly(degree = 2) %>%
  set_mode("classification") %>%
  set_engine("kernlab")
svm_poly_fit <- fit(svm_poly_spec, delay ~ ., data = flights_train)
svm_poly_fit
Calculating another confusion matrix doesn’t give us much luck either.
svm_poly_fit %>%
  augment(new_data = flights_train) %>%
  conf_mat(delay, .pred_class) %>%
  autoplot(type = "heatmap")
But wait, we didn’t do any preprocessing. Let us do some proper preprocessing to see if we can improve on the model. We also have a cost parameter we could tune. Let us try that as well.
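One possible way to set this up is sketched below. It is only an outline under stated assumptions, not necessarily the approach this lab goes on to take. The preprocessing steps (dummy-encoding carrier, dropping zero-variance columns, normalizing), the 5-fold resampling, and the 5-value cost grid are all choices made for illustration. The names svm_rec, svm_tune_spec, svm_wf, flights_folds, cost_grid and svm_tuned are hypothetical.

```r
library(tidymodels)
library(nycflights13)

# Rebuild the training data the same way as above so the block runs on its own
flights1 <- flights %>%
  mutate(delay = factor(arr_delay > 0, c(TRUE, FALSE),
                        c("Delayed", "On time"))) %>%
  filter(month == 1, !is.na(delay)) %>%
  select(delay, hour, minute, dep_delay, carrier, distance)

set.seed(1234)
flights_train <- training(initial_split(flights1))

# Assumed preprocessing: handle unseen carriers, dummy-encode carrier,
# drop constant columns, then center and scale the numeric predictors
svm_rec <- recipe(delay ~ ., data = flights_train) %>%
  step_novel(all_nominal_predictors()) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_zv(all_predictors()) %>%
  step_normalize(all_numeric_predictors())

# Same linear SVM as before, with cost marked for tuning
svm_tune_spec <- svm_linear(cost = tune()) %>%
  set_mode("classification") %>%
  set_engine("kernlab")

svm_wf <- workflow() %>%
  add_recipe(svm_rec) %>%
  add_model(svm_tune_spec)

# Illustrative resampling and grid sizes; both are assumptions
set.seed(1234)
flights_folds <- vfold_cv(flights_train, v = 5)
cost_grid <- grid_regular(cost(), levels = 5)

# Slow: fits one SVM per fold and cost value (25 fits in total)
svm_tuned <- tune_grid(svm_wf, resamples = flights_folds, grid = cost_grid,
                       metrics = metric_set(accuracy))

show_best(svm_tuned, metric = "accuracy")
```

Bundling the recipe and the model in a workflow means the preprocessing is re-estimated within each resample, so the held-out folds do not leak into the normalization.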