class: center, middle, title-slide

# Introduction to tidymodels
## NHS-R Conference 2021
### Emil Hvitfeldt
### 2021-11-02

---
<!--- Packages --------------------------------------------------------------->

<!--- Chunk options ---------------------------------------------------------->

<!--- pkg highlight ---------------------------------------------------------->

<style>
.pkg {
  font-weight: bold;
  letter-spacing: 0.5pt;
  color: #866BBF;
}
</style>

<!--- Highlighting colors ---------------------------------------------------->

<div style = "position:fixed; visibility: hidden">
`$$\require{color}\definecolor{purple}{rgb}{0.525490196078431, 0.419607843137255, 0.749019607843137}$$`
`$$\require{color}\definecolor{green}{rgb}{0.0117647058823529, 0.650980392156863, 0.415686274509804}$$`
`$$\require{color}\definecolor{orange}{rgb}{0.949019607843137, 0.580392156862745, 0.254901960784314}$$`
`$$\require{color}\definecolor{white}{rgb}{1, 1, 1}$$`
</div>

<script type="text/x-mathjax-config">
MathJax.Hub.Config({
  TeX: {
    Macros: {
      purple: ["{\\color{purple}{#1}}", 1],
      green: ["{\\color{green}{#1}}", 1],
      orange: ["{\\color{orange}{#1}}", 1],
      white: ["{\\color{white}{#1}}", 1]
    },
    loader: {load: ['[tex]/color']},
    tex: {packages: {'[+]': ['color']}}
  }
});
</script>

<style>
.purple {color: #866BBF;}
.green {color: #03A66A;}
.orange {color: #F29441;}
.white {color: #FFFFFF;}
</style>

<!--- knitr hooks ------------------------------------------------------------>

# Our goals for this workshop

* Introduce tidymodels and its general philosophy on modeling.

* Help you become proficient with the core packages for modeling.

* Point you to places to learn more and get help.

---

# Why tidymodels?

There are several other modeling frameworks in R that try to create a uniform, cohesive, and unsurprising set of modeling APIs. Examples are .pkg[caret], .pkg[mlr3], and others.

* .pkg[caret] is more favorable for people who prefer base R/traditional interfaces.
* .pkg[mlr3] is more Pythonic and also has many features.
* .pkg[tidymodels] is probably preferable for those who value a tidy _R_ interface, a large number of features, and the idea that interfaces should enable the "pit of success".

---

# The tidymodels package

There are many tidymodels packages, but about 90% of the work is done by five of them: .pkg[rsample], .pkg[recipes], .pkg[parsnip], .pkg[tune], and .pkg[yardstick].

The best way to get started with tidymodels is to use the .pkg[tidymodels] meta-package. It loads the core packages plus some tidyverse packages.

Some helpful links:

* List of [all tidymodels functions](https://www.tidymodels.org/find/#search-all-of-tidymodels)
* List of [all parsnip models](https://www.tidymodels.org/find/parsnip/)
* List of [all recipe steps](https://www.tidymodels.org/find/recipes/)

---

# The tidymodels package

```r
library(tidymodels)
```

```
── Attaching packages ─────────────────────────────────────── tidymodels 0.1.4 ──
✓ broom        0.7.9      ✓ recipes      0.1.17
✓ dials        0.0.10     ✓ rsample      0.1.0
✓ dplyr        1.0.7      ✓ tibble       3.1.5
✓ ggplot2      3.3.5      ✓ tidyr        1.1.4
✓ infer        1.0.0      ✓ tune         0.1.6
✓ modeldata    0.1.1      ✓ workflows    0.2.4
✓ parsnip      0.1.7      ✓ workflowsets 0.1.0
✓ purrr        0.3.4      ✓ yardstick    0.0.8
── Conflicts ────────────────────────────────────────── tidymodels_conflicts() ──
x purrr::discard() masks scales::discard()
x dplyr::filter()  masks stats::filter()
x dplyr::lag()     masks stats::lag()
x recipes::step()  masks stats::step()
• Use tidymodels_prefer() to resolve common conflicts.
```

---

# Managing name conflicts

```r
tidymodels_prefer(quiet = FALSE)
```

```
## [conflicted] Will prefer dplyr::filter over any other package
## [conflicted] Will prefer dplyr::select over any other package
## [conflicted] Will prefer dplyr::slice over any other package
## [conflicted] Will prefer dplyr::rename over any other package
## [conflicted] Will prefer dials::neighbors over any other package
## [conflicted] Will prefer plsmod::pls over any other package
## [conflicted] Will prefer purrr::map over any other package
## [conflicted] Will prefer recipes::step over any other package
## [conflicted] Will prefer themis::step_downsample over any other package
## [conflicted] Will prefer themis::step_upsample over any other package
## [conflicted] Will prefer tune::tune over any other package
## [conflicted] Will prefer yardstick::precision over any other package
## [conflicted] Will prefer yardstick::recall over any other package
```

---

# Base R and tidyverse differences

.pull-left[
Base R/caret

```r
mtcars <- mtcars[order(mtcars$cyl),]
mtcars <- mtcars[, "mpg", drop = FALSE]

# ──────────────────────────────────────────────

mtcars$mp         # partial match: returns mpg
mtcars[, "mpg"]   # a vector

# ──────────────────────────────────────────────

num_args <- function(x) length(formals(x))

num_args(caret::trainControl) +
  num_args(caret:::train.default)
```

```
38
```
]

.pull-right[
tidyverse/tidymodels

```r
mtcars %>%
  arrange(cyl) %>%
  select(mpg)

# ──────────────────────────────────────────────

tb_cars <- as_tibble(mtcars)
tb_cars$mp        # no partial match: NULL + warning
tb_cars[, "mpg"]  # a tibble

# ──────────────────────────────────────────────

num_args(linear_reg) + num_args(set_engine) +
  num_args(tune_grid) + num_args(control_grid) +
  num_args(vfold_cv)
```

```
23
```
]

---

# Example data set

These data are used in our [Feature Engineering and Selection](https://bookdown.org/max/FES/chicago-intro.html) book.
Several years' worth of pre-pandemic data were assembled to try to predict the daily number of people entering the Clark and Lake elevated ("L") train station in Chicago.

The predictors include:

* the 14-day lagged ridership at this and other stations (units: thousands of rides/day)
* weather data
* home/away game schedules for Chicago teams
* the date

The data are in .pkg[modeldata]. See `?Chicago`.

---

# Hands-On: Explore the Data

Take a look at these data for a few minutes and see if you can find any interesting characteristics in the predictors or the outcome.

```r
library(tidymodels)
data("Chicago")
dim(Chicago)
```

```
## [1] 5698   50
```

```r
stations
```

```
##  [1] "Austin"           "Quincy_Wells"     "Belmont"
##  [4] "Archer_35th"      "Oak_Park"         "Western"
##  [7] "Clark_Lake"       "Clinton"          "Merchandise_Mart"
## [10] "Irving_Park"      "Washington_Wells" "Harlem"
## [13] "Monroe"           "Polk"             "Ashland"
## [16] "Kedzie"           "Addison"          "Jefferson_Park"
## [19] "Montrose"         "California"
```
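
One possible starting point for the exercise is sketched below. It looks only at the outcome: its overall distribution, its trajectory over time, and its average by day of the week. The choice of summaries is an assumption, not part of the original exercise, and the names `ridership_plot` and `by_day` are illustrative. Base R's `weekdays()` returns day names in the current locale.

```r
library(tidymodels)
data("Chicago", package = "modeldata")

# Overall distribution of the outcome (daily ridership at Clark/Lake)
summary(Chicago$ridership)

# Ridership over time: look for trends, seasonality, and holiday dips
ridership_plot <- ggplot(Chicago, aes(x = date, y = ridership)) +
  geom_line(alpha = 0.4)
ridership_plot

# Average ridership by day of the week (day names depend on the locale)
by_day <- Chicago %>%
  mutate(day = weekdays(date)) %>%
  group_by(day) %>%
  summarise(mean_ridership = mean(ridership))
by_day
```

If weekday and weekend averages differ a lot, the date is likely to be a useful predictor.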
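---

# A first model: a sketch

The core packages named earlier fit together roughly as follows. This is a minimal sketch, not part of the original workshop material. It assumes a random 80/20 split (a time-ordered split would be more realistic for daily data), an ordinary linear regression, and a handful of predictors (`Clark_Lake`, `Austin`, `temp`, `date`) chosen purely for illustration; the object names are hypothetical. It also uses .pkg[workflows] to bundle the recipe and model, while .pkg[tune] is not needed until a model has parameters to tune.

```r
library(tidymodels)
data("Chicago", package = "modeldata")

# rsample: random 80/20 split (a time-ordered split is more realistic here)
set.seed(123)
chicago_split <- initial_split(Chicago, prop = 0.8)
chicago_train <- training(chicago_split)
chicago_test  <- testing(chicago_split)

# recipes: a few illustrative predictors, plus day-of-week and month from date
chicago_rec <- recipe(ridership ~ Clark_Lake + Austin + temp + date,
                      data = chicago_train) %>%
  step_date(date, features = c("dow", "month")) %>%
  step_rm(date) %>%
  step_dummy(all_nominal_predictors())

# parsnip: an ordinary least squares linear regression via lm()
lm_spec <- linear_reg() %>%
  set_engine("lm")

# workflows: bundle the recipe and the model, then fit on the training set
chicago_fit <- workflow() %>%
  add_recipe(chicago_rec) %>%
  add_model(lm_spec) %>%
  fit(data = chicago_train)

# yardstick: root mean squared error on the held-out days
chicago_preds <- predict(chicago_fit, new_data = chicago_test) %>%
  bind_cols(chicago_test %>% select(ridership))
chicago_rmse <- rmse(chicago_preds, truth = ridership, estimate = .pred)
chicago_rmse
```

Because each stage is its own object, the recipe or the model specification can be swapped out without changing the rest of the code.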
---
layout: false
class: inverse, middle, center

# [`tidymodels.org`](https://www.tidymodels.org/)

# _Tidy Modeling with R_ ([`tmwr.org`](https://www.tmwr.org/))