Hyperparameter Fine-Tuning in tidymodels


Posit Days

University of Wisconsin

Emil Hvitfeldt

2025-11-12

Ames data

library(tidymodels)

set.seed(1234)
ames_split <- initial_split(ames)
ames_training <- training(ames_split)
ames_testing <- testing(ames_split)

glimpse(ames_training)
Rows: 2,197
Columns: 74
$ MS_SubClass        <fct> Two_Story_1946_and_Newer, Duplex_All_Styles_and_Age…
$ MS_Zoning          <fct> Residential_Low_Density, Residential_Low_Density, R…
$ Lot_Frontage       <dbl> 100, 65, 50, 24, 70, 0, 70, 60, 53, 136, 0, 85, 60,…
$ Lot_Area           <int> 10839, 8944, 9000, 1488, 8120, 9375, 9100, 7500, 40…
$ Street             <fct> Pave, Pave, Pave, Pave, Pave, Pave, Pave, Pave, Pav…
$ Alley              <fct> No_Alley_Access, No_Alley_Access, No_Alley_Access, …
$ Lot_Shape          <fct> Slightly_Irregular, Regular, Regular, Regular, Regu…
$ Land_Contour       <fct> Lvl, Lvl, Lvl, Lvl, Lvl, Lvl, Lvl, Lvl, Lvl, Lvl, L…
$ Utilities          <fct> AllPub, AllPub, AllPub, AllPub, AllPub, AllPub, All…
$ Lot_Config         <fct> Corner, Inside, Inside, Inside, Inside, Inside, Ins…
$ Land_Slope         <fct> Gtl, Gtl, Gtl, Gtl, Gtl, Gtl, Gtl, Gtl, Gtl, Gtl, G…
$ Neighborhood       <fct> Gilbert, North_Ames, Iowa_DOT_and_Rail_Road, Bluest…
$ Condition_1        <fct> Norm, Norm, Norm, Norm, Norm, Norm, Norm, Norm, Nor…
$ Condition_2        <fct> Norm, Norm, Norm, Norm, Norm, Norm, Norm, Norm, Nor…
$ Bldg_Type          <fct> OneFam, Duplex, OneFam, TwnhsE, OneFam, OneFam, One…
$ House_Style        <fct> Two_Story, One_Story, One_Story, Two_Story, One_Sto…
$ Overall_Cond       <fct> Average, Average, Below_Average, Above_Average, Goo…
$ Year_Built         <int> 1997, 1967, 1919, 1980, 1970, 2002, 2000, 1999, 197…
$ Year_Remod_Add     <int> 1998, 1967, 1950, 1992, 1970, 2002, 2000, 2003, 197…
$ Roof_Style         <fct> Gable, Gable, Gable, Gable, Gable, Gable, Gable, Ga…
$ Roof_Matl          <fct> CompShg, CompShg, CompShg, CompShg, CompShg, CompSh…
$ Exterior_1st       <fct> VinylSd, Plywood, Wd Sdng, MetalSd, MetalSd, VinylS…
$ Exterior_2nd       <fct> VinylSd, Plywood, Wd Sdng, MetalSd, MetalSd, VinylS…
$ Mas_Vnr_Type       <fct> None, None, None, None, None, BrkFace, BrkFace, Non…
$ Mas_Vnr_Area       <dbl> 0, 0, 0, 0, 0, 149, 244, 0, 0, 495, 169, 0, 0, 192,…
$ Exter_Cond         <fct> Typical, Typical, Typical, Good, Good, Typical, Typ…
$ Foundation         <fct> PConc, CBlock, BrkTil, CBlock, CBlock, PConc, PConc…
$ Bsmt_Cond          <fct> Typical, Typical, Typical, Typical, Typical, Typica…
$ Bsmt_Exposure      <fct> No, No, No, Mn, No, No, Av, No, No, Av, Av, No, No_…
$ BsmtFin_Type_1     <fct> Unf, Unf, Unf, ALQ, ALQ, Unf, GLQ, Unf, ALQ, GLQ, G…
$ BsmtFin_SF_1       <dbl> 7, 7, 7, 1, 1, 7, 3, 7, 1, 3, 3, 7, 5, 6, 3, 7, 2, …
$ BsmtFin_Type_2     <fct> Unf, Unf, Unf, Unf, Unf, Unf, Unf, Unf, BLQ, Unf, U…
$ BsmtFin_SF_2       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 156, 0, 0, 0, 0, 0, 0, 0, 8…
$ Bsmt_Unf_SF        <dbl> 926, 1584, 610, 90, 673, 1284, 125, 938, 186, 322, …
$ Total_Bsmt_SF      <dbl> 926, 1584, 610, 561, 864, 1284, 1525, 938, 1069, 19…
$ Heating            <fct> GasA, GasA, GasA, GasA, GasA, GasA, GasA, GasA, Gas…
$ Heating_QC         <fct> Excellent, Typical, Excellent, Typical, Excellent, …
$ Central_Air        <fct> Y, Y, N, Y, Y, Y, Y, Y, Y, Y, Y, N, Y, Y, Y, Y, Y, …
$ Electrical         <fct> SBrkr, SBrkr, FuseA, SBrkr, SBrkr, SBrkr, SBrkr, SB…
$ First_Flr_SF       <int> 926, 1584, 819, 561, 864, 1284, 1525, 957, 1069, 20…
$ Second_Flr_SF      <int> 678, 0, 0, 668, 0, 885, 0, 1342, 0, 0, 0, 520, 0, 0…
$ Gr_Liv_Area        <int> 1604, 1584, 819, 1229, 864, 2169, 1525, 2299, 1069,…
$ Bsmt_Full_Bath     <dbl> 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, …
$ Bsmt_Half_Bath     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, …
$ Full_Bath          <int> 2, 2, 1, 1, 1, 2, 2, 3, 2, 2, 1, 2, 1, 1, 2, 1, 1, …
$ Half_Bath          <int> 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, …
$ Bedroom_AbvGr      <int> 3, 4, 2, 2, 3, 3, 3, 5, 2, 2, 1, 3, 2, 3, 3, 1, 3, …
$ Kitchen_AbvGr      <int> 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ TotRms_AbvGrd      <int> 7, 8, 4, 5, 5, 7, 6, 7, 4, 5, 3, 7, 5, 5, 7, 5, 6, …
$ Functional         <fct> Typ, Mod, Typ, Typ, Typ, Typ, Typ, Typ, Typ, Typ, T…
$ Fireplaces         <int> 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, …
$ Garage_Type        <fct> Attchd, Detchd, No_Garage, Attchd, Detchd, Attchd, …
$ Garage_Finish      <fct> Fin, Unf, No_Garage, Fin, Unf, RFn, RFn, Fin, RFn, …
$ Garage_Cars        <dbl> 2, 3, 0, 2, 2, 2, 2, 2, 2, 3, 2, 0, 2, 1, 3, 1, 1, …
$ Garage_Area        <dbl> 470, 792, 0, 462, 463, 647, 541, 482, 440, 938, 420…
$ Garage_Cond        <fct> Typical, Typical, No_Garage, Typical, Typical, Typi…
$ Paved_Drive        <fct> Paved, Paved, Dirt_Gravel, Paved, Paved, Paved, Pav…
$ Wood_Deck_SF       <int> 0, 0, 0, 176, 0, 192, 219, 188, 0, 144, 160, 0, 0, …
$ Open_Porch_SF      <int> 36, 152, 0, 0, 0, 87, 36, 30, 55, 33, 0, 0, 0, 0, 3…
$ Enclosed_Porch     <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 144, 0, 39, 0, 70,…
$ Three_season_porch <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ Screen_Porch       <int> 0, 0, 0, 0, 0, 0, 0, 0, 225, 0, 0, 0, 0, 0, 0, 0, 0…
$ Pool_Area          <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ Pool_QC            <fct> No_Pool, No_Pool, No_Pool, No_Pool, No_Pool, No_Poo…
$ Fence              <fct> No_Fence, No_Fence, No_Fence, Good_Privacy, No_Fenc…
$ Misc_Feature       <fct> None, None, None, None, None, None, None, None, Non…
$ Misc_Val           <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ Mo_Sold            <int> 7, 4, 9, 10, 7, 8, 9, 8, 7, 7, 5, 8, 6, 4, 8, 3, 8,…
$ Year_Sold          <int> 2008, 2009, 2006, 2009, 2009, 2007, 2006, 2007, 200…
$ Sale_Type          <fct> WD , WD , WD , WD , WD , WD , WD , WD , WD , WD , W…
$ Sale_Condition     <fct> Normal, Normal, Abnorml, Normal, Normal, Normal, No…
$ Sale_Price         <int> 181000, 124000, 72000, 137000, 124500, 228500, 2350…
$ Longitude          <dbl> -93.63915, -93.61967, -93.62757, -93.64574, -93.625…
$ Latitude           <dbl> 42.05941, 42.04931, 42.02495, 42.00949, 42.05331, 4…

A model

rec_spec <- recipe(Sale_Price ~ ., data = ames_training) |>
  step_nzv(all_numeric_predictors()) |>
  step_normalize(all_numeric_predictors()) |>
  step_dummy(all_nominal_predictors()) |>
  step_nzv(all_predictors())

mod_spec <- nearest_neighbor("regression", "kknn", neighbors = 5)

wf_spec <- workflow(rec_spec, mod_spec)

wf_fit <- fit(wf_spec, ames_training)

A Model fit

wf_fit
══ Workflow [trained] ══════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: nearest_neighbor()

── Preprocessor ────────────────────────────────────────────────────────────────
4 Recipe Steps

• step_nzv()
• step_normalize()
• step_dummy()
• step_nzv()

── Model ───────────────────────────────────────────────────────────────────────

Call:
kknn::train.kknn(formula = ..y ~ ., data = data, ks = min_rows(5,     data, 5))

Type of response variable: continuous
Minimal mean absolute error: 24433.04
Minimal mean squared error: 1469863503
Best kernel: optimal
Best k: 5

Motivation


We are working on a supervised modeling task (regression, classification, survival analysis)


Part of your model workflow (preprocessing, model, postprocessing) has a parameter that can’t be estimated from the data directly

Is it a hyperparameter?


No

  • intercept in linear model
  • error distribution family in a GLM
  • random seed

Yes

  • Tree depth in decision trees
  • Number of neighbors in a K-Nearest Neighbor model
  • Number of PCs to keep

Hyperparameter Tuning

  1. Try different values and measure their performance.

  2. Find good values for these parameters.

  3. Once the value(s) of the parameter(s) are determined, a model can be finalized by fitting the model to the entire training set.
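As a sketch of step 3, assuming `tune_res` holds results from `tune_grid()` (shown later in these slides), the chosen values can be plugged back into the workflow and fit on the full training set:

```r
# Pick a candidate from the tuning results...
best_params <- select_best(tune_res, metric = "rmse")

# ...plug its values into the workflow...
final_wf <- finalize_workflow(wf_spec, best_params)

# ...and fit the finalized workflow on the entire training set.
final_fit <- fit(final_wf, ames_training)
```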

The two main strategies for optimization



Grid search, which tests a pre-defined set of candidate values.

Iterative search, which suggests/estimates new values of candidate parameters to evaluate.

Grid Search

Specify what to tune

mod_spec <- nearest_neighbor("regression", "kknn", neighbors = tune())

wf_spec <- workflow(rec_spec, mod_spec)

How can we use

the training data

to compare and evaluate

different models?

Cross-validation



Fitting multiple values

grid <- tibble(
  neighbors = c(3, 5, 11, 19, 55)
)

set.seed(1234)
ames_folds <- vfold_cv(ames_training, v = 5)

tune_res <- tune_grid(wf_spec, resamples = ames_folds, grid = grid)

Results

collect_metrics(tune_res)
# A tibble: 10 × 7
   neighbors .metric .estimator      mean     n   std_err .config        
       <dbl> <chr>   <chr>          <dbl> <int>     <dbl> <chr>          
 1         3 rmse    standard   41220.        5 670.      pre0_mod1_post0
 2         3 rsq     standard       0.737     5   0.0113  pre0_mod1_post0
 3         5 rmse    standard   38685.        5 573.      pre0_mod2_post0
 4         5 rsq     standard       0.769     5   0.00882 pre0_mod2_post0
 5        11 rmse    standard   36857.        5 727.      pre0_mod3_post0
 6        11 rsq     standard       0.795     5   0.00886 pre0_mod3_post0
 7        19 rmse    standard   36748.        5 753.      pre0_mod4_post0
 8        19 rsq     standard       0.802     5   0.00925 pre0_mod4_post0
 9        55 rmse    standard   38390.        5 746.      pre0_mod5_post0
10        55 rsq     standard       0.798     5   0.00879 pre0_mod5_post0

Results

autoplot(tune_res) + theme_minimal()

Creating parameter grids

params <- extract_parameter_set_dials(wf_spec)
grid_random(params, size = 7)
# A tibble: 5 × 1
  neighbors
      <int>
1        13
2         8
3         2
4         5
5        10
grid_regular(params, levels = 7)
# A tibble: 7 × 1
  neighbors
      <int>
1         1
2         3
3         5
4         8
5        10
6        12
7        15

Update Ranges

params <- params |>
  update(neighbors = neighbors(c(1, 55)))
grid_random(params, size = 7)
# A tibble: 6 × 1
  neighbors
      <int>
1        36
2        30
3        53
4        37
5         4
6        17
grid_regular(params, levels = 7)
# A tibble: 7 × 1
  neighbors
      <int>
1         1
2        10
3        19
4        28
5        37
6        46
7        55

Having multiple hyperparameters

So far the changes have been fairly minor because we only had one hyperparameter.

As we increase the number of hyperparameters, we start running into other kinds of issues.

Xgboost model

mod_spec <- boost_tree(
  mode = "regression", 
  engine = "xgboost", 
  trees = tune(), 
  tree_depth = tune(), 
  learn_rate = tune(), 
  min_n = tune(),
  loss_reduction = tune()
)

wf_spec <- workflow(rec_spec, mod_spec)

Creating parameter grids

params <- extract_parameter_set_dials(wf_spec)
grid_random(params, size = 7)
# A tibble: 7 × 5
  trees min_n tree_depth learn_rate loss_reduction
  <int> <int>      <int>      <dbl>          <dbl>
1   735     2          5    0.149         3.90e- 1
2  1739    26          9    0.00269       1.25e- 9
3   192     9          6    0.00752       1.11e- 9
4  1155     7          3    0.0108        7.42e- 9
5   339    31          4    0.0250        2.71e- 8
6    47    18         14    0.0384        5.18e- 2
7  1580    18          2    0.118         1.71e-10
grid_regular(params, levels = 7)
# A tibble: 16,807 × 5
   trees min_n tree_depth learn_rate loss_reduction
   <int> <int>      <int>      <dbl>          <dbl>
 1     1     2          1      0.001   0.0000000001
 2   334     2          1      0.001   0.0000000001
 3   667     2          1      0.001   0.0000000001
 4  1000     2          1      0.001   0.0000000001
 5  1333     2          1      0.001   0.0000000001
 6  1666     2          1      0.001   0.0000000001
 7  2000     2          1      0.001   0.0000000001
 8     1     8          1      0.001   0.0000000001
 9   334     8          1      0.001   0.0000000001
10   667     8          1      0.001   0.0000000001
# ℹ 16,797 more rows

Different types of grids

Space-filling designs (SFD) attempt to cover the parameter space without redundant candidates. We recommend these in most cases, and they are the default.

Space Filling Grids

grid <- grid_space_filling(params, size = 50)
grid
# A tibble: 50 × 5
   trees min_n tree_depth learn_rate loss_reduction
   <int> <int>      <int>      <dbl>          <dbl>
 1     1    15          7    0.0483   0.00000000256
 2    41    16         10    0.00932  2.12         
 3    82    37          9    0.0212   0.000217     
 4   123     7          3    0.0268   0.0281       
 5   164     4         11    0.00409  0.0000429    
 6   204    25          1    0.0168   0.0000146    
 7   245    17          7    0.281    0.0164       
 8   286    12         15    0.0611   0.0000250    
 9   327     7          3    0.00324  0.0000000222 
10   368    22         12    0.00126  0.00000848   
# ℹ 40 more rows

Space Filling Grids

We default to space filling designs in tune_grid()

?tune_grid():

If no tuning grid is provided, a grid (via dials::grid_space_filling()) is created with 10 candidate parameter combinations.

Iterative Search

Iterative Search Idea




Instead of pre-defining a grid of candidate points, we can model our current results to predict what the next candidate point should be.

Gaussian Process - idea

The GP model can take candidate tuning parameter combinations as inputs and make predictions for performance (e.g. Brier, ROC AUC, RMSE, etc.)


  • The mean performance
  • The variance of performance


The predicted variance is zero at locations of actual data points and becomes very high when far away from any observed data.

Gaussian Process - Balance

New candidates are picked to balance



  • exploring the space far away
  • selecting points near existing values

Gaussian Process - loop

Once we pick the candidate point, we measure performance for it (e.g. resampling).


Another GP is fit, new candidates are computed


We stop when we have completed the allowed number of iterations or if we don’t see any improvement after a pre-set number of attempts.

Gaussian Process in tidymodels

Implemented via tune_bayes()

tune_res <- tune_grid(wf_spec, resamples = ames_folds, grid = grid)

bayes_res <- tune_bayes(
  wf_spec, 
  resamples = ames_folds, 
  initial = tune_res,
  iter = 25
)

Select the model

Seeing the results

Once you have fitted all the models you wanted, you can take a look at their performance with autoplot() and collect_metrics()

We can also use show_best() for convenience

show_best(tune_res, metric = "rmse")
# A tibble: 5 × 7
  neighbors .metric .estimator   mean     n std_err .config        
      <dbl> <chr>   <chr>       <dbl> <int>   <dbl> <chr>          
1        19 rmse    standard   36748.     5    753. pre0_mod4_post0
2        11 rmse    standard   36857.     5    727. pre0_mod3_post0
3        55 rmse    standard   38390.     5    746. pre0_mod5_post0
4         5 rmse    standard   38685.     5    573. pre0_mod2_post0
5         3 rmse    standard   41220.     5    670. pre0_mod1_post0

Selecting the best

We can use select_best() to pick the most performant model

select_best(tune_res, metric = "rmse")
# A tibble: 1 × 2
  neighbors .config        
      <dbl> <chr>          
1        19 pre0_mod4_post0

but remember that this best-performing hyperparameter set performed very similarly to the other choices

Selecting the best under constraints

select_by_one_std_err() uses the “one standard error rule” (Breiman et al., 1984), which selects the simplest model within one standard error of the numerically optimal result.

select_by_pct_loss() selects the simplest model whose loss of performance is within some acceptable limit.

select_by_pct_loss(tune_res, neighbors, metric = "rmse")
# A tibble: 1 × 2
  neighbors .config        
      <dbl> <chr>          
1        11 pre0_mod3_post0

Sub-Model Trick

Some engines allow you to fit a single model and pretend to predict from a model with different hyperparameters.

A boosted tree model fit with 1000 trees can pretend to predict with 500 or 100 trees.

The same holds for other tree-based and regularized models.

Parallel

Running in parallel

  • Grid search, combined with resampling, requires fitting a lot of models!

  • These models don’t depend on one another and can be run in parallel.

We can use the future or mirai packages to do this:

cores <- parallelly::availableCores(logical = FALSE)


library(future)
plan(multisession, workers = cores)

# Now call `tune_grid()`!
library(mirai)
daemons(cores)

# Now call `tune_grid()`!

Racing

Model Racing

Racing is an old tool that we can use to go even faster.

  1. Evaluate all of the candidate models, but only for a few resamples.
  2. Determine which candidates have a low probability of being selected.
  3. Eliminate poor candidates.
  4. Repeat with next resample (until no more resamples remain).
  5. This can result in fitting a small number of models.

It is not an iterative search; it is an adaptive grid search.

Discarding Candidates

How do we eliminate tuning parameter combinations?


There are a few methods to do so. We’ll use one based on analysis of variance (ANOVA).


However… there is typically a large resampling effect in the results.

Are Candidates Different?

One way to evaluate these models is to do a paired t-test, i.e. a t-test on their differences matched by resamples.

With n = 10 resamples, the confidence interval for the difference in the model error is (0.99, 2.8), indicating that candidate number 2 has a smaller error.
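As a toy illustration of the matched comparison (the RMSE values below are made up, not taken from the Ames results):

```r
# Hypothetical RMSE (in $1000s) for two candidates, matched by resample.
candidate_1 <- c(38.1, 40.2, 37.5, 39.9, 38.8, 41.0, 37.2, 39.1, 40.5, 38.4)
candidate_2 <- c(36.3, 38.5, 36.1, 37.6, 37.0, 39.2, 35.8, 37.3, 38.9, 36.7)

# A paired t-test on the per-resample differences; pairing removes the
# large resample-to-resample effect from the comparison.
t.test(candidate_1, candidate_2, paired = TRUE)
```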

Racing in tidymodels

tune_race_anova() has a very similar interface to tune_grid()


library(finetune)
tune_res <- tune_race_anova(
  wf_spec,
  resamples = ames_folds,
  grid = 50
)

Thanks!!

More Information