12 Iterative Search
This chapter discusses several search procedures for finding optimal (or at least acceptable) tuning parameter values.
12.1 Requirements
This chapter requires 6 packages (finetune, future.mirai, GA, probably, tidymodels, xgboost). To install:
req_pkg <- c("finetune", "future.mirai", "GA", "probably", "tidymodels", "xgboost")
# Check to see if they are installed:
pkg_installed <- vapply(req_pkg, rlang::is_installed, logical(1))
# Install missing packages:
if (any(!pkg_installed)) {
  install_list <- names(pkg_installed)[!pkg_installed]
  pak::pak(install_list)
}
Let’s load the packages and set some preferences:
library(GA)
library(tidymodels)
library(finetune)
library(probably)
library(future.mirai)
tidymodels_prefer()
theme_set(theme_bw())
plan(mirai_multisession)
To reduce the complexity of the example, we’ll use a simulated classification data set containing numeric predictors. We’ll simulate 1,000 samples using a simulation system, the details of which can be found in the modeldata documentation. The data set has linear, nonlinear, and interacting features, and the classes are fairly balanced. We’ll use a 3:1 split for training and testing as well as 10-fold cross-validation:
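The code that creates the data and resampling objects is not shown above. Here is a minimal sketch, assuming the sim_classification() simulation function from the modeldata package (attached with tidymodels), an arbitrary seed, and object names (such as sim_rs) chosen to match their use later in the chapter:
set.seed(12)  # an arbitrary seed, not taken from the original text
sim_dat <- sim_classification(num_samples = 1000)
# A 3:1 training/testing split and 10-fold cross-validation:
sim_split <- initial_split(sim_dat, prop = 3 / 4, strata = class)
sim_tr <- training(sim_split)
sim_te <- testing(sim_split)
sim_rs <- vfold_cv(sim_tr, strata = class)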
We’ll tune a boosted classification model using the xgboost package, described in a later chapter. We tune multiple parameters and set an additional parameter, validation, to zero; that argument is only used with early stopping, which we will not use here:
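The model specification itself is not shown above. Here is a minimal sketch, assuming the seven tuning parameters that appear in the results below; the object name bst_spec matches its use in the workflow:
bst_spec <-
  boost_tree(
    mtry = tune(), trees = tune(), min_n = tune(), tree_depth = tune(),
    learn_rate = tune(), loss_reduction = tune(), sample_size = tune()
  ) %>%
  set_engine("xgboost", validation = 0) %>%
  set_mode("classification")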
Tree-based models require little to no preprocessing so we will use a simple R formula to define the roles of variables:
bst_wflow <- workflow(class ~ ., bst_spec)
From the workflow, we create a parameters object and set the ranges for two parameters. mtry requires an upper bound to be set since it depends on the number of model terms in the data set. We’ll need parameter information since most iterative methods need to know the possible ranges as well as the type of parameter (e.g., integer, character, etc.) and/or any transformations of the values.
bst_param <-
  bst_wflow %>%
  extract_parameter_set_dials() %>%
  update(
    mtry = mtry(c(3, 15)),
    trees = trees(c(10, 500))
  )
We can now fit and/or tune models. We’ll declare what metrics should be collected and then create a small space-filling design that is used as the starting point for simulated annealing and Bayesian optimization. We could let the system make these initial values for us, but we’ll create them now so that we can reuse the results and have a common starting place.
cls_mtr <- metric_set(brier_class, roc_auc)
init_grid <- grid_space_filling(bst_param, size = 6)
set.seed(21)
initial_res <-
  bst_wflow %>%
  tune_grid(
    resamples = sim_rs,
    grid = init_grid,
    metrics = cls_mtr,
    control = control_grid(save_pred = TRUE)
  )
From these six candidates, the smallest Brier score was 0.117, a mediocre value:
show_best(initial_res, metric = "brier_class") %>%
  select(-.estimator, -.config, -.metric) %>%
  relocate(mean)
#> # A tibble: 5 × 10
#> mean mtry trees min_n tree_depth learn_rate loss_reduction sample_size n std_err
#> <dbl> <int> <int> <int> <int> <dbl> <dbl> <dbl> <int> <dbl>
#> 1 0.1172 12 402 17 1 0.1 1 e-10 0.46 10 0.005269
#> 2 0.1201 15 304 24 9 0.03162 3.162e+ 1 1 10 0.004319
#> 3 0.1504 5 206 32 15 0.3162 1.995e- 8 0.64 10 0.006183
#> 4 0.1786 3 108 2 3 0.01 7.943e- 4 0.82 10 0.002977
#> 5 0.2441 7 500 9 12 0.003162 1.585e- 1 0.1 10 0.001748
We will show how to use three iterative search methods.
12.2 Simulated Annealing
The finetune package contains finetune::tune_sim_anneal(), which can incrementally search the parameter space in a non-greedy way. Its syntax is very similar to that of tune_grid(), with two additional arguments of note:
- initial: Either:
  - An integer that declares how many points in a space-filling design should be created and evaluated before proceeding.
  - An object from a previous run of tune_grid() or one of the other tune_*() functions.
- iter: An integer for the maximum number of search iterations.
Also of note is control_sim_anneal(), which can save additional results, control logging, and specify whether restarts or early stopping should be used.
One important note: the first metric in the metric set guides the optimization. All of the other metric values are recorded at each iteration, but only the first is used to direct the search.
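For example, if the metric set had been defined as below, the ROC AUC would guide the search instead (this reordered set is illustrative only and is not used in this chapter):
# Illustrative only: with this ordering, roc_auc would drive the optimization
alt_mtr <- metric_set(roc_auc, brier_class)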
Here’s some example code:
set.seed(381)
sa_res <-
  bst_wflow %>%
  tune_sim_anneal(
    resamples = sim_rs,
    param_info = bst_param,
    metrics = cls_mtr,
    # Additional options:
    initial = initial_res,
    iter = 50,
    # Prevent early stopping, save out-of-sample predictions,
    # and log the process to the console:
    control = control_sim_anneal(
      no_improve = Inf,
      verbose_iter = TRUE,
      save_pred = TRUE
    )
  )
#> Optimizing brier_class
#> Initial best: 0.11724
#> 1 ◯ accept suboptimal brier_class=0.12584 (+/-0.005567)
#> 2 ♥ new best brier_class=0.11088 (+/-0.006034)
#> 3 ◯ accept suboptimal brier_class=0.12186 (+/-0.006986)
#> 4 + better suboptimal brier_class=0.11355 (+/-0.0077)
#> 5 + better suboptimal brier_class=0.11204 (+/-0.005044)
#> 6 ─ discard suboptimal brier_class=0.18772 (+/-0.006055)
#> 7 ─ discard suboptimal brier_class=0.14709 (+/-0.004402)
#> 8 ─ discard suboptimal brier_class=0.15341 (+/-0.006266)
#> 9 ♥ new best brier_class=0.10271 (+/-0.005368)
#> 10 ♥ new best brier_class=0.098399 (+/-0.005912)
#> 11 ─ discard suboptimal brier_class=0.11041 (+/-0.006706)
#> 12 ♥ new best brier_class=0.089953 (+/-0.004929)
#> 13 ─ discard suboptimal brier_class=0.094707 (+/-0.005409)
#> 14 ─ discard suboptimal brier_class=0.10833 (+/-0.006363)
#> 15 ♥ new best brier_class=0.088551 (+/-0.005869)
#> 16 ◯ accept suboptimal brier_class=0.092395 (+/-0.006208)
#> 17 + better suboptimal brier_class=0.091379 (+/-0.006153)
#> 18 ♥ new best brier_class=0.080126 (+/-0.005732)
#> 19 ♥ new best brier_class=0.078878 (+/-0.005375)
#> 20 ─ discard suboptimal brier_class=0.088738 (+/-0.00475)
#> 21 ─ discard suboptimal brier_class=0.088662 (+/-0.004205)
#> 22 ◯ accept suboptimal brier_class=0.079888 (+/-0.005168)
#> 23 ─ discard suboptimal brier_class=0.083144 (+/-0.004481)
#> 24 ◯ accept suboptimal brier_class=0.080998 (+/-0.005145)
#> 25 ─ discard suboptimal brier_class=0.089208 (+/-0.00424)
#> 26 ─ discard suboptimal brier_class=0.083758 (+/-0.004778)
#> 27 ✖ restart from best brier_class=0.080229 (+/-0.005623)
#> 28 ─ discard suboptimal brier_class=0.07948 (+/-0.005617)
#> 29 ─ discard suboptimal brier_class=0.088518 (+/-0.004385)
#> 30 ◯ accept suboptimal brier_class=0.080155 (+/-0.004802)
#> 31 ─ discard suboptimal brier_class=0.085765 (+/-0.004087)
#> 32 ─ discard suboptimal brier_class=0.09274 (+/-0.004826)
#> 33 ◯ accept suboptimal brier_class=0.082697 (+/-0.004478)
#> 34 ─ discard suboptimal brier_class=0.1074 (+/-0.004515)
#> 35 ✖ restart from best brier_class=0.095289 (+/-0.004477)
#> 36 ─ discard suboptimal brier_class=0.08412 (+/-0.005034)
#> 37 ─ discard suboptimal brier_class=0.082724 (+/-0.004939)
#> 38 ─ discard suboptimal brier_class=0.081199 (+/-0.006138)
#> 39 ─ discard suboptimal brier_class=0.084084 (+/-0.005246)
#> 40 ─ discard suboptimal brier_class=0.082772 (+/-0.005205)
#> 41 ─ discard suboptimal brier_class=0.082996 (+/-0.004645)
#> 42 ─ discard suboptimal brier_class=0.081317 (+/-0.004638)
#> 43 ✖ restart from best brier_class=0.084592 (+/-0.004923)
#> 44 ─ discard suboptimal brier_class=0.085653 (+/-0.004442)
#> 45 ─ discard suboptimal brier_class=0.085816 (+/-0.004273)
#> 46 ─ discard suboptimal brier_class=0.085154 (+/-0.004607)
#> 47 ◯ accept suboptimal brier_class=0.084356 (+/-0.004371)
#> 48 ─ discard suboptimal brier_class=0.10018 (+/-0.004477)
#> 49 + better suboptimal brier_class=0.080555 (+/-0.005406)
#> 50 ─ discard suboptimal brier_class=0.084699 (+/-0.004644)
The Brier score has been reduced from the initial value of 0.117 to a new best of 0.079. Here are the top candidates:
show_best(sa_res, metric = "brier_class") %>%
  select(-.estimator, -.config, -.metric) %>%
  relocate(mean)
#> # A tibble: 5 × 11
#> mean mtry trees min_n tree_depth learn_rate loss_reduction sample_size n
#> <dbl> <int> <int> <int> <int> <dbl> <dbl> <dbl> <int>
#> 1 0.07888 12 243 2 6 0.02247 0.000001299 0.3711 10
#> 2 0.07948 11 243 3 8 0.02518 0.0000002770 0.4750 10
#> 3 0.07989 13 228 2 5 0.01815 0.000009236 0.3659 10
#> 4 0.08013 11 291 2 6 0.04033 0.000001875 0.4259 10
#> 5 0.08015 14 278 2 4 0.01628 0.000005399 0.3251 10
#> # ℹ 2 more variables: std_err <dbl>, .iter <int>
There are several ways to use autoplot() to investigate the results. The default method plots the metric(s) versus the parameters. Here it is for just the Brier score:
autoplot(sa_res, metric = "brier_class")
Next, we can see how the parameter values change over the search by adding type = "parameters":
autoplot(sa_res, metric = "brier_class", type = "parameters")
Finally, a plot of the performance metrics over the search can be produced via type = "performance":
autoplot(sa_res, metric = "brier_class", type = "performance")
If we had used control_sim_anneal(save_workflow = TRUE), we could use fit_best() to determine the candidate with the best metric value and then fit that model to the training set.
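For example, a hypothetical call (it only works if save_workflow had been set in the control object above):
# Hypothetical: requires control_sim_anneal(save_workflow = TRUE) above
sa_fit <- fit_best(sa_res, metric = "brier_class")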
12.3 Genetic Algorithms
tidymodels has no API or function for optimizing models using genetic algorithms. However, there is unsupported code (below) for doing this as long as the tuning parameters are all numeric. We’ll use the GA package for the computations, and this will require:
- The upper and lower bounds of the parameters
- Code to transform the parameter values (if needed)
- A means to resample/evaluate a model on an out-of-sample data set.
- A method to compute a single performance metric such that larger values are more desirable.
To get started, let’s work with the parameter object named bst_param. We can use purrr::map_dbl() to get vectors of the minimum and maximum values. These should be in the transformed space (if needed):
min_vals <- map_dbl(bst_param$object, ~ .x$range[[1]])
max_vals <- map_dbl(bst_param$object, ~ .x$range[[2]])
The remaining tasks should occur within the GA’s processing. The function below shows the code, with comments to explain each step:
yardstick_fitness <- function(values, wflow, param_info, metrics, ...) {
  # Quietly load required packages if run in parallel; `character.only = TRUE`
  # is needed because the package names are supplied as strings.
  shhh <- purrr::quietly(require)
  loaded <- lapply(
    c("tidymodels", required_pkgs(wflow)),
    shhh,
    character.only = TRUE
  )
  info <- as_tibble(metrics)
  # Check to see if there are any qualitative parameters and stop if so.
  qual_check <- map_lgl(param_info$object, ~ inherits(.x, "qual_param"))
  if (any(qual_check)) {
    cli::cli_abort(
      "The function only works for quantitative tuning parameters."
    )
  }
  # Back-transform parameters if they use a transformation (inputs are in
  # transformed scales)
  values <- purrr::map2_dbl(
    values,
    param_info$object,
    ~ dials::value_inverse(.y, .x)
  )
  # Convert integer parameters to integers
  is_int <- map_lgl(param_info$object, ~ .x$type == "integer")
  int_param <- param_info$id[is_int]
  for (i in int_param) {
    ind <- which(param_info$id == i)
    values[[ind]] <- floor(values[[ind]])
  }
  # Convert from vector to a tibble
  values <- matrix(values, nrow = 1)
  colnames(values) <- param_info$id
  values <- as_tibble(values)
  # We could run _populations_ within a generation in parallel. If we do,
  # let's make sure to turn off parallelization of resamples here:
  # ctrl <- control_grid(allow_par = FALSE)
  ctrl <- control_grid()
  # Resample / validate metrics
  res <- tune_grid(
    wflow,
    metrics = metrics,
    param_info = param_info,
    grid = values,
    control = ctrl,
    ...
  )
  # Fitness is to be maximized so change direction if needed
  best_res <- show_best(res, metric = info$metric[1])
  if (info$direction[1] == "minimize") {
    obj_value <- -best_res$mean
  } else {
    obj_value <- best_res$mean
  }
  obj_value
}
Now, let’s initialize the search using a space-filling design (with 10 candidates per population):
pop_size <- 10
grid_ga <- grid_space_filling(bst_param, size = pop_size, original = TRUE)
# We apply the GA operators on the transformed scale of the parameters (if
# any), so the grid is generated in the original units and then transformed.
# For this example, two parameters use a log-transform:
grid_ga$learn_rate <- log10(grid_ga$learn_rate)
grid_ga$loss_reduction <- log10(grid_ga$loss_reduction)
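As an optional sanity check (an addition here, not part of the original text), we can evaluate the fitness function once on the first candidate; the returned value should be the negated, resampled Brier score for that candidate:
# Hypothetical spot check: one candidate, supplied on the transformed scale
yardstick_fitness(
  unlist(grid_ga[1, ]),
  wflow = bst_wflow,
  param_info = bst_param,
  metrics = cls_mtr,
  resamples = sim_rs
)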
Now we can run GA::ga() to begin the process:
set.seed(158)
ga_res <-
  ga(
    # ga() options:
    type = "real-valued",
    fitness = yardstick_fitness,
    lower = min_vals,
    upper = max_vals,
    popSize = pop_size,
    suggestions = as.matrix(grid_ga),
    maxiter = 25,
    # Save the best solutions at each iteration
    keepBest = TRUE,
    seed = 39,
    # Here we can signal to run _populations_ within a generation in parallel
    parallel = FALSE,
    # Now options to pass to the `...` in yardstick_fitness()
    wflow = bst_wflow,
    param_info = bst_param,
    metrics = cls_mtr,
    resamples = sim_rs
  )
Here is a plot of the best results per population and the mean result (both are Brier scores):
# Negate the fitness value since the Brier score should be minimized.
-attr(ga_res, "summary") %>%
  as_tibble() %>%
  mutate(generation = row_number()) %>%
  select(best = max, mean = mean, generation) %>%
  pivot_longer(c(best, mean), names_to = "summary", values_to = "fitness") %>%
  ggplot(aes(generation, fitness, col = summary, pch = summary)) +
  geom_point() +
  labs(x = "Generation", y = "Brier Score (CV)")
The best results are in a slot called solution. Let’s remap that to the original parameter values:
ga_best <-
  # There could be multiple solutions for the same fitness; we take the first.
  ga_res@solution[1, ] %>%
  # Back-transform
  map2(bst_param$object, ~ value_inverse(.y, .x)) %>%
  as_tibble() %>%
  set_names(bst_param$id) %>%
  # Attach fitness and coerce to integer if needed.
  mutate(
    mtry = floor(mtry),
    trees = floor(trees),
    min_n = floor(min_n),
    tree_depth = floor(tree_depth),
    brier = -ga_res@fitnessValue
  ) %>%
  relocate(brier)
ga_best
#> # A tibble: 1 × 8
#> brier mtry trees min_n tree_depth learn_rate loss_reduction sample_size
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0.07793 9 410 2 6 0.01235 0.0000001197 0.6159
12.4 Bayesian Optimization
Numerous R packages implement Bayesian optimization. The book Gaussian process modeling, design and optimization for the applied sciences also describes many other optimization packages.
Currently, tidymodels uses GPfit.
The tune package contains tune_bayes() for Bayesian optimization. The syntax is identical to what we’ve already seen with tune_sim_anneal().
set.seed(221)
bo_res <- bst_wflow %>%
  tune_bayes(
    resamples = sim_rs,
    param_info = bst_param,
    metrics = cls_mtr,
    # These options work as before:
    initial = initial_res,
    iter = 50,
    control = control_bayes(
      no_improve = Inf,
      verbose_iter = TRUE,
      save_pred = TRUE
    )
  )
#> ! There are 7 tuning parameters and 6 grid points were requested.
#> • There are more tuning parameters than there are initial points. This is likely to
#> cause numerical issues in the first few search iterations.
#> Optimizing brier_class using the expected improvement
#>
#> ── Iteration 1 ──────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.1172 (@iter 0)
#> i Gaussian process model
#> ! The Gaussian process model is being fit using 7 features but only has 6 data points
#> to do so. This may cause errors or a poor model fit.
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=5, trees=489, min_n=26, tree_depth=14, learn_rate=0.0582,
#> loss_reduction=9.86e-09, sample_size=0.114
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.2499 (+/-0.000157)
#>
#> ── Iteration 2 ──────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.1172 (@iter 0)
#> i Gaussian process model
#> ! The Gaussian process model is being fit using 7 features but only has 7 data points
#> to do so. This may cause errors or a poor model fit.
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=3, trees=96, min_n=39, tree_depth=14, learn_rate=0.0768,
#> loss_reduction=2.53e-07, sample_size=0.513
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.2189 (+/-0.00405)
#>
#> ── Iteration 3 ──────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.1172 (@iter 0)
#> i Gaussian process model
#> ! The Gaussian process model is being fit using 7 features but only has 8 data points
#> to do so. This may cause errors or a poor model fit.
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=5, trees=286, min_n=7, tree_depth=1, learn_rate=0.0278, loss_reduction=4.26e-05,
#> sample_size=0.432
#> i Estimating performance
#> ✓ Estimating performance
#> ♥ Newest results: brier_class=0.1153 (+/-0.00444)
#>
#> ── Iteration 4 ──────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.1153 (@iter 3)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=7, trees=338, min_n=24, tree_depth=1, learn_rate=0.00178, loss_reduction=0.108,
#> sample_size=0.702
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.2166 (+/-0.00133)
#>
#> ── Iteration 5 ──────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.1153 (@iter 3)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=13, trees=384, min_n=28, tree_depth=11, learn_rate=0.254,
#> loss_reduction=0.000924, sample_size=0.936
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.1163 (+/-0.00601)
#>
#> ── Iteration 6 ──────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.1153 (@iter 3)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=7, trees=208, min_n=30, tree_depth=15, learn_rate=0.0206, loss_reduction=3.24,
#> sample_size=0.81
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.1245 (+/-0.00436)
#>
#> ── Iteration 7 ──────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.1153 (@iter 3)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=15, trees=229, min_n=6, tree_depth=15, learn_rate=0.266, loss_reduction=2.85,
#> sample_size=0.982
#> i Estimating performance
#> ✓ Estimating performance
#> ♥ Newest results: brier_class=0.08313 (+/-0.00674)
#>
#> ── Iteration 8 ──────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.08313 (@iter 7)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=9, trees=296, min_n=12, tree_depth=15, learn_rate=0.176,
#> loss_reduction=9.48e-08, sample_size=0.999
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.09177 (+/-0.00532)
#>
#> ── Iteration 9 ──────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.08313 (@iter 7)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=10, trees=199, min_n=3, tree_depth=14, learn_rate=0.0622, loss_reduction=0.125,
#> sample_size=0.959
#> i Estimating performance
#> ✓ Estimating performance
#> ♥ Newest results: brier_class=0.0824 (+/-0.00654)
#>
#> ── Iteration 10 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.0824 (@iter 9)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=3, trees=48, min_n=6, tree_depth=15, learn_rate=0.0012, loss_reduction=9.11e-10,
#> sample_size=0.934
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.2413 (+/-0.000398)
#>
#> ── Iteration 11 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.0824 (@iter 9)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=12, trees=255, min_n=4, tree_depth=8, learn_rate=0.164, loss_reduction=0.00354,
#> sample_size=0.352
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.0903 (+/-0.00684)
#>
#> ── Iteration 12 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.0824 (@iter 9)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=14, trees=205, min_n=3, tree_depth=14, learn_rate=0.0699,
#> loss_reduction=7.86e-05, sample_size=0.977
#> i Estimating performance
#> ✓ Estimating performance
#> ♥ Newest results: brier_class=0.08224 (+/-0.00684)
#>
#> ── Iteration 13 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.08224 (@iter 12)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=8, trees=357, min_n=5, tree_depth=5, learn_rate=0.308, loss_reduction=4.44e-10,
#> sample_size=0.385
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.1102 (+/-0.00737)
#>
#> ── Iteration 14 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.08224 (@iter 12)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=15, trees=220, min_n=9, tree_depth=2, learn_rate=0.113, loss_reduction=5.26,
#> sample_size=0.405
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.09564 (+/-0.00507)
#>
#> ── Iteration 15 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.08224 (@iter 12)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=7, trees=249, min_n=4, tree_depth=8, learn_rate=0.134, loss_reduction=1.86,
#> sample_size=0.854
#> i Estimating performance
#> ✓ Estimating performance
#> ♥ Newest results: brier_class=0.07948 (+/-0.00598)
#>
#> ── Iteration 16 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=11, trees=249, min_n=3, tree_depth=6, learn_rate=0.0989,
#> loss_reduction=2.55e-10, sample_size=0.896
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.08295 (+/-0.00608)
#>
#> ── Iteration 17 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=11, trees=244, min_n=4, tree_depth=6, learn_rate=0.0662, loss_reduction=13.2,
#> sample_size=0.976
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.08762 (+/-0.00476)
#>
#> ── Iteration 18 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=12, trees=238, min_n=3, tree_depth=1, learn_rate=0.166, loss_reduction=9.55e-07,
#> sample_size=0.966
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.08199 (+/-0.00546)
#>
#> ── Iteration 19 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=6, trees=239, min_n=5, tree_depth=14, learn_rate=0.188, loss_reduction=3.6e-09,
#> sample_size=0.975
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.08804 (+/-0.00716)
#>
#> ── Iteration 20 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=10, trees=205, min_n=3, tree_depth=2, learn_rate=0.118, loss_reduction=0.37,
#> sample_size=0.845
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.08471 (+/-0.00557)
#>
#> ── Iteration 21 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=13, trees=326, min_n=2, tree_depth=14, learn_rate=0.107, loss_reduction=0.348,
#> sample_size=0.822
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.08266 (+/-0.00683)
#>
#> ── Iteration 22 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=11, trees=285, min_n=4, tree_depth=2, learn_rate=0.164, loss_reduction=7.6,
#> sample_size=0.966
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.08542 (+/-0.0049)
#>
#> ── Iteration 23 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=7, trees=251, min_n=3, tree_depth=2, learn_rate=0.0766, loss_reduction=7.89e-09,
#> sample_size=0.919
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.08063 (+/-0.00555)
#>
#> ── Iteration 24 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=8, trees=213, min_n=2, tree_depth=14, learn_rate=0.128, loss_reduction=2.15e-10,
#> sample_size=0.831
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.08654 (+/-0.00632)
#>
#> ── Iteration 25 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=15, trees=357, min_n=4, tree_depth=8, learn_rate=0.177, loss_reduction=0.00166,
#> sample_size=0.901
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.08988 (+/-0.00743)
#>
#> ── Iteration 26 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=15, trees=203, min_n=2, tree_depth=14, learn_rate=0.183,
#> loss_reduction=3.94e-08, sample_size=0.922
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.08778 (+/-0.00755)
#>
#> ── Iteration 27 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=3, trees=342, min_n=4, tree_depth=2, learn_rate=0.0774, loss_reduction=1.98,
#> sample_size=0.748
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.08631 (+/-0.00513)
#>
#> ── Iteration 28 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=11, trees=327, min_n=7, tree_depth=14, learn_rate=0.103,
#> loss_reduction=5.45e-10, sample_size=0.979
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.08946 (+/-0.00573)
#>
#> ── Iteration 29 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=14, trees=266, min_n=2, tree_depth=6, learn_rate=0.136, loss_reduction=0.000997,
#> sample_size=0.853
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.08579 (+/-0.00728)
#>
#> ── Iteration 30 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=6, trees=317, min_n=9, tree_depth=5, learn_rate=0.13, loss_reduction=1.5,
#> sample_size=0.914
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.08558 (+/-0.00593)
#>
#> ── Iteration 31 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=15, trees=356, min_n=21, tree_depth=7, learn_rate=0.145, loss_reduction=0.122,
#> sample_size=0.222
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.2315 (+/-0.0046)
#>
#> ── Iteration 32 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=3, trees=29, min_n=5, tree_depth=13, learn_rate=0.13, loss_reduction=18.1,
#> sample_size=0.672
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.1201 (+/-0.00443)
#>
#> ── Iteration 33 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=4, trees=394, min_n=15, tree_depth=8, learn_rate=0.0998,
#> loss_reduction=5.21e-10, sample_size=0.804
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.0944 (+/-0.00564)
#>
#> ── Iteration 34 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=13, trees=460, min_n=3, tree_depth=13, learn_rate=0.0819, loss_reduction=0.113,
#> sample_size=0.464
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.08671 (+/-0.00593)
#>
#> ── Iteration 35 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=12, trees=290, min_n=4, tree_depth=15, learn_rate=0.137, loss_reduction=14,
#> sample_size=0.542
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.09203 (+/-0.00477)
#>
#> ── Iteration 36 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=14, trees=99, min_n=15, tree_depth=12, learn_rate=0.11, loss_reduction=2.86e-09,
#> sample_size=0.967
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.08659 (+/-0.00509)
#>
#> ── Iteration 37 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=9, trees=268, min_n=3, tree_depth=6, learn_rate=0.109, loss_reduction=0.297,
#> sample_size=0.437
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.08782 (+/-0.00678)
#>
#> ── Iteration 38 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=13, trees=34, min_n=2, tree_depth=1, learn_rate=0.143, loss_reduction=0.0201,
#> sample_size=0.995
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.1141 (+/-0.00432)
#>
#> ── Iteration 39 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=10, trees=225, min_n=13, tree_depth=11, learn_rate=0.0965,
#> loss_reduction=3.44e-06, sample_size=0.925
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.08963 (+/-0.00498)
#>
#> ── Iteration 40 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=9, trees=17, min_n=2, tree_depth=13, learn_rate=0.146, loss_reduction=0.026,
#> sample_size=0.359
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.09176 (+/-0.00485)
#>
#> ── Iteration 41 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=13, trees=157, min_n=17, tree_depth=11, learn_rate=0.314,
#> loss_reduction=6.95e-08, sample_size=0.957
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.09725 (+/-0.00662)
#>
#> ── Iteration 42 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=7, trees=313, min_n=2, tree_depth=6, learn_rate=0.127, loss_reduction=0.109,
#> sample_size=0.813
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.09222 (+/-0.00767)
#>
#> ── Iteration 43 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=12, trees=493, min_n=12, tree_depth=13, learn_rate=0.0622,
#> loss_reduction=0.000261, sample_size=0.637
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.094 (+/-0.00595)
#>
#> ── Iteration 44 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=7, trees=18, min_n=32, tree_depth=13, learn_rate=0.089, loss_reduction=8.1e-09,
#> sample_size=0.995
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.1396 (+/-0.00449)
#>
#> ── Iteration 45 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=7, trees=187, min_n=9, tree_depth=15, learn_rate=0.114, loss_reduction=0.00238,
#> sample_size=0.868
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.08754 (+/-0.00559)
#>
#> ── Iteration 46 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=10, trees=11, min_n=3, tree_depth=4, learn_rate=0.197, loss_reduction=3.75e-10,
#> sample_size=0.137
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.1147 (+/-0.00564)
#>
#> ── Iteration 47 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=9, trees=432, min_n=6, tree_depth=9, learn_rate=0.0675, loss_reduction=6.32e-09,
#> sample_size=0.521
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.08992 (+/-0.00517)
#>
#> ── Iteration 48 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=15, trees=443, min_n=4, tree_depth=15, learn_rate=0.0621, loss_reduction=0.682,
#> sample_size=0.838
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.08458 (+/-0.0066)
#>
#> ── Iteration 49 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=8, trees=482, min_n=9, tree_depth=2, learn_rate=0.0937, loss_reduction=0.00332,
#> sample_size=0.534
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.09804 (+/-0.00584)
#>
#> ── Iteration 50 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=15, trees=71, min_n=21, tree_depth=3, learn_rate=0.064, loss_reduction=1.1e-10,
#> sample_size=0.858
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.1003 (+/-0.00459)
The same helper functions are used to interrogate the results and to create diagnostic plots:
show_best(bo_res, metric = "brier_class") %>%
  select(-.estimator, -.config, -.metric) %>%
  relocate(mean)
#> # A tibble: 5 × 11
#> mean mtry trees min_n tree_depth learn_rate loss_reduction sample_size n
#> <dbl> <int> <int> <int> <int> <dbl> <dbl> <dbl> <int>
#> 1 0.07948 7 249 4 8 0.1343 1.859e+0 0.8543 10
#> 2 0.08063 7 251 3 2 0.07663 7.895e-9 0.9194 10
#> 3 0.08199 12 238 3 1 0.1658 9.551e-7 0.9661 10
#> 4 0.08224 14 205 3 14 0.06994 7.861e-5 0.9767 10
#> 5 0.08240 10 199 3 14 0.06220 1.247e-1 0.9593 10
#> # ℹ 2 more variables: std_err <dbl>, .iter <int>
These results are about the same as those of the SA search. We can plot the data and see that some parameters (number of trees, learning rate, minimum node size, and the sampling proportion) appear to converge to specific values:
autoplot(bo_res, metric = "brier_class")
Here we see that the learning rate and the minimum node size reach a steady state:
autoplot(bo_res, metric = "brier_class", type = "parameters")
A plot of the overall progress:
autoplot(bo_res, metric = "brier_class", type = "performance")