12 Iterative Search
This chapter discusses several search procedures for finding optimal (or at least acceptable) tuning parameter values.
12.1 Requirements
This chapter requires 6 packages (finetune, future.mirai, GA, probably, tidymodels, xgboost). To install:
req_pkg <- c("finetune", "future.mirai", "GA", "probably", "tidymodels", "xgboost")
# Check to see if they are installed:
pkg_installed <- vapply(req_pkg, rlang::is_installed, logical(1))
# Install missing packages:
if (any(!pkg_installed)) {
  install_list <- names(pkg_installed)[!pkg_installed]
  pak::pak(install_list)
}
Let’s load the packages and set some preferences:
library(GA)
library(tidymodels)
library(finetune)
library(probably)
library(future.mirai)
tidymodels_prefer()
theme_set(theme_bw())
plan(mirai_multisession)
To reduce the complexity of the example, we’ll use a simulated classification data set containing numeric predictors. We’ll simulate 1,000 samples using a simulation system, the details of which can be found in the modeldata documentation. The data set has linear, nonlinear, and interacting features, and the classes are fairly balanced. We’ll use a 3:1 split for training and testing as well as 10-fold cross-validation:
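The code that creates the data and resampling objects is not shown above. Here is a minimal sketch, assuming the sim_classification() simulation function from the modeldata package (attached with tidymodels), an arbitrary seed, and object names (such as sim_rs) chosen to match their use later in the chapter:
set.seed(12)  # an arbitrary seed, not taken from the original text
sim_dat <- sim_classification(num_samples = 1000)
# A 3:1 training/testing split and 10-fold cross-validation:
sim_split <- initial_split(sim_dat, prop = 3 / 4, strata = class)
sim_tr <- training(sim_split)
sim_te <- testing(sim_split)
sim_rs <- vfold_cv(sim_tr, strata = class)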
We’ll tune a boosted classification model using the xgboost package, described in a later chapter. We tune multiple parameters and set an additional parameter, validation, to zero; that argument is only used with early stopping, which we will not use here:
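The model specification itself is not shown above. Here is a minimal sketch, assuming the seven tuning parameters that appear in the results below; the object name bst_spec matches its use in the workflow:
bst_spec <-
  boost_tree(
    mtry = tune(), trees = tune(), min_n = tune(), tree_depth = tune(),
    learn_rate = tune(), loss_reduction = tune(), sample_size = tune()
  ) %>%
  set_engine("xgboost", validation = 0) %>%
  set_mode("classification")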
Tree-based models require little to no preprocessing so we will use a simple R formula to define the roles of variables:
bst_wflow <- workflow(class ~ ., bst_spec)
From the workflow, we create a parameters object and set the ranges for two parameters. mtry requires an upper bound to be set since it depends on the number of model terms in the data set. We’ll need parameter information since most iterative methods need to know the possible ranges as well as the type of parameter (e.g., integer, character, etc.) and/or any transformations of the values.
bst_param <-
  bst_wflow %>%
  extract_parameter_set_dials() %>%
  update(
    mtry = mtry(c(3, 15)),
    trees = trees(c(10, 500))
  )
We can now fit and/or tune models. We’ll declare what metrics should be collected and then create a small space-filling design that is used as the starting point for simulated annealing and Bayesian optimization. We could let the system make these initial values for us, but we’ll create them now so that we can reuse the results and have a common starting place.
cls_mtr <- metric_set(brier_class, roc_auc)
init_grid <- grid_space_filling(bst_param, size = 6)
set.seed(21)
initial_res <-
  bst_wflow %>%
  tune_grid(
    resamples = sim_rs,
    grid = init_grid,
    metrics = cls_mtr,
    control = control_grid(save_pred = TRUE)
  )
From these six candidates, the smallest Brier score was 0.117, a mediocre value:
show_best(initial_res, metric = "brier_class") %>%
  select(-.estimator, -.config, -.metric) %>%
  relocate(mean)
#> # A tibble: 5 × 10
#> mean mtry trees min_n tree_depth learn_rate loss_reduction sample_size n std_err
#> <dbl> <int> <int> <int> <int> <dbl> <dbl> <dbl> <int> <dbl>
#> 1 0.1172 12 402 17 1 0.1 1 e-10 0.46 10 0.005269
#> 2 0.1201 15 304 24 9 0.03162 3.162e+ 1 1 10 0.004319
#> 3 0.1504 5 206 32 15 0.3162 1.995e- 8 0.64 10 0.006183
#> 4 0.1786 3 108 2 3 0.01 7.943e- 4 0.82 10 0.002977
#> 5 0.2441 7 500 9 12 0.003162 1.585e- 1 0.1 10 0.001748
We will show how to use three iterative search methods.
12.2 Simulated Annealing
The finetune package contains finetune::tune_sim_anneal(), which can incrementally search the parameter space in a non-greedy way. Its syntax is very similar to that of tune_grid(), with two additional arguments of note:
- initial: Either:
  - An integer that declares how many points in a space-filling design should be created and evaluated before proceeding.
  - An object from a previous run of tune_grid() or one of the other tune_*() functions.
- iter: An integer for the maximum number of search iterations.
Also of note is control_sim_anneal(), which can save additional results, control logging, and specify whether restarts or early stopping should be used.
One important note: the first metric in the metric set guides the optimization. All of the other metric values are recorded at each iteration, but only the first is used to direct the search.
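For example, if the metric set had been defined as below, the ROC AUC would guide the search instead (this reordered set is illustrative only and is not used in this chapter):
# Illustrative only: with this ordering, roc_auc would drive the optimization
alt_mtr <- metric_set(roc_auc, brier_class)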
Here’s some example code:
set.seed(381)
sa_res <-
  bst_wflow %>%
  tune_sim_anneal(
    resamples = sim_rs,
    param_info = bst_param,
    metrics = cls_mtr,
    # Additional options:
    initial = initial_res,
    iter = 50,
    # Prevent early stopping, save out-of-sample predictions,
    # and log the process to the console:
    control = control_sim_anneal(
      no_improve = Inf,
      verbose_iter = TRUE,
      save_pred = TRUE
    )
  )
#> Optimizing brier_class
#> Initial best: 0.11724
#> 1 ◯ accept suboptimal brier_class=0.12584 (+/-0.005567)
#> 2 ♥ new best brier_class=0.11088 (+/-0.006034)
#> 3 ◯ accept suboptimal brier_class=0.12186 (+/-0.006986)
#> 4 + better suboptimal brier_class=0.11355 (+/-0.0077)
#> 5 + better suboptimal brier_class=0.11204 (+/-0.005044)
#> 6 ─ discard suboptimal brier_class=0.18772 (+/-0.006055)
#> 7 ─ discard suboptimal brier_class=0.14709 (+/-0.004402)
#> 8 ─ discard suboptimal brier_class=0.15341 (+/-0.006266)
#> 9 ♥ new best brier_class=0.10271 (+/-0.005368)
#> 10 ♥ new best brier_class=0.098399 (+/-0.005912)
#> 11 ─ discard suboptimal brier_class=0.11041 (+/-0.006706)
#> 12 ♥ new best brier_class=0.089953 (+/-0.004929)
#> 13 ─ discard suboptimal brier_class=0.094707 (+/-0.005409)
#> 14 ─ discard suboptimal brier_class=0.10833 (+/-0.006363)
#> 15 ♥ new best brier_class=0.088551 (+/-0.005869)
#> 16 ◯ accept suboptimal brier_class=0.092395 (+/-0.006208)
#> 17 + better suboptimal brier_class=0.091379 (+/-0.006153)
#> 18 ♥ new best brier_class=0.080126 (+/-0.005732)
#> 19 ♥ new best brier_class=0.078878 (+/-0.005375)
#> 20 ─ discard suboptimal brier_class=0.088738 (+/-0.00475)
#> 21 ─ discard suboptimal brier_class=0.088662 (+/-0.004205)
#> 22 ◯ accept suboptimal brier_class=0.079888 (+/-0.005168)
#> 23 ─ discard suboptimal brier_class=0.083144 (+/-0.004481)
#> 24 ◯ accept suboptimal brier_class=0.080998 (+/-0.005145)
#> 25 ─ discard suboptimal brier_class=0.089208 (+/-0.00424)
#> 26 ─ discard suboptimal brier_class=0.083758 (+/-0.004778)
#> 27 ✖ restart from best brier_class=0.080229 (+/-0.005623)
#> 28 ─ discard suboptimal brier_class=0.07948 (+/-0.005617)
#> 29 ─ discard suboptimal brier_class=0.088518 (+/-0.004385)
#> 30 ◯ accept suboptimal brier_class=0.080155 (+/-0.004802)
#> 31 ─ discard suboptimal brier_class=0.085765 (+/-0.004087)
#> 32 ─ discard suboptimal brier_class=0.09274 (+/-0.004826)
#> 33 ◯ accept suboptimal brier_class=0.082697 (+/-0.004478)
#> 34 ─ discard suboptimal brier_class=0.1074 (+/-0.004515)
#> 35 ✖ restart from best brier_class=0.095289 (+/-0.004477)
#> 36 ─ discard suboptimal brier_class=0.08412 (+/-0.005034)
#> 37 ─ discard suboptimal brier_class=0.082724 (+/-0.004939)
#> 38 ─ discard suboptimal brier_class=0.081199 (+/-0.006138)
#> 39 ─ discard suboptimal brier_class=0.084084 (+/-0.005246)
#> 40 ─ discard suboptimal brier_class=0.082772 (+/-0.005205)
#> 41 ─ discard suboptimal brier_class=0.082996 (+/-0.004645)
#> 42 ─ discard suboptimal brier_class=0.081317 (+/-0.004638)
#> 43 ✖ restart from best brier_class=0.084592 (+/-0.004923)
#> 44 ─ discard suboptimal brier_class=0.085653 (+/-0.004442)
#> 45 ─ discard suboptimal brier_class=0.085816 (+/-0.004273)
#> 46 ─ discard suboptimal brier_class=0.085154 (+/-0.004607)
#> 47 ◯ accept suboptimal brier_class=0.084356 (+/-0.004371)
#> 48 ─ discard suboptimal brier_class=0.10018 (+/-0.004477)
#> 49 + better suboptimal brier_class=0.080555 (+/-0.005406)
#> 50 ─ discard suboptimal brier_class=0.084699 (+/-0.004644)
The Brier score has been reduced from the initial value of 0.117 to a new best of 0.079. Here are the top candidates:
show_best(sa_res, metric = "brier_class") %>%
  select(-.estimator, -.config, -.metric) %>%
  relocate(mean)
#> # A tibble: 5 × 11
#> mean mtry trees min_n tree_depth learn_rate loss_reduction sample_size n
#> <dbl> <int> <int> <int> <int> <dbl> <dbl> <dbl> <int>
#> 1 0.07888 12 243 2 6 0.02247 0.000001299 0.3711 10
#> 2 0.07948 11 243 3 8 0.02518 0.0000002770 0.4750 10
#> 3 0.07989 13 228 2 5 0.01815 0.000009236 0.3659 10
#> 4 0.08013 11 291 2 6 0.04033 0.000001875 0.4259 10
#> 5 0.08015 14 278 2 4 0.01628 0.000005399 0.3251 10
#> # ℹ 2 more variables: std_err <dbl>, .iter <int>
There are several ways to use autoplot() to investigate the results. The default method plots the metric(s) versus the parameters. Here it is for just the Brier score:
autoplot(sa_res, metric = "brier_class")
Next, we can see how the parameter values change over the search by adding type = "parameters":
autoplot(sa_res, metric = "brier_class", type = "parameters")
Finally, a plot of the performance metrics over the search can be produced via type = "performance":
autoplot(sa_res, metric = "brier_class", type = "performance")
If we had used control_sim_anneal(save_workflow = TRUE), we could use fit_best() to determine the candidate with the best metric value and then fit that model to the training set.
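For example, a hypothetical call (it only works if save_workflow had been set in the control object above):
# Hypothetical: requires control_sim_anneal(save_workflow = TRUE) above
sa_fit <- fit_best(sa_res, metric = "brier_class")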
12.3 Genetic Algorithms
tidymodels has no API or function for optimizing models using genetic algorithms. However, there is unsupported code (below) for doing this as long as the tuning parameters are all numeric. We’ll use the GA package for the computations, and this will require:
- The upper and lower bounds of the parameters
- Code to transform the parameter values (if needed)
- A means to resample/evaluate a model on an out-of-sample data set.
- A method to compute a single performance metric such that larger values are more desirable.
To get started, let’s work with the parameter object named bst_param. We can use purrr::map_dbl() to get vectors of the minimum and maximum values. These should be in the transformed space (if needed):
min_vals <- map_dbl(bst_param$object, ~ .x$range[[1]])
max_vals <- map_dbl(bst_param$object, ~ .x$range[[2]])
The remaining tasks should occur within the GA’s processing. The function below shows the code, with comments to explain each step:
yardstick_fitness <- function(values, wflow, param_info, metrics, ...) {
  # Quietly load required packages if run in parallel; `character.only = TRUE`
  # is needed because the package names are supplied as strings.
  shhh <- purrr::quietly(require)
  loaded <- lapply(
    c("tidymodels", required_pkgs(wflow)),
    shhh,
    character.only = TRUE
  )
  info <- as_tibble(metrics)
  # Check to see if there are any qualitative parameters and stop if so.
  qual_check <- map_lgl(param_info$object, ~ inherits(.x, "qual_param"))
  if (any(qual_check)) {
    cli::cli_abort(
      "The function only works for quantitative tuning parameters."
    )
  }
  # Back-transform parameters if they use a transformation (inputs are in
  # transformed scales)
  values <- purrr::map2_dbl(
    values,
    param_info$object,
    ~ dials::value_inverse(.y, .x)
  )
  # Convert integer parameters to integers
  is_int <- map_lgl(param_info$object, ~ .x$type == "integer")
  int_param <- param_info$id[is_int]
  for (i in int_param) {
    ind <- which(param_info$id == i)
    values[[ind]] <- floor(values[[ind]])
  }
  # Convert from vector to a tibble
  values <- matrix(values, nrow = 1)
  colnames(values) <- param_info$id
  values <- as_tibble(values)
  # We could run _populations_ within a generation in parallel. If we do,
  # let's make sure to turn off parallelization of resamples here:
  # ctrl <- control_grid(allow_par = FALSE)
  ctrl <- control_grid()
  # Resample / validate metrics
  res <- tune_grid(
    wflow,
    metrics = metrics,
    param_info = param_info,
    grid = values,
    control = ctrl,
    ...
  )
  # Fitness is to be maximized so change direction if needed
  best_res <- show_best(res, metric = info$metric[1])
  if (info$direction[1] == "minimize") {
    obj_value <- -best_res$mean
  } else {
    obj_value <- best_res$mean
  }
  obj_value
}
Now, let’s initialize the search using a space-filling design (with 10 candidates per population):
pop_size <- 10
grid_ga <- grid_space_filling(bst_param, size = pop_size, original = TRUE)
# We apply the GA operators on the transformed scale of the parameters (if
# any), so the grid is generated in the original units and then transformed.
# For this example, two parameters use a log-transform:
grid_ga$learn_rate <- log10(grid_ga$learn_rate)
grid_ga$loss_reduction <- log10(grid_ga$loss_reduction)
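As an optional sanity check (an addition here, not part of the original text), we can evaluate the fitness function once on the first candidate; the returned value should be the negated, resampled Brier score for that candidate:
# Hypothetical spot check: one candidate, supplied on the transformed scale
yardstick_fitness(
  unlist(grid_ga[1, ]),
  wflow = bst_wflow,
  param_info = bst_param,
  metrics = cls_mtr,
  resamples = sim_rs
)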
Now we can run GA::ga() to begin the process:
set.seed(158)
ga_res <-
  ga(
    # ga() options:
    type = "real-valued",
    fitness = yardstick_fitness,
    lower = min_vals,
    upper = max_vals,
    popSize = pop_size,
    suggestions = as.matrix(grid_ga),
    maxiter = 25,
    # Save the best solutions at each iteration
    keepBest = TRUE,
    seed = 39,
    # Here we can signal to run _populations_ within a generation in parallel
    parallel = FALSE,
    # Now options to pass to the `...` in yardstick_fitness()
    wflow = bst_wflow,
    param_info = bst_param,
    metrics = cls_mtr,
    resamples = sim_rs
  )
Here is a plot of the best results per population and the mean result (both are Brier scores):
# Negate the fitness value since the Brier score should be minimized.
-attr(ga_res, "summary") %>%
  as_tibble() %>%
  mutate(generation = row_number()) %>%
  select(best = max, mean = mean, generation) %>%
  pivot_longer(c(best, mean), names_to = "summary", values_to = "fitness") %>%
  ggplot(aes(generation, fitness, col = summary, pch = summary)) +
  geom_point() +
  labs(x = "Generation", y = "Brier Score (CV)")
The best results are in a slot called solution. Let’s remap that to the original parameter values:
ga_best <-
  # There could be multiple solutions for the same fitness; we take the first.
  ga_res@solution[1, ] %>%
  # Back-transform
  map2(bst_param$object, ~ value_inverse(.y, .x)) %>%
  as_tibble() %>%
  set_names(bst_param$id) %>%
  # Attach fitness and coerce to integer if needed.
  mutate(
    mtry = floor(mtry),
    trees = floor(trees),
    min_n = floor(min_n),
    tree_depth = floor(tree_depth),
    brier = -ga_res@fitnessValue
  ) %>%
  relocate(brier)
ga_best
#> # A tibble: 1 × 8
#> brier mtry trees min_n tree_depth learn_rate loss_reduction sample_size
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0.07793 9 410 2 6 0.01235 0.0000001197 0.6159
12.4 Bayesian Optimization
Numerous R packages implement Bayesian optimization. The book Gaussian process modeling, design and optimization for the applied sciences also describes many other optimization packages.
Currently, tidymodels uses GPfit.
The tune package contains tune_bayes() for Bayesian optimization. The syntax is identical to what we’ve already seen with tune_sim_anneal().
set.seed(221)
bo_res <- bst_wflow %>%
  tune_bayes(
    resamples = sim_rs,
    param_info = bst_param,
    metrics = cls_mtr,
    # These options work as before:
    initial = initial_res,
    iter = 50,
    control = control_bayes(
      no_improve = Inf,
      verbose_iter = TRUE,
      save_pred = TRUE
    )
  )
#> ! There are 7 tuning parameters and 6 grid points were requested.
#> • There are more tuning parameters than there are initial points. This is likely to
#> cause numerical issues in the first few search iterations.
#> Optimizing brier_class using the expected improvement
#>
#> ── Iteration 1 ──────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.1172 (@iter 0)
#> i Gaussian process model
#> ! The Gaussian process model is being fit using 7 features but only has 6 data points
#> to do so. This may cause errors or a poor model fit.
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=5, trees=489, min_n=26, tree_depth=14, learn_rate=0.0582,
#> loss_reduction=9.86e-09, sample_size=0.114
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.2499 (+/-0.000157)
#>
#> ── Iteration 2 ──────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.1172 (@iter 0)
#> i Gaussian process model
#> ! The Gaussian process model is being fit using 7 features but only has 7 data points
#> to do so. This may cause errors or a poor model fit.
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=3, trees=96, min_n=39, tree_depth=14, learn_rate=0.0768,
#> loss_reduction=2.53e-07, sample_size=0.513
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.2189 (+/-0.00405)
#>
#> ── Iteration 3 ──────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.1172 (@iter 0)
#> i Gaussian process model
#> ! The Gaussian process model is being fit using 7 features but only has 8 data points
#> to do so. This may cause errors or a poor model fit.
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=5, trees=286, min_n=7, tree_depth=1, learn_rate=0.0278, loss_reduction=4.26e-05,
#> sample_size=0.432
#> i Estimating performance
#> ✓ Estimating performance
#> ♥ Newest results: brier_class=0.1153 (+/-0.00444)
#>
#> ── Iteration 4 ──────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.1153 (@iter 3)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=7, trees=338, min_n=24, tree_depth=1, learn_rate=0.00178, loss_reduction=0.108,
#> sample_size=0.702
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.2166 (+/-0.00133)
#>
#> ── Iteration 5 ──────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.1153 (@iter 3)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=13, trees=384, min_n=28, tree_depth=11, learn_rate=0.254,
#> loss_reduction=0.000924, sample_size=0.936
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.1163 (+/-0.00601)
#>
#> ── Iteration 6 ──────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.1153 (@iter 3)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=7, trees=208, min_n=30, tree_depth=15, learn_rate=0.0206, loss_reduction=3.24,
#> sample_size=0.81
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.1245 (+/-0.00436)
#>
#> ── Iteration 7 ──────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.1153 (@iter 3)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=15, trees=229, min_n=6, tree_depth=15, learn_rate=0.266, loss_reduction=2.85,
#> sample_size=0.982
#> i Estimating performance
#> ✓ Estimating performance
#> ♥ Newest results: brier_class=0.08313 (+/-0.00674)
#>
#> ── Iteration 8 ──────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.08313 (@iter 7)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=9, trees=296, min_n=12, tree_depth=15, learn_rate=0.176,
#> loss_reduction=9.48e-08, sample_size=0.999
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.09177 (+/-0.00532)
#>
#> ── Iteration 9 ──────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.08313 (@iter 7)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=10, trees=199, min_n=3, tree_depth=14, learn_rate=0.0622, loss_reduction=0.125,
#> sample_size=0.959
#> i Estimating performance
#> ✓ Estimating performance
#> ♥ Newest results: brier_class=0.0824 (+/-0.00654)
#>
#> ── Iteration 10 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.0824 (@iter 9)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=3, trees=48, min_n=6, tree_depth=15, learn_rate=0.0012, loss_reduction=9.11e-10,
#> sample_size=0.934
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.2413 (+/-0.000398)
#>
#> ── Iteration 11 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.0824 (@iter 9)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=12, trees=255, min_n=4, tree_depth=8, learn_rate=0.164, loss_reduction=0.00354,
#> sample_size=0.352
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.0903 (+/-0.00684)
#>
#> ── Iteration 12 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.0824 (@iter 9)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=14, trees=205, min_n=3, tree_depth=14, learn_rate=0.0699,
#> loss_reduction=7.86e-05, sample_size=0.977
#> i Estimating performance
#> ✓ Estimating performance
#> ♥ Newest results: brier_class=0.08224 (+/-0.00684)
#>
#> ── Iteration 13 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.08224 (@iter 12)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=8, trees=357, min_n=5, tree_depth=5, learn_rate=0.308, loss_reduction=4.44e-10,
#> sample_size=0.385
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.1102 (+/-0.00737)
#>
#> ── Iteration 14 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.08224 (@iter 12)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=15, trees=220, min_n=9, tree_depth=2, learn_rate=0.113, loss_reduction=5.26,
#> sample_size=0.405
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.09564 (+/-0.00507)
#>
#> ── Iteration 15 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.08224 (@iter 12)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=7, trees=249, min_n=4, tree_depth=8, learn_rate=0.134, loss_reduction=1.86,
#> sample_size=0.854
#> i Estimating performance
#> ✓ Estimating performance
#> ♥ Newest results: brier_class=0.07948 (+/-0.00598)
#>
#> ── Iteration 16 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=11, trees=249, min_n=3, tree_depth=6, learn_rate=0.0989,
#> loss_reduction=2.55e-10, sample_size=0.896
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.08295 (+/-0.00608)
#>
#> ── Iteration 17 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=11, trees=244, min_n=4, tree_depth=6, learn_rate=0.0662, loss_reduction=13.2,
#> sample_size=0.976
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.08762 (+/-0.00476)
#>
#> ── Iteration 18 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=12, trees=238, min_n=3, tree_depth=1, learn_rate=0.166, loss_reduction=9.55e-07,
#> sample_size=0.966
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.08199 (+/-0.00546)
#>
#> ── Iteration 19 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=6, trees=239, min_n=5, tree_depth=14, learn_rate=0.188, loss_reduction=3.6e-09,
#> sample_size=0.975
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.08804 (+/-0.00716)
#>
#> ── Iteration 20 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=10, trees=205, min_n=3, tree_depth=2, learn_rate=0.118, loss_reduction=0.37,
#> sample_size=0.845
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.08471 (+/-0.00557)
#>
#> ── Iteration 21 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=13, trees=326, min_n=2, tree_depth=14, learn_rate=0.107, loss_reduction=0.348,
#> sample_size=0.822
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.08266 (+/-0.00683)
#>
#> ── Iteration 22 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=11, trees=285, min_n=4, tree_depth=2, learn_rate=0.164, loss_reduction=7.6,
#> sample_size=0.966
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.08542 (+/-0.0049)
#>
#> ── Iteration 23 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=7, trees=251, min_n=3, tree_depth=2, learn_rate=0.0766, loss_reduction=7.89e-09,
#> sample_size=0.919
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.08063 (+/-0.00555)
#>
#> ── Iteration 24 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=8, trees=213, min_n=2, tree_depth=14, learn_rate=0.128, loss_reduction=2.15e-10,
#> sample_size=0.831
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.08654 (+/-0.00632)
#>
#> ── Iteration 25 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=15, trees=357, min_n=4, tree_depth=8, learn_rate=0.177, loss_reduction=0.00166,
#> sample_size=0.901
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.08988 (+/-0.00743)
#>
#> ── Iteration 26 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=15, trees=203, min_n=2, tree_depth=14, learn_rate=0.183,
#> loss_reduction=3.94e-08, sample_size=0.922
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.08778 (+/-0.00755)
#>
#> ── Iteration 27 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=3, trees=342, min_n=4, tree_depth=2, learn_rate=0.0774, loss_reduction=1.98,
#> sample_size=0.748
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.08631 (+/-0.00513)
#>
#> ── Iteration 28 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=11, trees=327, min_n=7, tree_depth=14, learn_rate=0.103,
#> loss_reduction=5.45e-10, sample_size=0.979
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.08946 (+/-0.00573)
#>
#> ── Iteration 29 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=14, trees=266, min_n=2, tree_depth=6, learn_rate=0.136, loss_reduction=0.000997,
#> sample_size=0.853
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.08579 (+/-0.00728)
#>
#> ── Iteration 30 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=6, trees=317, min_n=9, tree_depth=5, learn_rate=0.13, loss_reduction=1.5,
#> sample_size=0.914
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.08558 (+/-0.00593)
#>
#> ── Iteration 31 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=15, trees=356, min_n=21, tree_depth=7, learn_rate=0.145, loss_reduction=0.122,
#> sample_size=0.222
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.2315 (+/-0.0046)
#>
#> ── Iteration 32 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=3, trees=29, min_n=5, tree_depth=13, learn_rate=0.13, loss_reduction=18.1,
#> sample_size=0.672
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.1201 (+/-0.00443)
#>
#> ── Iteration 33 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=4, trees=394, min_n=15, tree_depth=8, learn_rate=0.0998,
#> loss_reduction=5.21e-10, sample_size=0.804
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.0944 (+/-0.00564)
#>
#> ── Iteration 34 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=13, trees=460, min_n=3, tree_depth=13, learn_rate=0.0819, loss_reduction=0.113,
#> sample_size=0.464
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.08671 (+/-0.00593)
#>
#> ── Iteration 35 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=12, trees=290, min_n=4, tree_depth=15, learn_rate=0.137, loss_reduction=14,
#> sample_size=0.542
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.09203 (+/-0.00477)
#>
#> ── Iteration 36 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=14, trees=99, min_n=15, tree_depth=12, learn_rate=0.11, loss_reduction=2.86e-09,
#> sample_size=0.967
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.08659 (+/-0.00509)
#>
#> ── Iteration 37 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=9, trees=268, min_n=3, tree_depth=6, learn_rate=0.109, loss_reduction=0.297,
#> sample_size=0.437
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.08782 (+/-0.00678)
#>
#> ── Iteration 38 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=13, trees=34, min_n=2, tree_depth=1, learn_rate=0.143, loss_reduction=0.0201,
#> sample_size=0.995
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.1141 (+/-0.00432)
#>
#> ── Iteration 39 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=10, trees=225, min_n=13, tree_depth=11, learn_rate=0.0965,
#> loss_reduction=3.44e-06, sample_size=0.925
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.08963 (+/-0.00498)
#>
#> ── Iteration 40 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=9, trees=17, min_n=2, tree_depth=13, learn_rate=0.146, loss_reduction=0.026,
#> sample_size=0.359
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.09176 (+/-0.00485)
#>
#> ── Iteration 41 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=13, trees=157, min_n=17, tree_depth=11, learn_rate=0.314,
#> loss_reduction=6.95e-08, sample_size=0.957
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.09725 (+/-0.00662)
#>
#> ── Iteration 42 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=7, trees=313, min_n=2, tree_depth=6, learn_rate=0.127, loss_reduction=0.109,
#> sample_size=0.813
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.09222 (+/-0.00767)
#>
#> ── Iteration 43 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=12, trees=493, min_n=12, tree_depth=13, learn_rate=0.0622,
#> loss_reduction=0.000261, sample_size=0.637
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.094 (+/-0.00595)
#>
#> ── Iteration 44 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=7, trees=18, min_n=32, tree_depth=13, learn_rate=0.089, loss_reduction=8.1e-09,
#> sample_size=0.995
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.1396 (+/-0.00449)
#>
#> ── Iteration 45 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=7, trees=187, min_n=9, tree_depth=15, learn_rate=0.114, loss_reduction=0.00238,
#> sample_size=0.868
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.08754 (+/-0.00559)
#>
#> ── Iteration 46 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=10, trees=11, min_n=3, tree_depth=4, learn_rate=0.197, loss_reduction=3.75e-10,
#> sample_size=0.137
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.1147 (+/-0.00564)
#>
#> ── Iteration 47 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=9, trees=432, min_n=6, tree_depth=9, learn_rate=0.0675, loss_reduction=6.32e-09,
#> sample_size=0.521
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.08992 (+/-0.00517)
#>
#> ── Iteration 48 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=15, trees=443, min_n=4, tree_depth=15, learn_rate=0.0621, loss_reduction=0.682,
#> sample_size=0.838
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.08458 (+/-0.0066)
#>
#> ── Iteration 49 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=8, trees=482, min_n=9, tree_depth=2, learn_rate=0.0937, loss_reduction=0.00332,
#> sample_size=0.534
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.09804 (+/-0.00584)
#>
#> ── Iteration 50 ─────────────────────────────────────────────────────────────────────
#>
#> i Current best: brier_class=0.07948 (@iter 15)
#> i Gaussian process model
#> ✓ Gaussian process model
#> i Generating 5000 candidates
#> i Predicted candidates
#> i mtry=15, trees=71, min_n=21, tree_depth=3, learn_rate=0.064, loss_reduction=1.1e-10,
#> sample_size=0.858
#> i Estimating performance
#> ✓ Estimating performance
#> ⓧ Newest results: brier_class=0.1003 (+/-0.00459)
The same helper functions are used to interrogate the results and to create diagnostic plots:
show_best(bo_res, metric = "brier_class") %>%
  select(-.estimator, -.config, -.metric) %>%
  relocate(mean)
#> # A tibble: 5 × 11
#> mean mtry trees min_n tree_depth learn_rate loss_reduction sample_size n
#> <dbl> <int> <int> <int> <int> <dbl> <dbl> <dbl> <int>
#> 1 0.07948 7 249 4 8 0.1343 1.859e+0 0.8543 10
#> 2 0.08063 7 251 3 2 0.07663 7.895e-9 0.9194 10
#> 3 0.08199 12 238 3 1 0.1658 9.551e-7 0.9661 10
#> 4 0.08224 14 205 3 14 0.06994 7.861e-5 0.9767 10
#> 5 0.08240 10 199 3 14 0.06220 1.247e-1 0.9593 10
#> # ℹ 2 more variables: std_err <dbl>, .iter <int>
These results are about the same as those of the SA search. We can plot the data and see that some parameters (number of trees, learning rate, minimum node size, and the sampling proportion) appear to converge to specific values:
autoplot(bo_res, metric = "brier_class")
Here we see that the learning rate and the minimum node size reach a steady state:
autoplot(bo_res, metric = "brier_class", type = "parameters")
A plot of the overall progress:
autoplot(bo_res, metric = "brier_class", type = "performance")