Run a suite of statistical models — run_mod

run_mod_lm() fits a suite of statistical models, ranks the results by a chosen metric, and returns the ranked results together with a final model fit.

Usage

run_mod_lm(
  tbl,
  preproc,
  folds,
  metrics,
  rank_metric,
  cross = TRUE,
  seed = 2023
)

Arguments

tbl: Input data frame containing the data to model.
preproc: A list of pre-processing steps.
folds: An integer. The number of cross-validation folds.
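metrics: A yardstick metric set, as created by yardstick::metric_set(), containing the performance metrics to evaluate.
rank_metric: A character string naming the metric in metrics to rank results by, for example "rmse".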
cross: A logical. If TRUE (the default), workflows are created for every combination of pre-processor and model. If FALSE, pre-processors and models are paired element-wise, so preproc must contain as many elements as there are models.
seed: A single integer used as the random seed so that results are reproducible. Defaults to 2023.
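
The effect of cross can be illustrated with a minimal base R sketch. The labels below are hypothetical placeholders, not the pre-processors or models that run_mod_lm() actually uses, and the pairing logic is an assumption based on the argument description rather than the package's internal code.

```r
# Hypothetical labels standing in for pre-processors and models.
preproc_names <- c("preproc_a", "preproc_b", "preproc_c")
model_names <- c("model_a", "model_b", "model_c")

# cross = TRUE: every pre-processor is combined with every model.
crossed <- expand.grid(
  preproc = preproc_names,
  model = model_names,
  stringsAsFactors = FALSE
)
nrow(crossed) # 9 combinations

# cross = FALSE: elements are paired by position, so lengths must match.
stopifnot(length(preproc_names) == length(model_names))
paired <- data.frame(preproc = preproc_names, model = model_names)
nrow(paired) # 3 pairs
```

With three pre-processors and three models, crossing yields nine candidate workflows, while element-wise pairing yields three.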

Value

A list containing a workflow set, ranked model results, and a final model fit.

Examples

if (FALSE) {
tbl <-
  build_tbl(
    "tb",
    estimated = "who_estimates.e_inc_num",
    notified = "who_notifications.c_newinc",
    year = 2019,
    vars = extract_vars("tb")
  ) |>
  dplyr::mutate(is_hbc = forcats::as_factor(is_hbc)) |>
  dplyr::select(-dplyr::any_of("year"))

preproc_list <- get_mod_preproc(
  .tbl = tbl,
  .neighbors = 5,
  .threshold = 0.25,
  .impute_with = c("gdp", "e_inc_num", "pop_total")
)

run_mod_lm(
  tbl,
  preproc = preproc_list,
  folds = 10,
  metrics = yardstick::metric_set(yardstick::rmse, yardstick::rsq),
  rank_metric = "rmse"
)
}
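
The call above passes folds = 10 and relies on the default seed = 2023. As a rough illustration of why a fixed seed matters when rows are split into folds, here is a base R sketch. The helper assign_folds() is hypothetical and is not part of this package; how run_mod_lm() builds its resamples internally is not documented here.

```r
# Hypothetical helper: randomly assign n rows to a given number of folds.
assign_folds <- function(n, folds, seed) {
  set.seed(seed)
  # Repeat fold labels to cover all rows, then shuffle them.
  sample(rep(seq_len(folds), length.out = n))
}

a <- assign_folds(n = 25, folds = 10, seed = 2023)
b <- assign_folds(n = 25, folds = 10, seed = 2023)

# The same seed reproduces the same fold assignment.
identical(a, b) # TRUE

# Fold sizes differ by at most one row.
table(a)
```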