run_mod_lm() fits a suite of statistical models, ranks the results, and returns them together with a final model fit.

Usage

run_mod_lm(
  tbl,
  preproc,
  folds,
  metrics,
  rank_metric,
  cross = TRUE,
  seed = 2023
)

Arguments

tbl

Input data frame containing the data to model.

preproc

A list of pre-processing steps.

folds

An integer. The number of cross-validation folds.

metrics

The performance metrics to evaluate, supplied as a metric set created with yardstick::metric_set() (as in the example on this page).

rank_metric

A character string naming the metric in metrics used to rank the results, for example "rmse".

cross

A logical: should all combinations of the pre-processors and models be used to create the workflows? If FALSE, the length of preproc and models should be equal.

seed

A single integer used as the random seed, so that results are reproducible.
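
As a minimal sketch of how the metrics and rank_metric arguments relate, the snippet below builds the same metric set used in the example on this page and applies it to a small data frame. It only illustrates yardstick behaviour; how run_mod_lm() uses the metric set internally is not documented here. The obs data frame and its values are invented for illustration.

```r
# Build a metric set: the result is a function, not a table of numbers.
reg_metrics <- yardstick::metric_set(yardstick::rmse, yardstick::rsq)

# Hypothetical observed vs. predicted values (made up for this sketch).
obs <- data.frame(
  truth    = c(1, 2, 3, 4),
  estimate = c(1.1, 1.9, 3.2, 3.8)
)

# Calling the metric set returns a tibble with one row per metric
# and the columns .metric, .estimator and .estimate.
res <- reg_metrics(obs, truth = truth, estimate = estimate)
print(res)

# A rank_metric value such as "rmse" names one entry of the .metric column.
res[res$.metric == "rmse", ]
```

The names in the .metric column ("rmse" and "rsq" here) are presumably the values that rank_metric accepts.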

Value

A list containing a workflow set, the ranked model results, and a final model fit.

Examples

if (FALSE) {
tbl <-
  build_tbl(
    "tb",
    estimated = "who_estimates.e_inc_num",
    notified = "who_notifications.c_newinc",
    year = 2019,
    vars = extract_vars("tb")
  ) |>
  dplyr::mutate(is_hbc = forcats::as_factor(is_hbc)) |>
  dplyr::select(-dplyr::any_of("year"))

preproc_list <- get_mod_preproc(
  .tbl = tbl,
  .neighbors = 5,
  .threshold = 0.25,
  .impute_with = c("gdp", "e_inc_num", "pop_total")
)

run_mod_lm(
  tbl,
  preproc = preproc_list,
  folds = 10,
  metrics = yardstick::metric_set(yardstick::rmse, yardstick::rsq),
  rank_metric = "rmse"
)
}
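
The description of the cross argument matches the argument of the same name in workflowsets::workflow_set(). Assuming run_mod_lm() passes cross through to that function (an assumption, not confirmed on this page), the sketch below shows the difference between the two settings. The formulas and model names are illustrative only and are not the pre-processors or models that run_mod_lm() uses.

```r
# Two hypothetical pre-processors (plain formulas) and two model specs.
preproc <- list(
  simple = mpg ~ wt,
  full   = mpg ~ wt + hp
)
models <- list(
  lm  = parsnip::linear_reg(),
  glm = parsnip::linear_reg(engine = "glm")
)

# cross = TRUE: every pre-processor is combined with every model (2 x 2).
crossed <- workflowsets::workflow_set(preproc, models, cross = TRUE)
nrow(crossed)  # 4 workflows

# cross = FALSE: pre-processors and models are paired element by element,
# so both lists must have the same length.
paired <- workflowsets::workflow_set(preproc, models, cross = FALSE)
nrow(paired)   # 2 workflows
```

With cross = FALSE, lists of different lengths produce an error, which is why the argument description asks for equal lengths.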