Skip to content

Refactor trans_models_t to use mlr3: replace fit_fun/gof_fun with Learner/AutoTuner interface#24

Draft
Copilot wants to merge 15 commits into main from copilot/integrate-mlr3-library

Copilot AI commented Apr 16, 2026

Replaces the ad-hoc fit_fun/gof_fun function-passing interface with first-class mlr3 integration. Learner identity, hyperparameters, and a serialized untrained spec are stored natively in DuckDB; cross-validation uses mlr3 tasks and measures throughout.

Schema (trans_models_t)

| Old | New | Type | Notes |
| --- | --- | --- | --- |
| model_family | learner_id | VARCHAR | mlr3 twoclass LearnerClassif key |
| model_params | learner_params | MAP(VARCHAR,VARCHAR) | Atomic scalar params only |
| fit_call | learner_spec | BLOB | Serialized untrained Learner |
| goodness_of_fit | crossval_score | MAP(VARCHAR,DOUBLE) | prediction$score(measures) |
| model_obj_part | crossval_predictions | BLOB | Serialized PredictionClassif |
| model_obj_full | learner_full | BLOB | Serialized trained Learner |

Primary key: (id_run, id_trans, fit_call) → (id_run, id_trans, learner_id)
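For orientation, the new column layout could be expressed as DuckDB DDL roughly as follows. This is a minimal sketch, not the package's actual table definition: the INTEGER types for id_run and id_trans are assumptions, and the real table may carry further columns or constraints not listed in this description.

```r
library(DBI)

# In-memory DuckDB database, used only for illustration
con <- dbConnect(duckdb::duckdb())

# Assumption: id_run and id_trans are INTEGER; the PR text does not state their types
dbExecute(con, "
  CREATE TABLE trans_models_t (
    id_run               INTEGER,
    id_trans             INTEGER,
    learner_id           VARCHAR,
    learner_params       MAP(VARCHAR, VARCHAR),
    learner_spec         BLOB,
    crossval_score       MAP(VARCHAR, DOUBLE),
    crossval_predictions BLOB,
    learner_full         BLOB,
    PRIMARY KEY (id_run, id_trans, learner_id)
  )
")

# Inspect the resulting column names and types
cols <- dbGetQuery(con, "DESCRIBE trans_models_t")
print(cols[, c("column_name", "column_type")])

dbDisconnect(con, shutdown = TRUE)
```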

API

# New signatures — no backward compatibility
fit_partial_models(self, learner, measures, sample_frac = 0.7, seed = NULL, cluster = NULL)

# fit_full_models supports two mutually exclusive modes:
# 1. Score-select mode: pick best partial model per transition by score, retrain on full data
fit_full_models(self, select_score, select_maximize = TRUE, cluster = NULL)

# 2. Direct-learner mode: train a fresh learner clone directly on full data (no partial models needed)
fit_full_models(self, learner, cluster = NULL)

# measures accepts a character vector of mlr3 measure IDs (convenience) or a list of Measure objects
db$trans_models_t <- db$fit_partial_models(
  learner  = mlr3::lrn("classif.ranger", num.trees = 500, predict_type = "prob"),
  measures = c("classif.auc"),   # or list(mlr3::msr("classif.auc"))
  seed     = 42
)

# Score-select mode
db$trans_models_t <- db$fit_full_models(
  select_score    = "classif.auc",
  select_maximize = TRUE
)

# Direct-learner mode (crossval_score / crossval_predictions will be NULL)
db$trans_models_t <- db$fit_full_models(
  learner = mlr3::lrn("classif.ranger", predict_type = "prob")
)
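
One way the measures argument could accept both forms is sketched below. normalize_measures is a hypothetical helper name that does not appear in the PR; the sketch only assumes that mlr3::msrs() turns a character vector of measure IDs into a list of Measure objects.

```r
library(mlr3)

# Hypothetical helper (not from the PR): accept measure IDs or Measure
# objects and always return a list of Measure objects.
normalize_measures <- function(measures) {
  if (is.character(measures)) {
    # msrs() looks up several measures by ID at once
    return(mlr3::msrs(measures))
  }
  if (inherits(measures, "Measure")) {
    # A single Measure object is wrapped into a list
    measures <- list(measures)
  }
  stopifnot(
    is.list(measures),
    all(vapply(measures, inherits, logical(1), what = "Measure"))
  )
  measures
}

m1 <- normalize_measures(c("classif.auc", "classif.acc"))
m2 <- normalize_measures(list(mlr3::msr("classif.auc")))
```

Normalizing once at the API boundary would let the workers call prediction$score() without branching on the input type.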

Worker logic

  • fit_partial_model_worker: Builds as_task_classif(..., positive = "TRUE"), deep-clones and trains the learner, scores held-out split via prediction$score(measures). For AutoTuner, extracts the optimal inner learner for learner_id/learner_params/learner_spec.
  • fit_full_model_worker: Operates in two modes. In score-select mode, reconstructs from learner_spec BLOB (falls back to do.call(mlr3::lrn, ...) on deserialization failure, logging the reconstructed learner via fallback$format()), and passes crossval_score/crossval_predictions through from the partial model. In direct-learner mode, clones and trains the passed learner; crossval_score/crossval_predictions are NULL. A single fetch() retrieves all MAP/BLOB columns for the best rows, eliminating a redundant second fetch.
  • predict_trans_pot: Deserializes learner_full and calls learner$predict_newdata(pred_data_post)$prob[, "TRUE"] directly — no intermediate copy needed, as mlr3 drops non-feature columns automatically.
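
The worker steps above can be sketched with plain mlr3 calls. This is an illustration under stated assumptions rather than the code in this PR: the toy data, the column names result/x1/x2, the use of classif.rpart (chosen because it ships with mlr3), the split via mlr3::partition(), and base serialize() for the BLOB payloads are all stand-ins.

```r
library(mlr3)

set.seed(42)

# Toy data (assumption): binary target stored as a factor with levels "FALSE"/"TRUE"
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
dat <- data.frame(
  result = factor(as.character(x1 + rnorm(n, sd = 0.5) > 0), levels = c("FALSE", "TRUE")),
  x1 = x1,
  x2 = x2
)

# Build the task with "TRUE" as the positive class
task <- as_task_classif(dat, target = "result", positive = "TRUE")

# The untrained learner acts as the spec; training happens on a deep clone
learner <- lrn("classif.rpart", predict_type = "prob")
fitted  <- learner$clone(deep = TRUE)

# 70/30 split, mirroring sample_frac = 0.7
split <- partition(task, ratio = 0.7)
fitted$train(task, row_ids = split$train)

# Score the held-out rows
prediction <- fitted$predict(task, row_ids = split$test)
scores     <- prediction$score(msrs(c("classif.auc", "classif.acc")))

# Raw vectors that could be stored in BLOB columns (assumption: base serialize())
spec_blob <- serialize(learner, connection = NULL)
pred_blob <- serialize(prediction, connection = NULL)
restored  <- unserialize(spec_blob)

# Predict probabilities of the positive class on new rows without the target column
probs <- fitted$predict_newdata(dat[1:5, c("x1", "x2")])$prob[, "TRUE"]
```

Because only the clone is trained, the original learner object stays untrained and can be serialized as the spec.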

New method

get_crossval_plots(id_run, id_trans) deserializes all crossval_predictions BLOBs and returns mlr3viz::autoplot() results for visual GoF inspection. Integration tests added (guarded by requireNamespace("mlr3viz")).

Dependencies

mlr3 moved to hard Imports; mlr3viz remains in Suggests.

Copilot AI and others added 2 commits April 17, 2026 21:33
Copilot AI requested a review from mmyrte April 17, 2026 21:35
mmyrte and others added 3 commits April 18, 2026 09:38
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…mplified fetch; fix docs/guards/tests

Agent-Logs-Url: https://github.com/ethzplus/evoland-plus/sessions/44f0fa02-3b76-45d5-a99b-1d20977bcc9e

Co-authored-by: mmyrte <24587121+mmyrte@users.noreply.github.com>
Copilot AI requested a review from mmyrte April 21, 2026 18:44