For potential reproducibility of the observed issue:
- Running Random Search for 20 (`max_evaluations_total`) evaluations distributed across 4 workers
- Midway through the run, killed a worker and restarted the worker soon enough
- The overall run ran fine, but I noticed certain anomalies, as described below
- The process termination halted a config, for example, config ID `16`
- On restarting, the 4 workers proceeded fine without errors, but an extra config ID `21` was generated, while config ID `16` was not re-evaluated or completed and remains `pending` forever
Some more observations:
- For `max_evaluations_total=20` we should have config IDs from 1-20, each of them having its own `result.yaml`
- Only `config_16` does not have a `result.yaml`, whereas `config_21` does
- If I now re-run a worker with `max_evaluations_total=21`, it satisfies that extra evaluation by sampling a new config, `config_22`
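
To spot such orphaned configs, a small check along these lines may help. It is only a sketch: it assumes every config lives in its own `config_<ID>` directory and that a finished evaluation leaves a `result.yaml` inside it, as described above. The function names and the `results_dir` argument are hypothetical and not part of the library.

```python
from pathlib import Path


def _config_id(name):
    # Hypothetical helper: extract the numeric suffix of "config_<ID>".
    suffix = name.rsplit("_", 1)[-1]
    return int(suffix) if suffix.isdigit() else float("inf")


def find_unfinished_configs(results_dir):
    """Return names of config_* directories that lack a result.yaml file."""
    unfinished = []
    for config_dir in Path(results_dir).glob("config_*"):
        if config_dir.is_dir() and not (config_dir / "result.yaml").exists():
            unfinished.append(config_dir.name)
    # Sort numerically by ID so that config_9 comes before config_16.
    return sorted(unfinished, key=_config_id)
```

For the run described here, calling `find_unfinished_configs` on the directory that holds the `config_*` folders would be expected to report only `config_16`.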
Should a new worker re-evaluate pending configs as a priority?
Also, with this issue or under this scenario, the generated config IDs range over [1, n+1] if `max_evaluations_total=n`.
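
To make the proposal concrete, here is a toy sketch of the selection order I have in mind. It is not how the library currently works. It assumes a plain dict of config ID to status, with hypothetical status strings `"pending"`, `"running"` and `"done"`, and it assumes a `"pending"` entry is not held by any live worker (a real implementation would need locks or heartbeats to tell these apart).

```python
def next_config(configs, max_evaluations_total):
    """Pick the config ID a newly started worker should evaluate next.

    configs: dict mapping config ID (int) to a status string.
    Returns the chosen ID, or None if there is nothing left to do.
    """
    # 1. Prefer orphaned configs: resume the lowest-ID pending one.
    pending = sorted(cid for cid, status in configs.items() if status == "pending")
    if pending:
        cid = pending[0]
        configs[cid] = "running"
        return cid
    # 2. Only sample a new config while the total count is below the budget.
    #    Pending and running configs count towards the budget too.
    if len(configs) < max_evaluations_total:
        cid = max(configs, default=0) + 1
        configs[cid] = "running"
        return cid
    # 3. Budget exhausted and nothing pending.
    return None
```

With this ordering, a restarted worker would pick up `config_16` instead of creating `config_21`, so the IDs would stay within [1, n].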