Check restarting/handling of pending config when resuming a run #30

Open
@Neeratyoy

Steps to reproduce the observed issue:

  • Ran Random Search for 20 evaluations (max_evaluations_total=20) distributed across 4 workers
  • Midway through the run, killed one worker and restarted it shortly afterwards
  • The overall run completed without errors, but showed the following anomalies:
  1. Killing the worker interrupted the config it was evaluating, for example config ID 16
  2. After the restart, the 4 workers proceeded without errors, but an extra config ID 21 was generated, while config ID 16 was never re-evaluated or completed and remains pending forever

Some more observations:

  • For max_evaluations_total=20 we should have config IDs 1-20, each with its own result.yaml
  • Only config_16 has no result.yaml, whereas config_21 does
  • If I now re-run a worker with max_evaluations_total=21, it makes up the one additional evaluation by sampling a new config, config_22, rather than finishing config_16
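
To make the check above easy to repeat, here is a minimal sketch that lists config IDs without a result.yaml. It assumes each config lives in its own directory named config_<ID> directly under a single results directory; that layout, the function name find_pending_configs, and the results_dir argument are assumptions for illustration, not the library's actual API.

```python
from pathlib import Path


def find_pending_configs(results_dir):
    """Return the IDs of config_<ID> directories that contain no result.yaml."""
    pending = []
    for config_dir in Path(results_dir).glob("config_*"):
        if not config_dir.is_dir():
            continue
        # Everything after the first underscore is expected to be the numeric ID.
        suffix = config_dir.name.split("_", 1)[1]
        if not suffix.isdigit():
            continue  # ignore directories that do not follow config_<ID>
        if not (config_dir / "result.yaml").exists():
            pending.append(int(suffix))
    return sorted(pending)
```

On the run described here, this would be expected to report only ID 16, given the assumed layout.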

Should a new worker re-evaluate pending configs as a priority?
Also, in this scenario the generated config IDs range over [1, n+1] when max_evaluations_total=n.
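
One way the proposed priority rule could look is sketched below. This is a hypothetical illustration, not the current implementation: the function next_config_id and its arguments are made-up names, and it assumes the caller can already tell stale configs (sampled, but their worker died) apart from configs still being evaluated by a live worker, for example through a lock or heartbeat.

```python
def next_config_id(evaluated, in_progress, stale, max_evaluations_total):
    """Pick the config ID a worker should handle next, or None if the budget is used up.

    evaluated:   IDs that already have a result
    in_progress: IDs currently being evaluated by a live worker
    stale:       IDs that were sampled but whose worker was killed (assumed known)
    """
    evaluated, in_progress, stale = set(evaluated), set(in_progress), set(stale)
    if stale:
        # Resume an interrupted config first; it already counts towards the budget.
        return min(stale)
    sampled = evaluated | in_progress
    if len(sampled) >= max_evaluations_total:
        return None  # budget reached, do not sample an extra config
    return max(sampled, default=0) + 1
```

Under these assumptions, the restarted worker would pick up config 16 again, and no config 21 would be sampled for max_evaluations_total=20, keeping the IDs within [1, n].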

Labels: bug
