Sorting of actions might be suboptimal #30

FeignDeath · 2025-02-05T18:19:30Z

Greetings,

as of now, an action is required for every timestep and every train, due to the nature of the sorted list you use. For malfunctions i needed to take a look at the state machine and found something interesting.

Passing a wait action while a train is ready to depart, does not pose a problem, as there is no edge from ready to depart when the stop action is received. Yet if a malfunction occurs while of map, the malfunction off map state is reached and from there a stop action leads to stopped, thus spawning the train.

The point is, that the only way from malfunction off map to get back to ready is via giving no action. That isn't possible in your current design, thus forcing a train with a off-map-malfunction to spawn directly after.

This is easily enforced and does not pose a problem, but I wanted to mention it.

FeignDeath · 2025-02-27T14:35:15Z

I just wanted to not some additional implications of the forced spawn problem and provide a simple solution.

The above described behaviour, has proven way more difficult to handle as it creates many edges cases, requiring special care. Some examples: A malfunction directly following up the previous one, negates the forced spawn. Another train might (due to another malfunction) might occupy the cell and negates the forced spawn. Two trains might get the same forced spawn time and cell.

The last one is the biggest problem, as I found no reliable way to figure out which train goes first, to write exception rules for the other one, so it's movement is negated.

Instead of trying to model these complicated edge cases, I solved it in your modules, by implementing the DO_NOTHING action, which enables a transition back to ready to depart instead of forced spawning. This might be a general solution, as it contains the structure of one action given at every timestep.

FeignDeath · 2025-02-27T16:27:49Z

Update. It seems, one needs to forcibly reset the saved action, to supress this forced spawning behaviour.

It can be done via:

for a in actions[timestep]:
    env.agents[a].action_saver.saved_action = None

after every malfunction. Otherwise it saves the action, occuring before the malfunction and uses that one even if nothing is given. It might be sensible to keep this behaviour, as it aligns well with real world examples, but since I didn't know it was there up to now I thought I would note it.

helloKep · 2025-02-28T11:40:25Z

Thanks for detailing all of this. I agree this definitely needs to be addressed. Since this is my first time looking at it, I'll share that I'm a bit cautious of introducing the DO_NOTHING action, but if this is the only place it appears, that might be manageable. Thank you for the code suggestion. I'll spend a bit of time looking through this.

helloKep · 2025-02-28T11:57:21Z

@FeignDeath In the interest of maintaining as much of the current design as possible, does it work to replace the wait action with DO_NOTHING for each time step before a train has spawned? Or is that treating this issue too broadly?

FeignDeath · 2025-03-03T23:54:29Z

I will test a little bit and then write again, but to give my current assumption:

The necessity of DO_NOTHING was something I assumed, by looking at the state machine image. The necessity to remove the saved_action, was something I deduced, by digging through the python code of Flatland.

For that reason I am certain, that the setting of saved_action to None is necessary. But I just tested a 100 instances. And they only passed completely if I utilised DO_NOTHING as well.

To add more context. I added the line to purge the saved action to the malfunction handling. (I originally thought I would have to add it to only the malfunctioning off-map trains, but it also works to apply it indiscriminately.)

if len(new_malfs) > 0:
    context = sim.provide_context(actions, timestep, mal.get())
    actions, s = sim.update_actions(context)
    secondary_stats.append(s.copy())
    for a in actions[timestep]:
        env.agents[a].action_saver.saved_action = None

I tried to implement your suggestion of replacing every STOP_MOVING with a DO_NOTHING while a train is off map and it seems to work fine.
To give you a working example I added the following to solve.py:

current = actions[timestep]
for a in current:
    if not env.agents[a].position and current[a] == RailEnvActions.STOP_MOVING:
        current[a] = RailEnvActions.DO_NOTHING
    _, _, done, info = env.step(current)

But I must admit, that my testing is limited to my encodings. It seemed to work well, with the exception of one edge case(which I did not have time to look at, since I have to write the report), but that does not mean it has to hold overall. Might still be, that I did something weird, but it's a tendency.

helloKep self-assigned this Feb 28, 2025

helloKep added the bug Something isn't working label Feb 28, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sorting of actions might be suboptimal #30

Sorting of actions might be suboptimal #30

FeignDeath commented Feb 5, 2025

FeignDeath commented Feb 27, 2025

FeignDeath commented Feb 27, 2025

helloKep commented Feb 28, 2025

helloKep commented Feb 28, 2025

FeignDeath commented Mar 3, 2025

Sorting of actions might be suboptimal #30

Sorting of actions might be suboptimal #30

Comments

FeignDeath commented Feb 5, 2025

FeignDeath commented Feb 27, 2025

FeignDeath commented Feb 27, 2025

helloKep commented Feb 28, 2025

helloKep commented Feb 28, 2025

FeignDeath commented Mar 3, 2025