Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sorting of actions might be suboptimal #30

Open
FeignDeath opened this issue Feb 5, 2025 · 5 comments
Open

Sorting of actions might be suboptimal #30

FeignDeath opened this issue Feb 5, 2025 · 5 comments
Assignees
Labels
bug Something isn't working

Comments

@FeignDeath
Copy link
Contributor

Greetings,

as of now, an action is required for every timestep and every train, due to the nature of the sorted list you use. For malfunctions i needed to take a look at the state machine and found something interesting.

Image

Passing a wait action while a train is ready to depart, does not pose a problem, as there is no edge from ready to depart when the stop action is received. Yet if a malfunction occurs while of map, the malfunction off map state is reached and from there a stop action leads to stopped, thus spawning the train.

The point is, that the only way from malfunction off map to get back to ready is via giving no action. That isn't possible in your current design, thus forcing a train with a off-map-malfunction to spawn directly after.

This is easily enforced and does not pose a problem, but I wanted to mention it.

@FeignDeath
Copy link
Contributor Author

I just wanted to not some additional implications of the forced spawn problem and provide a simple solution.

The above described behaviour, has proven way more difficult to handle as it creates many edges cases, requiring special care. Some examples: A malfunction directly following up the previous one, negates the forced spawn. Another train might (due to another malfunction) might occupy the cell and negates the forced spawn. Two trains might get the same forced spawn time and cell.

The last one is the biggest problem, as I found no reliable way to figure out which train goes first, to write exception rules for the other one, so it's movement is negated.

Instead of trying to model these complicated edge cases, I solved it in your modules, by implementing the DO_NOTHING action, which enables a transition back to ready to depart instead of forced spawning. This might be a general solution, as it contains the structure of one action given at every timestep.

@FeignDeath
Copy link
Contributor Author

Update. It seems, one needs to forcibly reset the saved action, to supress this forced spawning behaviour.

It can be done via:

for a in actions[timestep]:
    env.agents[a].action_saver.saved_action = None

after every malfunction. Otherwise it saves the action, occuring before the malfunction and uses that one even if nothing is given. It might be sensible to keep this behaviour, as it aligns well with real world examples, but since I didn't know it was there up to now I thought I would note it.

@helloKep
Copy link
Collaborator

Thanks for detailing all of this. I agree this definitely needs to be addressed. Since this is my first time looking at it, I'll share that I'm a bit cautious of introducing the DO_NOTHING action, but if this is the only place it appears, that might be manageable. Thank you for the code suggestion. I'll spend a bit of time looking through this.

@helloKep
Copy link
Collaborator

@FeignDeath In the interest of maintaining as much of the current design as possible, does it work to replace the wait action with DO_NOTHING for each time step before a train has spawned? Or is that treating this issue too broadly?

@helloKep helloKep self-assigned this Feb 28, 2025
@helloKep helloKep added the bug Something isn't working label Feb 28, 2025
@FeignDeath
Copy link
Contributor Author

I will test a little bit and then write again, but to give my current assumption:

The necessity of DO_NOTHING was something I assumed, by looking at the state machine image. The necessity to remove the saved_action, was something I deduced, by digging through the python code of Flatland.

For that reason I am certain, that the setting of saved_action to None is necessary. But I just tested a 100 instances. And they only passed completely if I utilised DO_NOTHING as well.

To add more context. I added the line to purge the saved action to the malfunction handling. (I originally thought I would have to add it to only the malfunctioning off-map trains, but it also works to apply it indiscriminately.)

if len(new_malfs) > 0:
    context = sim.provide_context(actions, timestep, mal.get())
    actions, s = sim.update_actions(context)
    secondary_stats.append(s.copy())
    for a in actions[timestep]:
        env.agents[a].action_saver.saved_action = None

I tried to implement your suggestion of replacing every STOP_MOVING with a DO_NOTHING while a train is off map and it seems to work fine.
To give you a working example I added the following to solve.py:

current = actions[timestep]
for a in current:
    if not env.agents[a].position and current[a] == RailEnvActions.STOP_MOVING:
        current[a] = RailEnvActions.DO_NOTHING
    _, _, done, info = env.step(current)

But I must admit, that my testing is limited to my encodings. It seemed to work well, with the exception of one edge case(which I did not have time to look at, since I have to write the report), but that does not mean it has to hold overall. Might still be, that I did something weird, but it's a tendency.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants