Dev#31
Merged
This pull request introduces full training-time action masking to the DQN agent, resulting in a new v5 experiment and configuration. The main goal is to ensure the agent only considers valid actions during both action selection and Bellman target computation, addressing issues with previous versions where invalid actions polluted learning. The changes also include new experiment configs and result metadata for both v4 (dense reward) and v5 (masked) variants.
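To make the two masking points concrete, here is a minimal NumPy sketch of masked epsilon-greedy selection and a masked Bellman target. It is an illustration under stated assumptions, not the code from this pull request: the function names (`masked_argmax`, `select_action`, `masked_td_target`), the boolean-array mask format, and the fallback to the bare reward when no next action is valid are all hypothetical.

```python
import numpy as np


def masked_argmax(q_values, mask):
    """Return the index of the highest Q-value among valid actions."""
    # Invalid actions are set to -inf so they can never win the argmax.
    masked_q = np.where(mask, q_values, -np.inf)
    return int(np.argmax(masked_q))


def select_action(q_values, mask, epsilon, rng):
    """Epsilon-greedy selection restricted to valid actions (hypothetical helper)."""
    if rng.random() < epsilon:
        # Explore only among the valid action indices.
        return int(rng.choice(np.flatnonzero(mask)))
    return masked_argmax(q_values, mask)


def masked_td_target(reward, next_q, next_mask, done, gamma):
    """Bellman target r + gamma * max over valid a' of Q(s', a')."""
    # Assumption: terminal states, or states with no valid action, do not bootstrap.
    if done or not next_mask.any():
        return float(reward)
    return float(reward + gamma * np.max(next_q[next_mask]))


q = np.array([1.0, 5.0, 2.0])
mask = np.array([True, False, True])  # action 1 is invalid
greedy = masked_argmax(q, mask)
target = masked_td_target(1.0, q, mask, done=False, gamma=0.9)
```

In this example the unmasked maximum would pick action 1 (Q = 5.0) even though it is invalid; with the mask, both the greedy choice and the bootstrap term use action 2 (Q = 2.0) instead.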
Key changes:
DQN Agent: Action Masking Support
ReplayBuffer now stores an optional next_mask (valid actions in the next state) with each transition. During sampling, missing masks are padded for backward compatibility. This enables masking in Bellman updates.
Experiment Configurations and Results
dqn_v4_dense.yaml (v4: dense pickup reward, adjusted γ and ε, no masking) and dqn_v5_masked.yaml (v5: identical hyperparameters to v4 but with full action masking). [1] [2]
API Improvements