
Dev #31

Merged
MrPhantom2325 merged 4 commits into main from dev
May 13, 2026

@MrPhantom2325

Owner

This pull request adds full action masking to the DQN agent's training loop, packaged as a new v5 experiment and configuration. The goal is to make the agent consider only valid actions both when selecting actions and when computing Bellman targets. Earlier versions did not enforce this everywhere, so invalid actions could be chosen and their Q-values could leak into the learning targets. The PR also adds experiment configs and result metadata for both the v4 (dense reward) and v5 (masked) variants.

Key changes:

DQN Agent: Action Masking Support

  • The ReplayBuffer now stores an optional next_mask (the valid actions in the next state) with each transition. At sampling time, transitions stored without a mask are padded, so older call sites keep working. The stored masks are what make the masked Bellman update possible.
  • The agent's action selection now always respects the environment's action mask, in both exploration (random) and exploitation (greedy) steps, so only valid actions are ever chosen.
  • During training, the Bellman target takes the max Q-value only over valid next-state actions, using the stored next_mask. This keeps the Q-values of invalid actions out of the bootstrap target, which sharpens credit assignment.
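
A minimal sketch of a replay buffer that carries next_mask alongside each transition, assuming NumPy arrays and a fixed number of discrete actions. The PR does not say what value missing masks are padded with; this sketch assumes "all actions valid", which reproduces the unmasked v4-style target for those transitions. The class and method names are illustrative, not the project's actual code.

```python
import random
from collections import deque
from typing import Optional

import numpy as np


class ReplayBuffer:
    """FIFO replay buffer that also stores the next-state action mask."""

    def __init__(self, capacity: int, n_actions: int):
        self.buffer = deque(maxlen=capacity)
        self.n_actions = n_actions

    def push(self, state, action, reward, next_state, done,
             next_mask: Optional[np.ndarray] = None):
        # next_mask is optional so call sites that pass no mask keep working.
        self.buffer.append((state, action, reward, next_state, done, next_mask))

    def sample(self, batch_size: int):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones, masks = zip(*batch)
        # Assumption: a missing mask is padded as "every action is valid".
        padded = [
            np.ones(self.n_actions, dtype=bool) if m is None
            else np.asarray(m, dtype=bool)
            for m in masks
        ]
        return (
            np.asarray(states, dtype=np.float32),
            np.asarray(actions, dtype=np.int64),
            np.asarray(rewards, dtype=np.float32),
            np.asarray(next_states, dtype=np.float32),
            np.asarray(dones, dtype=np.float32),
            np.stack(padded),  # shape: (batch_size, n_actions)
        )

    def __len__(self):
        return len(self.buffer)
```

Padding at sampling time (rather than at insertion) means old transitions and new ones can share one buffer without a migration step.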
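
The masked selection and target rules can be sketched in NumPy as below. The target follows y = r + γ · (1 − done) · max over a′ with next_mask[a′] = True of Q_target(s′, a′). This is a framework-neutral sketch: the function names are hypothetical, next_q is assumed to come from a target network, and zeroing the bootstrap term for rows with no valid next action is an assumption of this sketch rather than something the PR states.

```python
import numpy as np


def select_action(q_values: np.ndarray, mask: np.ndarray,
                  epsilon: float, rng: np.random.Generator) -> int:
    """Epsilon-greedy selection restricted to actions where mask is True."""
    valid = np.flatnonzero(mask)
    if valid.size == 0:
        raise ValueError("mask has no valid actions")
    if rng.random() < epsilon:
        # Explore: sample uniformly among valid actions only.
        return int(rng.choice(valid))
    # Exploit: argmax over valid actions only.
    return int(valid[np.argmax(q_values[valid])])


def masked_td_target(rewards: np.ndarray, dones: np.ndarray,
                     next_q: np.ndarray, next_mask: np.ndarray,
                     gamma: float) -> np.ndarray:
    """Bellman target whose max runs only over valid next-state actions."""
    # Invalid actions get -inf so they can never win the max.
    masked_q = np.where(next_mask, next_q, -np.inf)
    best_next = masked_q.max(axis=1)
    # Rows with no valid action would give -inf (and NaN when multiplied
    # by 1 - done = 0), so their bootstrap term is set to zero.
    best_next = np.where(np.isfinite(best_next), best_next, 0.0)
    return rewards + gamma * (1.0 - dones) * best_next
```

Filling invalid entries with −inf before the max applies the mask without changing array shapes; in a PyTorch agent the same idea is usually written with `torch.where` or `Tensor.masked_fill`.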

Experiment Configurations and Results

  • Added dqn_v4_dense.yaml (v4: dense pickup reward, adjusted γ and ε, no masking) and dqn_v5_masked.yaml (v5: the same hyperparameters as v4, plus full action masking). [1] [2]
  • Added corresponding metadata files for both v4 and v5 runs, including hyperparameters and evaluation results. [1] [2] [3]

API Improvements

  • The MLflow policy loader now records the model version actually resolved from the registry in the model metadata, so reported results can be traced back to the exact model version.
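
One way to make metadata reflect the registry's real version is to resolve a version number or alias to a concrete ModelVersion and copy its `version` field into the metadata. The sketch below assumes the MLflow client methods `get_model_version(name, version)` and `get_model_version_by_alias(name, alias)`; the helper names and the `model_name` / `model_version` metadata keys are hypothetical and not taken from the project's loader.

```python
from typing import Any, Dict


def resolve_registry_version(client: Any, name: str, ref: str) -> str:
    """Turn a version number or alias into the concrete registry version."""
    if ref.isdigit():
        mv = client.get_model_version(name, ref)
    else:
        # Aliases such as "champion" point at some concrete version.
        mv = client.get_model_version_by_alias(name, ref)
    return str(mv.version)


def attach_registry_metadata(metadata: Dict[str, Any], client: Any,
                             name: str, ref: str) -> Dict[str, Any]:
    """Return a copy of metadata whose version field matches the registry."""
    updated = dict(metadata)  # leave the caller's dict untouched
    updated["model_name"] = name
    updated["model_version"] = resolve_registry_version(client, name, ref)
    return updated
```

With a real registry, `client` would be an `mlflow.tracking.MlflowClient()`; passing it in as a parameter keeps the helper testable with a stub.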

MrPhantom2325 merged commit b6021d5 into main on May 13, 2026
8 checks passed