Dev by MrPhantom2325 · Pull Request #31 · MrPhantom2325/HungerNet-Smart-Food-Redistribution-Using-AI

MrPhantom2325 · 2026-05-13T19:36:48Z

This pull request adds full training-time action masking to the DQN agent, along with a new v5 experiment and configuration that use it. The goal is for the agent to consider only valid actions during both action selection and Bellman target computation; in previous versions, invalid actions polluted learning. The changes also include new experiment configs and result metadata for both the v4 (dense reward) and v5 (masked) variants.

Key changes:

DQN Agent: Action Masking Support

The ReplayBuffer now stores an optional next_mask (valid actions in the next state) with each transition. During sampling, missing masks are padded for backward compatibility. This enables masking in Bellman updates.
The agent's action selection now always respects the environment's action mask, both during exploration and exploitation, ensuring only valid actions are chosen.
During training, the Bellman target is computed by taking the max Q-value only over valid next-state actions, using the stored mask. This sharpens credit assignment and prevents learning from invalid actions.
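The three changes above can be sketched roughly as follows. This is a minimal, framework-free illustration and not the code from this PR: the names `select_action`, `bellman_target`, and this `ReplayBuffer` layout are hypothetical, Q-values are plain lists of floats rather than network outputs, and padding a missing mask with "all actions valid" is an assumption about what the backward-compatible padding does.

```python
import random
from collections import deque
from typing import List, Optional, Sequence, Tuple

Transition = Tuple[object, int, float, object, bool, List[bool]]


def select_action(q_values: Sequence[float], mask: Sequence[bool],
                  epsilon: float, rng: random.Random) -> int:
    """Epsilon-greedy choice restricted to valid actions in both branches."""
    valid = [i for i, ok in enumerate(mask) if ok]
    if not valid:
        raise ValueError("action mask has no valid actions")
    if rng.random() < epsilon:
        # Exploration: sample uniformly among valid actions only.
        return rng.choice(valid)
    # Exploitation: argmax of Q over valid actions only.
    return max(valid, key=lambda i: q_values[i])


class ReplayBuffer:
    """Stores transitions together with an optional next-state mask."""

    def __init__(self, capacity: int, n_actions: int) -> None:
        self.buffer = deque(maxlen=capacity)
        self.n_actions = n_actions

    def push(self, state, action: int, reward: float, next_state,
             done: bool, next_mask: Optional[Sequence[bool]] = None) -> None:
        self.buffer.append((state, action, reward, next_state, done, next_mask))

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        batch = rng.sample(list(self.buffer), batch_size)
        # Assumption: transitions saved without a mask are padded as all-valid,
        # which reproduces the old unmasked max for those samples.
        return [
            (s, a, r, ns, d,
             list(m) if m is not None else [True] * self.n_actions)
            for (s, a, r, ns, d, m) in batch
        ]


def bellman_target(reward: float, done: bool, next_q: Sequence[float],
                   next_mask: Sequence[bool], gamma: float) -> float:
    """r + gamma * max over valid next actions; no bootstrap on terminal steps."""
    valid_q = [q for q, ok in zip(next_q, next_mask) if ok]
    if done or not valid_q:
        return reward
    return reward + gamma * max(valid_q)
```

In this sketch an invalid action with a high Q-value (index 1 in `q = [1.0, 5.0, 2.0]` with mask `[True, False, True]`) is never selected and never contributes to the target, which is the behaviour the description attributes to v5.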

Experiment Configurations and Results

Added dqn_v4_dense.yaml (v4: dense pickup reward, adjusted γ and ε, no masking) and dqn_v5_masked.yaml (v5: identical hyperparameters to v4 but with full action masking).
Added corresponding metadata files for both v4 and v5 runs, including hyperparameters and evaluation results.

API Improvements

The MLflow policy loader now ensures model metadata reflects the actual registry version, improving result traceability.

MrPhantom2325 added 4 commits May 13, 2026 23:23

made changes to improve DQN

c33ddcb

made changes to dqn model

7e546ce

resolved issues

29cd146

resolved cicd issue

93b1ec8

MrPhantom2325 merged commit b6021d5 into main May 13, 2026
8 checks passed

MrPhantom2325 merged 4 commits into main from dev
