Question about depth reward #2

Hi, thank you for open-sourcing this great work. I have a quick question about the GRPO-SIF setup. The current prompt asks the model to output interleaved `<area>` and `<text>` segments, but it does not seem to explicitly require a `depth` field in each `<area>` JSON item. However, the `depth_consistency` reward appears to depend on parsing that depth value. Could this mismatch explain why the depth reward is often 0? I would appreciate your guidance on whether this is expected, or whether the prompt should explicitly require the model to output depth.
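One way to test this hypothesis is to count, over a batch of rollouts, how many `<area>` items actually contain a `depth` key. Below is a minimal sketch that assumes each `<area>` segment wraps a JSON object or a JSON list of objects. The regex, the `bbox` key in the sample completion, and the `depth_field_coverage` helper are all hypothetical and are not taken from the GRPO-SIF code:

```python
import json
import re

# Hypothetical pattern: assumes each <area>...</area> segment wraps a JSON
# object or a JSON list of objects. The real GRPO-SIF output format may differ.
AREA_RE = re.compile(r"<area>(.*?)</area>", re.DOTALL)


def depth_field_coverage(completion: str) -> tuple[int, int]:
    """Return (items_with_depth, total_items) over all parseable <area> items."""
    with_depth = 0
    total = 0
    for raw in AREA_RE.findall(completion):
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # unparseable segments are skipped entirely
        items = parsed if isinstance(parsed, list) else [parsed]
        for item in items:
            if not isinstance(item, dict):
                continue  # only JSON objects can carry a "depth" key
            total += 1
            if "depth" in item:
                with_depth += 1
    return with_depth, total


# Hypothetical completion: the first area omits depth, the second includes it.
completion = (
    '<area>{"bbox": [10, 20, 50, 60]}</area><text>a cup</text>'
    '<area>{"bbox": [5, 5, 30, 40], "depth": 1.2}</area><text>a plate</text>'
)
print(depth_field_coverage(completion))  # → (1, 2)
```

If the coverage stays near zero across rollouts, a reward that reads `depth` from these items would have nothing to score, which would be consistent with the depth reward often being 0.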