Hi, thank you for open-sourcing this great work. I have a quick question about the GRPO-SIF setup. In the current prompt, the model is asked to output interleaved and , but the prompt does not seem to explicitly require a "depth" field inside each JSON item. Meanwhile, the `depth_consistency` reward appears to depend on parsing that depth value. Could this mismatch be the reason the depth reward is often 0? I would really appreciate your guidance on whether this is expected, or whether the prompt should explicitly require the depth output.
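To make the suspected failure mode concrete, here is a minimal hypothetical sketch. It is not the repository's `depth_consistency` implementation. It only assumes that the reward parses a JSON list of items and reads a "depth" key from each one. The function name, the `bbox_2d`/`label` fields, and the relative tolerance are all my own illustrative assumptions.

```python
import json


def hypothetical_depth_consistency(completion_json: str, gt_depths: list[float], tol: float = 0.1) -> float:
    """Hypothetical reward (NOT the repo's code): fraction of items whose
    'depth' value is within a relative tolerance `tol` of the ground truth.
    Returns 0.0 when the JSON cannot be parsed or no item has a usable 'depth'."""
    try:
        items = json.loads(completion_json)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(items, list):
        return 0.0

    scores = []
    for item, gt in zip(items, gt_depths):
        # If the prompt never asks for "depth", this branch is hit for every item.
        if not isinstance(item, dict) or "depth" not in item:
            scores.append(0.0)
            continue
        try:
            pred = float(item["depth"])
        except (TypeError, ValueError):
            scores.append(0.0)
            continue
        scores.append(1.0 if abs(pred - gt) <= tol * abs(gt) else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical outputs: identical except for the presence of "depth".
without_depth = '[{"bbox_2d": [10, 20, 50, 60], "label": "cup"}]'
with_depth = '[{"bbox_2d": [10, 20, 50, 60], "label": "cup", "depth": 1.52}]'
print(hypothetical_depth_consistency(without_depth, [1.5]))  # 0.0
print(hypothetical_depth_consistency(with_depth, [1.5]))     # 1.0
```

If the real reward behaves like this, a correctly localized answer that omits "depth" would always score 0, which would match what I am seeing.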