Based on Appendix A.9 in the paper, issue 381 (the missing zip-code validation error) appears to be graded on whether the model implements broader validation (e.g., country-specific regex dictionaries), as discussed in the issue's GitHub thread.
Does the model get to see the original GitHub discussion? If not, wouldn’t it be unfair to penalize the model for only fixing the issue as it was stated originally?
I might be misunderstanding what information the model is given in context. Is it only what's in issue_data.json, the state of the repo, and whatever it can glean from the user_tool?
This seems particularly important because AFAICT issue 381 is the IC-SWE issue with the highest payout.