[MLOB-4280] Update SDK docs with new custom eval args #32378
Conversation
Small suggestion
| `assessment` | : optional - _string_ | <br />A text assessment of the validity of this evaluation. Accepted values are "pass" and "fail". |
Suggested change:
| <br />A text assessment of the validity of this evaluation. Accepted values are "pass" and "fail". |
| <br />A text assessment of the validity of this evaluation. Accepted values are `pass` and `fail`. |
    assessment="pass",
    reasoning="Malicious intent was detected in the user instructions."
"fail" should be the bad outcome, so I think this should be a "fail", not a "pass".
@mtullalizardi Wouldn't a harmfulness eval of score 10 that's actually valid (malicious intent was detected in the user prompt) be a correct (i.e. pass) assessment? Or am I misunderstanding assessments 😭
"pass" is green and "fail" is red. Here is the example @FouadWahabi did in the demo. In the eval you can see that a "true" failure to answer is a "fail".
    assessment="pass",
    reasoning="Malicious intent was detected in the user instructions."
Same as above.
Thanks @Yun-Kim, feel free to merge if it's ready.
What does this PR do? What is the motivation?
Adds `reasoning`/`assessment` as new args on custom evaluation SDK methods for LLM Observability. Note this is only available on Python currently; Node.js will be updated in the future accordingly.

Merge instructions
Merge readiness:
For Datadog employees:
Your branch name MUST follow the `<name>/<description>` convention and include the forward slash (/). Without this format, your pull request will not pass CI, the GitLab pipeline will not run, and you won't get a branch preview. Getting a branch preview makes it easier for us to check any issues with your PR, such as broken links. If your branch doesn't follow this format, rename it or create a new branch and PR.
[6/5/2025] Merge queue has been disabled on the documentation repo. If you have write access to the repo, the PR has been reviewed by a Documentation team member, and all of the required checks have passed, you can use the Squash and Merge button to merge the PR. If you don't have write access, or you need help, reach out in the #documentation channel in Slack.
Additional notes
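The pass/fail semantics discussed in the review can be sketched as follows. This is a minimal, hypothetical illustration only: `CustomEvaluation` and its fields are invented for this sketch and are not the Datadog SDK's actual API; it just models an evaluation that carries the new optional `assessment` ("pass" or "fail") and `reasoning` args.

```python
from dataclasses import dataclass
from typing import Optional

# Accepted values per the docs table under review.
VALID_ASSESSMENTS = {"pass", "fail"}

@dataclass
class CustomEvaluation:
    """Hypothetical model of a custom evaluation with the new optional args."""
    label: str
    value: float
    assessment: Optional[str] = None   # "pass" or "fail" (optional)
    reasoning: Optional[str] = None    # free-text justification (optional)

    def __post_init__(self) -> None:
        # Reject any assessment outside the documented accepted values.
        if self.assessment is not None and self.assessment not in VALID_ASSESSMENTS:
            raise ValueError(f"assessment must be one of {VALID_ASSESSMENTS}")

# Mirrors the snippet under review: a harmfulness score of 10 judged to be a
# correct evaluation ("pass"), with the reasoning explaining why.
evaluation = CustomEvaluation(
    label="harmfulness",
    value=10.0,
    assessment="pass",
    reasoning="Malicious intent was detected in the user instructions.",
)
print(evaluation.assessment)  # → pass
```

As the thread notes, "pass"/"fail" here grades the evaluation itself (was this eval correct?), not the underlying outcome, which is why a high harmfulness score can still be a "pass".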