@Yun-Kim commented on Oct 24, 2025

What does this PR do? What is the motivation?

Adds `reasoning` and `assessment` as new arguments to the custom evaluation SDK methods for LLM Observability. Note that these arguments are currently available only in the Python SDK. The Node.js SDK will be updated accordingly in the future.

Merge instructions

Merge readiness:

  • Ready for merge

For Datadog employees:

Your branch name MUST follow the <name>/<description> convention and include the forward slash (/). Without this format, your pull request will not pass CI, the GitLab pipeline will not run, and you won't get a branch preview. Getting a branch preview makes it easier for us to check any issues with your PR, such as broken links.

If your branch doesn't follow this format, rename it or create a new branch and PR.

[6/5/2025] Merge queue has been disabled on the documentation repo. If you have write access to the repo, the PR has been reviewed by a Documentation team member, and all of the required checks have passed, you can use the Squash and Merge button to merge the PR. If you don't have write access, or you need help, reach out in the #documentation channel in Slack.

Additional notes

@Yun-Kim requested a review from a team as a code owner on October 24, 2025 14:52

@maycmlee left a comment

Small suggestion


`assessment`
: optional - _string_
<br />A text assessment of the validity of this evaluation. Accepted values are "pass" and "fail".

Suggested change
<br />A text assessment of the validity of this evaluation. Accepted values are "pass" and "fail".
<br />A text assessment of the validity of this evaluation. Accepted values are `pass` and `fail`.
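For illustration, the optional `assessment` and `reasoning` arguments could be validated along these lines before an evaluation is submitted. This is a minimal sketch, not the ddtrace LLM Observability SDK: `build_evaluation` and the `metric_type` value used below are hypothetical placeholders, and the only rule taken from the docs text is that `assessment` accepts `pass` or `fail`.

```python
from typing import Optional

# Accepted values for the optional `assessment` argument, per the docs text.
ACCEPTED_ASSESSMENTS = {"pass", "fail"}


def build_evaluation(
    label: str,
    metric_type: str,
    value,
    assessment: Optional[str] = None,
    reasoning: Optional[str] = None,
) -> dict:
    """Hypothetical helper bundling custom evaluation fields.

    NOT the ddtrace SDK API; it only shows how the optional `assessment`
    and `reasoning` arguments might be checked and attached.
    """
    if assessment is not None and assessment not in ACCEPTED_ASSESSMENTS:
        raise ValueError(
            f"assessment must be one of {sorted(ACCEPTED_ASSESSMENTS)}, got {assessment!r}"
        )
    evaluation = {"label": label, "metric_type": metric_type, "value": value}
    # Attach the optional fields only when the caller provided them.
    if assessment is not None:
        evaluation["assessment"] = assessment
    if reasoning is not None:
        evaluation["reasoning"] = reasoning
    return evaluation
```

Values are matched case-sensitively here, so `"Pass"` would be rejected; whether the real SDK normalizes case is not stated in this PR.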

@github-actions

Preview links (active after the build_preview check completes)

Modified Files

Comment on lines 1938 to 1939
assessment="pass",
reasoning="Malicious intent was detected in the user instructions."
Contributor

"fail" should be the bad outcome, so I think this should be a "fail" not a pass

Contributor Author

@mtullalizardi Wouldn't a harmfulness eval of score 10 that's actually valid (malicious intent was detected in the user prompt) be a correct (i.e. pass) assessment? Or am I misunderstanding assessments 😭

Contributor

"pass" is green and "fail" is red. here is the example @FouadWahabi did in the demo. in the eval you can see that a "true" failure to answer is a "fail"

Comment on lines 1951 to 1952
assessment="pass",
reasoning="Malicious intent was detected in the user instructions."
Contributor

Same as above.

@maycmlee left a comment

Thanks @Yun-Kim, feel free to merge if it's ready.

@Yun-Kim merged commit adeaf72 into master on Oct 24, 2025
21 of 23 checks passed
@Yun-Kim deleted the yunkim/evals-sdk-docs branch on October 24, 2025 17:24