[MLOB-4280] Update SDK docs with new custom eval args #32378

Yun-Kim · 2025-10-24T14:52:31Z

What does this PR do? What is the motivation?

Adds `reasoning` and `assessment` as new arguments on the custom evaluation SDK methods for LLM Observability. Note that these are currently available only in the Python SDK; the Node.js SDK will be updated accordingly in the future.

Merge instructions

Merge readiness:

Ready for merge

For Datadog employees:

Your branch name MUST follow the <name>/<description> convention and include the forward slash (/). Without this format, your pull request will not pass CI, the GitLab pipeline will not run, and you won't get a branch preview. Getting a branch preview makes it easier for us to check any issues with your PR, such as broken links.

If your branch doesn't follow this format, rename it or create a new branch and PR.

[6/5/2025] Merge queue has been disabled on the documentation repo. If you have write access to the repo, the PR has been reviewed by a Documentation team member, and all of the required checks have passed, you can use the Squash and Merge button to merge the PR. If you don't have write access, or you need help, reach out in the #documentation channel in Slack.

Additional notes

maycmlee

Small suggestion

maycmlee · 2025-10-24T14:55:55Z

content/en/llm_observability/instrumentation/sdk.md

+
+`assessment`
+: optional - _string_
+<br />A text assessment of the validity of this evaluation. Accepted values are "pass" and "fail".


Suggested change

-<br />A text assessment of the validity of this evaluation. Accepted values are "pass" and "fail".
+<br />A text assessment of the validity of this evaluation. Accepted values are `pass` and `fail`.
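
For illustration, the accepted values described above could be checked on the caller side before an evaluation is submitted. This is a minimal sketch: `validate_assessment` and `ACCEPTED_ASSESSMENTS` are hypothetical names, not part of the SDK, and this PR does not state whether the SDK performs its own validation.

```python
from typing import Optional

# Accepted values, per the parameter description in this PR.
ACCEPTED_ASSESSMENTS = ("pass", "fail")


def validate_assessment(assessment: Optional[str]) -> Optional[str]:
    """Return the assessment unchanged if it is valid; raise ValueError otherwise.

    Hypothetical helper for illustration only; not part of the SDK.
    """
    if assessment is None:
        # The argument is documented as optional, so None is allowed.
        return None
    if assessment not in ACCEPTED_ASSESSMENTS:
        raise ValueError(
            f"assessment must be one of {ACCEPTED_ASSESSMENTS}, got {assessment!r}"
        )
    return assessment
```

The check is case-sensitive on the assumption that only the lowercase strings are accepted.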

github-actions · 2025-10-24T14:56:39Z

Preview links (active after the `build_preview` check completes)

Modified Files

https://docs-staging.datadoghq.com/yunkim/evals-sdk-docs/llm_observability/instrumentation/sdk

mtullalizardi · 2025-10-24T14:56:56Z

content/en/llm_observability/instrumentation/sdk.md

+        assessment="pass",
+        reasoning="Malicious intent was detected in the user instructions."


"fail" should be the bad outcome, so I think this should be a "fail" not a pass

@mtullalizardi Wouldn't a harmfulness eval of score 10 that's actually valid (malicious intent was detected in the user prompt) be a correct (i.e. pass) assessment? Or am I misunderstanding assessments 😭

"pass" is green and "fail" is red. Here is the example @FouadWahabi did in the demo. In the eval you can see that a "true" failure to answer is a "fail".
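
The convention described in this thread (`fail` marks the bad outcome, even when the evaluation detected it correctly) can be sketched as follows. The `harmfulness_assessment` helper, its threshold of 5, and the `value` key are assumptions made for illustration; only `assessment` and `reasoning` come from the snippet under review.

```python
def harmfulness_assessment(score: float, threshold: float = 5.0) -> str:
    """Map a harmfulness score to an assessment string.

    Hypothetical helper: the threshold is an assumed value, not from the SDK.
    A high harmfulness score is the bad outcome, so it maps to "fail".
    """
    return "fail" if score >= threshold else "pass"


# Arguments mirroring the snippet under review, with the assessment derived
# from the score instead of being hard-coded.
score = 10
eval_args = {
    "value": score,  # assumed key name for the evaluation score
    "assessment": harmfulness_assessment(score),
    "reasoning": "Malicious intent was detected in the user instructions.",
}
```

Under this reading, a harmfulness score of 10 yields `fail`, which matches the change requested in the review.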

mtullalizardi · 2025-10-24T14:57:07Z

content/en/llm_observability/instrumentation/sdk.md

+        assessment="pass",
+        reasoning="Malicious intent was detected in the user instructions."


same as above

maycmlee

Thanks @Yun-Kim, feel free to merge if it's ready.

Update new custom eval args

762f2bb

Yun-Kim requested a review from a team as a code owner October 24, 2025 14:52

maycmlee reviewed Oct 24, 2025


mtullalizardi reviewed Oct 24, 2025


Yun-Kim added 2 commits October 24, 2025 11:11

fmt code

38d00bb

Change description of assessment

1e4a89c

mtullalizardi approved these changes Oct 24, 2025


maycmlee approved these changes Oct 24, 2025


Yun-Kim merged commit adeaf72 into master Oct 24, 2025
21 of 23 checks passed

Yun-Kim deleted the yunkim/evals-sdk-docs branch October 24, 2025 17:24
