@aaronvg aaronvg commented Sep 13, 2025

Issue Reference

Please link to any related issues

Changes

Please describe the changes proposed in this pull request

This PR addresses an issue where selected_call could incorrectly mark a failed LLM response as selected. The logic for determining the selected call in engine/baml-runtime/src/tracingv2/storage/storage.rs now explicitly checks that the LLM response does not contain an error_message. A new unit test has been added to ensure that if multiple calls occur, a successful call is correctly prioritized as the selected_call over a failed one.
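
As a rough illustration of the check described above, here is a minimal sketch. The `LlmResponse` struct and both function names are hypothetical stand-ins, not the actual types in storage.rs; the only detail taken from the description is that a response carrying an `error_message` must not be treated as selectable.

```rust
/// Hypothetical stand-in for an LLM response record (not the runtime's real type).
#[derive(Debug, Clone, PartialEq)]
pub struct LlmResponse {
    pub client_name: String,
    /// `Some(..)` when the call failed, `None` when it succeeded.
    pub error_message: Option<String>,
}

/// A response is eligible to be the selected call only if it has no error message.
pub fn is_successful(response: &LlmResponse) -> bool {
    response.error_message.is_none()
}

/// Returns the index of the first successful response, or `None` if every call failed.
pub fn first_successful(responses: &[LlmResponse]) -> Option<usize> {
    responses.iter().position(is_successful)
}
```

With a fallback strategy where the first client fails and the second succeeds, `first_successful` would point at the second response rather than the failed one.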

Testing

Please describe how you tested these changes

  • Unit tests added/updated (See test_selected_call_prefers_success_over_failure)
  • Manual testing performed
  • Tested in [environment]

Screenshots

If applicable, add screenshots to help explain your changes

PR Checklist

Please ensure you've completed these items

  • I have read and followed the contributing guidelines
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings

Additional Notes

Add any other context about the PR here

Tests were not executed in this environment because cargo was not available.
To run tests locally:

cargo test -p baml-runtime -q

cursor bot commented Sep 13, 2025

Cursor Agent can help with this pull request. Just @cursor in comments and I'll start working on changes in this branch.

@aaronvg aaronvg temporarily deployed to boundary-tools-dev September 13, 2025 23:43 — with GitHub Actions Inactive
vercel bot commented Sep 13, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Preview Comments Updated (UTC)
promptfiddle Skipped Skipped Sep 13, 2025 11:52pm

aaronvg commented Sep 13, 2025

@cursoragent you should choose the last call in the stack by timestamp, unless it failed; in that case, look for one that passed. The http_request_id is a ULID, which includes a timestamp. See define_id!(..)
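
For context on why a ULID can be ordered by time: its first 10 characters are a Crockford base32 encoding of a 48-bit millisecond timestamp, so ULID strings in canonical uppercase form sort lexicographically in creation order. The sketch below decodes that timestamp. The function name `ulid_timestamp_ms` is hypothetical, and it assumes a bare 26-character ULID string; whether the IDs produced by define_id!(..) carry a prefix is not stated in this thread.

```rust
/// Crockford base32 alphabet used by ULIDs (it omits I, L, O and U).
const CROCKFORD: &[u8; 32] = b"0123456789ABCDEFGHJKMNPQRSTVWXYZ";

/// Decodes the millisecond timestamp from the first 10 characters of a ULID.
/// Returns `None` if the input is not 26 bytes long or if the timestamp part
/// contains a character outside the Crockford alphabet.
/// Assumption: the input is a bare ULID with no prefix.
pub fn ulid_timestamp_ms(ulid: &str) -> Option<u64> {
    let bytes = ulid.as_bytes();
    if bytes.len() != 26 {
        return None;
    }
    let mut ts: u64 = 0;
    for &b in &bytes[..10] {
        let upper = b.to_ascii_uppercase();
        // Each character contributes 5 bits, most significant first.
        let value = CROCKFORD.iter().position(|&c| c == upper)? as u64;
        ts = (ts << 5) | value;
    }
    Some(ts)
}
```

In practice, comparing the ID strings directly is usually enough to find the latest call; decoding is only needed when the actual timestamp value matters.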

cursor bot commented Sep 13, 2025

I refined the LLM call selection logic in build_function_log to ensure only one call is marked 'selected' per function log, prioritizing the latest successful call for improved trace clarity.

  • I modified build_function_log in engine/baml-runtime/src/tracingv2/storage/storage.rs. I now collect all LLM call candidates, then apply a new rule: select the latest call if it succeeded; if the latest call failed, select the most recent successful call.
  • I added test_selected_call_chooses_earlier_success_if_last_failed to validate this new selection behavior.
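
The selection rule described in this comment can be sketched as follows. This is an illustration under stated assumptions, not the code in storage.rs: `CallCandidate` and `select_call` are hypothetical names, the ID is assumed to be a ULID string whose lexicographic order follows creation time, and the behavior when every call failed (fall back to the latest call) is an assumption that the thread does not specify.

```rust
/// Hypothetical candidate record (not the runtime's real type).
#[derive(Debug, Clone)]
pub struct CallCandidate {
    /// Assumed to be a ULID string, so string order follows creation time.
    pub http_request_id: String,
    /// `Some(..)` when the call failed, `None` when it succeeded.
    pub error_message: Option<String>,
}

/// Returns the index of the single call to mark as selected.
pub fn select_call(candidates: &[CallCandidate]) -> Option<usize> {
    // The most recent successful call, ordered by the time-sortable ID.
    let latest_success = candidates
        .iter()
        .enumerate()
        .filter(|(_, c)| c.error_message.is_none())
        .max_by(|(_, a), (_, b)| a.http_request_id.cmp(&b.http_request_id))
        .map(|(i, _)| i);

    // Assumption: if no call succeeded, fall back to the latest call overall.
    latest_success.or_else(|| {
        candidates
            .iter()
            .enumerate()
            .max_by(|(_, a), (_, b)| a.http_request_id.cmp(&b.http_request_id))
            .map(|(i, _)| i)
    })
}
```

Whenever at least one call succeeded, "the latest call if it succeeded, otherwise the most recent success" reduces to "the most recent successful call", which is why the sketch only needs one filtered maximum. Returning a single index also guarantees that at most one call is marked as selected.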

@cursor cursor bot temporarily deployed to boundary-tools-dev September 13, 2025 23:52 Inactive

@aaronvg aaronvg marked this pull request as ready for review September 13, 2025 23:56

Development

Successfully merging this pull request may close these issues.

[bug] Collector: when using fallback strategy with 2 clients , and if one of the call is failed, the selected_call will be populated with failed call.

2 participants