Skip to content

Conversation

vignesh14052002
Copy link

response_reference need not to wait for reference_response, so made it concurrent, it reduces the overall evaluation time by half

@dosubot dosubot bot added the size:S This PR changes 10-29 lines, ignoring generated files. label Aug 4, 2025
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Summary

This PR optimizes the FactualCorrectness metric by implementing concurrent execution of independent operations, reducing evaluation time by approximately 50%. The factual correctness metric works by comparing claims between a response and reference in both directions: decomposing the response into claims and verifying them against the reference (reference_response), and decomposing the reference into claims and verifying them against the response (response_reference).

The key insight is that these two operations are completely independent - they don't need to wait for each other's results. Previously, they were executed sequentially using separate await calls, which meant the total execution time was the sum of both operations. The optimization uses asyncio.gather() to run these tasks concurrently, reducing the total time to the maximum of either operation rather than their sum.

The implementation adds a new helper method decompose_and_verify_claims() that encapsulates the two-step process of claim decomposition and verification. For cases where only precision mode is used, a lightweight get_empty_response() helper is provided since the response_reference evaluation isn't needed. This change integrates seamlessly with the existing factual correctness evaluation pipeline and maintains all the existing functionality while significantly improving performance for the common case where both directions of evaluation are required.

Confidence score: 4/5

  • This PR is safe to merge with minimal risk as it only changes execution order without altering logic
  • Score reflects a solid performance optimization with clean implementation and no breaking changes
  • Pay close attention to the asyncio concurrency patterns and ensure proper error handling in concurrent tasks

1 file reviewed, 1 comment

Edit Code Review Bot Settings | Greptile

Copy link
Contributor

@anistark anistark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @vignesh14052002

Some thoughts have left for changes. Please also rebase to latest main.

response_claims = await self.decompose_claims(response, callbacks)
reference_response = await self.verify_claims(
premise=reference, hypothesis_list=response_claims, callbacks=callbacks
async def get_empty_response():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to make it a class method for reusability

@@ -287,5 +293,13 @@ async def _single_turn_ascore(

return np.round(score, 2)

async def decompose_and_verify_claims(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we name this something more generic for clarity.

How about get_claim_verification_results ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size:S This PR changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants