Optimize factual correctness metric runtime by 50% #2153

vignesh14052002 · 2025-08-04T14:28:57Z

response_reference does not need to wait for reference_response, so I made the two run concurrently. This cuts the overall evaluation time by about half.

greptile-apps

Greptile Summary

This PR optimizes the FactualCorrectness metric by implementing concurrent execution of independent operations, reducing evaluation time by approximately 50%. The factual correctness metric works by comparing claims between a response and reference in both directions: decomposing the response into claims and verifying them against the reference (reference_response), and decomposing the reference into claims and verifying them against the response (response_reference).

The key insight is that these two operations are completely independent - they don't need to wait for each other's results. Previously, they were executed sequentially using separate await calls, which meant the total execution time was the sum of both operations. The optimization uses asyncio.gather() to run these tasks concurrently, reducing the total time to the maximum of either operation rather than their sum.
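The sum-versus-maximum claim above can be illustrated with a minimal sketch. Everything here is a stand-in rather than ragas code: `fake_direction` is a hypothetical coroutine whose `asyncio.sleep` simulates the latency of one decompose-then-verify pass, and `timed` is a small helper added only for the comparison.

```python
import asyncio
import time
from typing import Callable, List, Tuple


async def fake_direction(delay: float, label: str) -> str:
    # Hypothetical stand-in for one direction of the metric;
    # the sleep simulates the latency of the underlying LLM calls.
    await asyncio.sleep(delay)
    return label


async def run_sequential(delay: float) -> List[str]:
    # Two separate awaits: the second starts only after the first finishes.
    first = await fake_direction(delay, "reference_response")
    second = await fake_direction(delay, "response_reference")
    return [first, second]


async def run_concurrent(delay: float) -> List[str]:
    # asyncio.gather schedules both coroutines at once and
    # returns their results in the order they were passed in.
    results = await asyncio.gather(
        fake_direction(delay, "reference_response"),
        fake_direction(delay, "response_reference"),
    )
    return list(results)


def timed(coro_fn: Callable, delay: float) -> Tuple[List[str], float]:
    # Run the coroutine to completion and report the wall-clock time taken.
    start = time.perf_counter()
    result = asyncio.run(coro_fn(delay))
    return result, time.perf_counter() - start
```

With equal delays the concurrent version should take about half as long. When the two directions differ in cost, the saving is smaller, because the total is bounded by the slower one.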

The implementation adds a new helper method decompose_and_verify_claims() that encapsulates the two-step process of claim decomposition and verification. For cases where only precision mode is used, a lightweight get_empty_response() helper is provided since the response_reference evaluation isn't needed. This change integrates seamlessly with the existing factual correctness evaluation pipeline and maintains all the existing functionality while significantly improving performance for the common case where both directions of evaluation are required.
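The helper structure described above might look roughly like the following sketch. `ToyFactualCorrectness` is a hypothetical class, not the ragas implementation: its `decompose_claims` and `verify_claims` use naive string operations in place of LLM calls, and `both_directions` and the `mode` attribute are assumptions made for illustration. Only the names `decompose_and_verify_claims` and `get_empty_response` come from the PR.

```python
import asyncio
from typing import List, Tuple


class ToyFactualCorrectness:
    """Hypothetical stand-in used only to show the control flow."""

    def __init__(self, mode: str = "f1") -> None:
        self.mode = mode

    async def decompose_claims(self, text: str) -> List[str]:
        # Toy decomposition: one claim per sentence (the real metric uses an LLM).
        await asyncio.sleep(0)
        return [part.strip() for part in text.split(".") if part.strip()]

    async def verify_claims(self, premise: str, hypothesis_list: List[str]) -> List[bool]:
        # Toy verification: a claim counts as supported if it appears verbatim.
        await asyncio.sleep(0)
        return [claim in premise for claim in hypothesis_list]

    async def decompose_and_verify_claims(self, text: str, premise: str) -> List[bool]:
        # The two-step process wrapped in one awaitable, so it can be gathered.
        claims = await self.decompose_claims(text)
        return await self.verify_claims(premise, claims)

    async def get_empty_response(self) -> List[bool]:
        # Placeholder result for precision mode, where the second direction is unused.
        return []

    async def both_directions(self, response: str, reference: str) -> Tuple[List[bool], List[bool]]:
        if self.mode == "precision":
            second = self.get_empty_response()
        else:
            second = self.decompose_and_verify_claims(reference, response)
        reference_response, response_reference = await asyncio.gather(
            self.decompose_and_verify_claims(response, reference),
            second,
        )
        return reference_response, response_reference
```

Passing a trivial coroutine for the unused direction keeps a single `asyncio.gather` call for every mode, at the cost of one extra no-op awaitable.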

Confidence score: 4/5

This PR is safe to merge with minimal risk as it only changes execution order without altering logic
Score reflects a solid performance optimization with clean implementation and no breaking changes
Pay close attention to the asyncio concurrency patterns and ensure proper error handling in concurrent tasks
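On the error-handling point, a small sketch of the two behaviours `asyncio.gather` offers may help. The coroutines `ok` and `failing` are hypothetical stand-ins for the two evaluation directions; nothing here is taken from the PR. By default the first exception propagates to the caller, while `return_exceptions=True` returns exceptions in the result list alongside the successful values.

```python
import asyncio
from typing import List, Optional


async def ok() -> List[bool]:
    # Stand-in for a direction that succeeds.
    await asyncio.sleep(0.01)
    return [True, False]


async def failing() -> List[bool]:
    # Stand-in for a direction whose LLM call fails.
    await asyncio.sleep(0)
    raise RuntimeError("simulated LLM failure")


async def gather_collecting_errors() -> list:
    # Exceptions are returned in place of results, in argument order,
    # so the caller can inspect each direction separately.
    return await asyncio.gather(ok(), failing(), return_exceptions=True)


async def gather_propagating() -> Optional[str]:
    # Default behaviour: the first exception is raised at the await,
    # much like the sequential version would raise it.
    try:
        await asyncio.gather(ok(), failing())
    except RuntimeError as exc:
        return str(exc)
    return None
```

The default keeps the failure semantics closest to the sequential code, which is one reason a change like this can be considered low risk.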

1 file reviewed, 1 comment

ragas/src/ragas/metrics/_factual_correctness.py

anistark

Thanks for the PR @vignesh14052002

I've left some suggestions for changes. Please also rebase onto the latest main.

anistark · 2025-08-26T11:01:56Z

ragas/src/ragas/metrics/_factual_correctness.py

-        response_claims = await self.decompose_claims(response, callbacks)
-        reference_response = await self.verify_claims(
-            premise=reference, hypothesis_list=response_claims, callbacks=callbacks
+        async def get_empty_response():


Better to make it a class method for reusability

anistark · 2025-08-26T11:02:25Z

ragas/src/ragas/metrics/_factual_correctness.py

@@ -287,5 +293,13 @@ async def _single_turn_ascore(

        return np.round(score, 2)

+    async def decompose_and_verify_claims(


Can we name this something more generic, for clarity?

How about get_claim_verification_results?

vignesh14052002 added 2 commits August 4, 2025 19:53

Optimize factual correctness

0b6c450

format file

9dc1617

dosubot bot added the size:S This PR changes 10-29 lines, ignoring generated files. label Aug 4, 2025

greptile-apps bot reviewed Aug 4, 2025

group import

7fe2e19

anistark requested changes Aug 26, 2025
