fix: update rag example #331
Conversation
Summary of Changes
Hello @e06084, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request enhances the robustness of the RAG evaluation metrics by preventing crashes when essential fields such as 'answer' or 'expected_output' are missing. Instead, it assigns a zero score with specific diagnostic labels, making the evaluation process more resilient. Additionally, the RAG example configuration has been updated to use a new embedding model and introduces a more organized structure for embedding service parameters.
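For orientation only, here is a minimal sketch of what a reorganized embedding configuration of this kind could look like, written as a Python dict. Every key name, URL, and model identifier below is an illustrative placeholder, not the actual values introduced by this PR:

```python
# Illustrative sketch only: key names, URLs, and model names are placeholders,
# not the real configuration shipped in the updated RAG example.
rag_example_config = {
    "llm": {
        "model": "your-chat-model",
        "key": "YOUR_API_KEY",
        "api_url": "https://example.com/v1",
    },
    # Embedding service parameters grouped under their own block instead of
    # being scattered as flat top-level keys.
    "embedding": {
        "model": "your-embedding-model",
        "key": "YOUR_API_KEY",
        "api_url": "https://example.com/v1/embeddings",
    },
}
```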
Code Review
This pull request refactors the RAG evaluators to gracefully handle cases where the answer or expected_output is missing, returning a score of 0 instead of raising an error. This improves the robustness of the evaluation pipeline. The configuration for embedding models has also been updated for better clarity, and the corresponding example file has been adjusted. My review focuses on improving code maintainability by addressing code duplication. The logic for handling empty inputs is repeated across three different evaluator classes. I've suggested extracting this into a shared helper method. I also pointed out some minor data fetching redundancy.
```python
# Check whether answer is empty
raw_data = getattr(input_data, 'raw_data', {})
answer = input_data.content or raw_data.get("answer", "")

if not answer:
    # If answer is empty, return a score of 0 directly
    log.warning("Context Precision evaluation: the answer field is empty, returning a score of 0")
    result = EvalDetail(metric=cls.__name__)
    result.score = 0.0
    result.status = True
    result.label = ["QUALITY_BAD.CONTEXT_PRECISION_NO_ANSWER"]
    result.reason = ["The answer field is empty, so context precision cannot be evaluated; score set to 0"]
    return result
```
There's significant code duplication in handling empty answer fields across multiple RAG evaluator classes. This same logic block is present in llm_rag_answer_relevancy.py (lines 224-234) and a similar one for expected_output in llm_rag_context_recall.py (lines 237-252).
To improve maintainability and adhere to the DRY principle, consider extracting this logic into a protected helper method in the BaseOpenAI class. This method could handle checking for empty inputs and generating the standard 0-score EvalDetail response.
For example, a helper method could be added to BaseOpenAI:

```python
from typing import Optional
...

@classmethod
def _handle_empty_input(cls, value: str, field_name: str, metric_label: str, reason_msg: str) -> Optional[EvalDetail]:
    if not value:
        log.warning(f"{cls.__name__} evaluation: the {field_name} field is empty, returning a score of 0")
        result = EvalDetail(metric=cls.__name__)
        result.score = 0.0
        result.status = True
        result.label = [metric_label]
        result.reason = [reason_msg]
        return result
    return None
```

You could then use it in the eval methods like this:
```python
# In LLMRAGContextPrecision.eval
raw_data = getattr(input_data, 'raw_data', {})
answer = input_data.content or raw_data.get("answer", "")
if empty_result := cls._handle_empty_input(
    answer,
    "answer",
    "QUALITY_BAD.CONTEXT_PRECISION_NO_ANSWER",
    "The answer field is empty, so context precision cannot be evaluated; score set to 0"
):
    return empty_result
```

This would make the code cleaner and easier to manage.
```python
raw_data = getattr(input_data, 'raw_data', {})
answer = input_data.content or raw_data.get("answer", "")
```
The raw_data and answer variables are now being fetched both here in the eval method and again in the build_messages method (lines 163-165), which is called later in this method. This introduces some redundancy.
While the performance impact is likely negligible, to improve code clarity and reduce redundancy, you could consider fetching these values only once in eval and then passing them to build_messages. This would require changing the signature of build_messages to accept answer and question as arguments. This pattern of redundant data fetching is also present in llm_rag_answer_relevancy.py and llm_rag_context_recall.py.
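A minimal, self-contained sketch of that refactor, assuming simplified stand-in types: the class name, field names, and message shape below are illustrative placeholders, not the repository's real API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class InputData:
    # Simplified stand-in for the real input type.
    content: str = ""
    raw_data: Dict[str, Any] = field(default_factory=dict)


class LLMRAGContextPrecisionSketch:
    @classmethod
    def eval(cls, input_data: InputData) -> List[dict]:
        # Fetch answer and question exactly once, here in eval.
        raw_data = getattr(input_data, "raw_data", {})
        answer = input_data.content or raw_data.get("answer", "")
        question = raw_data.get("question", "")
        # Pass the prefetched values down instead of re-reading raw_data later.
        return cls.build_messages(answer=answer, question=question)

    @classmethod
    def build_messages(cls, answer: str, question: str) -> List[dict]:
        # Signature changed to accept the values directly; no second raw_data lookup here.
        return [{"role": "user", "content": f"Question: {question}\nAnswer: {answer}"}]


# Usage
messages = LLMRAGContextPrecisionSketch.eval(
    InputData(content="Paris", raw_data={"question": "What is the capital of France?"})
)
```

Passing the values explicitly keeps build_messages free of raw_data parsing and removes the duplicated getattr calls.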