feat: add VLMRenderJudge metric #332
Summary of Changes
Hello @e06084, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly enhances the multimodal evaluation capabilities by introducing VLMRenderJudge, a metric designed for visually assessing OCR quality. Unlike traditional text-based comparisons, this metric renders OCR results into images and leverages VLMs to judge their consistency against original images. This approach is particularly beneficial for evaluating complex content like mathematical formulas and structured tables, offering a more human-like assessment of OCR accuracy. The accompanying RenderTool provides the necessary functionality to convert various content types into visual representations for this evaluation.
Code Review
This pull request introduces a new VLMRenderJudge metric for visually evaluating OCR quality. This is a significant feature that uses a "Render -> Judge" pattern, where OCR output is rendered as an image and compared to the original using a VLM. The implementation includes a new RenderTool for handling text and LaTeX rendering, comprehensive documentation in both English and Chinese, and thorough unit tests. The code is well-structured, but there are a few areas for improvement, particularly in the RenderTool regarding security best practices, portability of LaTeX rendering, and robustness in handling special characters. The documentation also has a minor error in the installation command.
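The "Render -> Judge" flow described in this review can be sketched roughly as follows. This is an illustration only, not the code in this PR: `render_then_judge`, `JudgeResult`, and the stand-in `fake_render` and `fake_judge` callables are hypothetical names, and a real judge would call a VLM rather than compare bytes.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class JudgeResult:
    """Hypothetical verdict type; the PR's actual result type may differ."""
    consistent: bool
    reason: str


def render_then_judge(
    original_image: bytes,
    ocr_text: str,
    render: Callable[[str], bytes],
    judge: Callable[[bytes, bytes], JudgeResult],
) -> JudgeResult:
    """Render the OCR output to an image, then compare it with the original."""
    rendered_image = render(ocr_text)              # step 1: Render
    return judge(original_image, rendered_image)   # step 2: Judge


# Stand-ins so the sketch runs without a renderer or a VLM (assumptions).
def fake_render(text: str) -> bytes:
    return text.encode("utf-8")


def fake_judge(original: bytes, rendered: bytes) -> JudgeResult:
    same = original == rendered
    return JudgeResult(same, "images match" if same else "images differ")


result = render_then_judge(b"x^2 + 1", "x^2 + 1", fake_render, fake_judge)
```

Passing the renderer and judge in as callables keeps the two steps independently testable, which is one plausible reason to separate a render tool from the metric itself.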
# Escape special characters in text mode
# (simplified - full implementation would be more complex)
return content
The _preprocess_latex method currently doesn't escape special LaTeX characters (e.g., _, ^, &, %, $, #, {, }). If the input content is not already a valid LaTeX string and contains these characters, it can cause the xelatex compilation to fail. While the comment acknowledges this is a simplified implementation, it's a potential source of bugs for real-world OCR data. Consider adding basic escaping for common special characters.
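A minimal sketch of the kind of escaping suggested here, assuming the input is plain text destined for LaTeX text mode. The helper name `escape_latex_text` is hypothetical and is not part of the PR. It should not be applied to content that is already valid LaTeX, since it would also escape the markup of formulas.

```python
import re

# Replacements for characters that are special in LaTeX text mode.
_LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}

# One alternation over all special characters, so the text is scanned once.
_LATEX_SPECIAL = re.compile("|".join(re.escape(ch) for ch in _LATEX_ESCAPES))


def escape_latex_text(text: str) -> str:
    """Escape LaTeX special characters in plain text (hypothetical helper)."""
    return _LATEX_SPECIAL.sub(lambda m: _LATEX_ESCAPES[m.group(0)], text)


escaped = escape_latex_text("50% of a_b & c#1")
```

Doing the substitution in a single regex pass matters: chained `str.replace` calls would re-escape the backslashes and braces introduced by earlier replacements.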
```bash
# Basic dependencies
pip install dingo pillow
```
```bash
# Basic dependencies
pip install dingo pillow
```
No description provided.