feat: support sciencemetabench #333
Conversation
Summary of Changes

Hello @shijinpjlab, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request significantly expands the Dingo framework's evaluation capabilities by integrating the ScienceMetaBench dataset. It introduces a structured approach to assessing the quality of extracted metadata across various document types, offering flexible result storage and advanced reporting features. The changes aim to provide users with powerful tools for benchmarking and analyzing metadata extraction performance.
Code Review
This pull request adds support for the ScienceMetaBench benchmark, including new evaluation rules, data processing, and result-export functionality. The code is well structured and comes with thorough documentation and tests. My review focuses on correctness and maintainability. The main suggestions are: fix a bug in the result-saving logic, refactor duplicated code in the evaluation rules, and make the Excel export more robust. I have also pointed out a typo in the documentation and suggested extending the test suite to cover more edge cases.
```python
eval_details = dingo_result.get('eval_details', {})
default_details = eval_details.get('default', [])

# Get the similarity dictionary
similarity_dict = {}
if default_details and len(default_details) > 0:
    reason_list = default_details[0].get('reason', [])
    if reason_list and len(reason_list) > 0:
        similarity_dict = reason_list[0].get('similarity', {})
```
When extracting the similarity dictionary from `dingo_result`, the code hardcodes a lookup of the `'default'` key in `eval_details`. However, when `fields` are specified in the evaluation config, this key is a combination of the field names rather than `'default'`, so the similarity data cannot be extracted in those cases. Consider making this logic generic so it finds the similarity dictionary among the values of `eval_details`.
Suggested change:

```diff
 eval_details = dingo_result.get('eval_details', {})
-default_details = eval_details.get('default', [])
 # Get the similarity dictionary
 similarity_dict = {}
-if default_details and len(default_details) > 0:
-    reason_list = default_details[0].get('reason', [])
-    if reason_list and len(reason_list) > 0:
-        similarity_dict = reason_list[0].get('similarity', {})
+for details_list in eval_details.values():
+    if details_list and len(details_list) > 0:
+        reason_list = details_list[0].get('reason', [])
+        if reason_list and len(reason_list) > 0 and 'similarity' in reason_list[0]:
+            similarity_dict = reason_list[0].get('similarity', {})
+            break  # Exit the loop once the similarity dict is found
```
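For context, here is a minimal sketch of the two shapes of `eval_details` the generic lookup needs to handle. The non-`'default'` key name below is a hypothetical example; the actual key depends on the configured `fields`:

```python
# Shape 1: no fields configured, results keyed under 'default'.
eval_details_default = {
    "default": [
        {"metric": "RuleMetadataMatchPaper",
         "reason": [{"similarity": {"doi": 1.0, "title": 1.0}}]}
    ]
}

# Shape 2: fields configured, key is a combination of field names.
# 'benchmark,product' is a hypothetical example of such a key.
eval_details_fields = {
    "benchmark,product": [
        {"metric": "RuleMetadataMatchPaper",
         "reason": [{"similarity": {"doi": 1.0, "title": 1.0}}]}
    ]
}
```

Iterating over `eval_details.values()`, as in the suggestion above, handles both shapes without naming the key.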
```python
# if input_args.executor.result_save.raw:
#     str_json = json.dumps(result_info.to_raw_dict(), ensure_ascii=False)
# else:
#     str_json = json.dumps(result_info.to_dict(), ensure_ascii=False)
str_json = json.dumps(result_info.to_raw_dict(), ensure_ascii=False)
```
In merge mode, the current code hardcodes `result_info.to_raw_dict()` and ignores the `input_args.executor.result_save.raw` setting. This is inconsistent with the behavior in non-merge mode. Consider restoring the commented-out logic so the `raw` flag is handled correctly.
Suggested change:

```diff
-# if input_args.executor.result_save.raw:
-#     str_json = json.dumps(result_info.to_raw_dict(), ensure_ascii=False)
-# else:
-#     str_json = json.dumps(result_info.to_dict(), ensure_ascii=False)
-str_json = json.dumps(result_info.to_raw_dict(), ensure_ascii=False)
+if input_args.executor.result_save.raw:
+    str_json = json.dumps(result_info.to_raw_dict(), ensure_ascii=False)
+else:
+    str_json = json.dumps(result_info.to_dict(), ensure_ascii=False)
```
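As a minimal sketch of why this matters, assuming `result_info` exposes the two serializers as in the snippet above (the stub class below is hypothetical, for illustration only):

```python
import json

class _StubResultInfo:
    """Hypothetical stand-in for result_info, exposing both serializers."""
    def to_dict(self):
        return {"eval_status": True}

    def to_raw_dict(self):
        return {"eval_status": True, "raw_content": "..."}

def serialize(result_info, raw: bool) -> str:
    # Honor the raw flag in merge mode, matching the non-merge path.
    if raw:
        return json.dumps(result_info.to_raw_dict(), ensure_ascii=False)
    return json.dumps(result_info.to_dict(), ensure_ascii=False)

assert "raw_content" in serialize(_StubResultInfo(), raw=True)
assert "raw_content" not in serialize(_StubResultInfo(), raw=False)
```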
```python
class RuleMetadataMatchTextbook(RuleMetadataMatchBase):
    """
    Check similarity matching for textbook metadata fields.
    Compares each sub-field of the benchmark and product fields, including:
    isbn, title, author, abstract, category, pub_time, publisher.
    The threshold is 0.6; the check passes only if every field's similarity
    reaches the threshold.
    """

    _metric_info = {
        "category": "Rule-Based Metadata Quality Metrics",
        "quality_dimension": "EFFECTIVENESS",
        "metric_name": "RuleMetadataMatchTextbook",
        "description": "检查教科书元数据字段与基准数据的相似度匹配,阈值为0.6",
    }

    dynamic_config = EvaluatorRuleArgs(
        key_list=['isbn', 'title', 'author', 'abstract', 'category', 'pub_time', 'publisher'],
        threshold=0.6
    )
```
The implementations of `RuleMetadataMatchTextbook` and `RuleMetadataMatchEbook` are almost identical, in particular `dynamic_config`. To reduce code duplication and improve maintainability, consider introducing a common base class.

For example:
```python
class RuleMetadataMatchBookBase(RuleMetadataMatchBase):
    """Base class for ebook and textbook metadata matching"""
    dynamic_config = EvaluatorRuleArgs(
        key_list=['isbn', 'title', 'author', 'abstract', 'category', 'pub_time', 'publisher'],
        threshold=0.6
    )


@Model.rule_register("QUALITY_BAD_EFFECTIVENESS", ["sciencemetabench"])
class RuleMetadataMatchEbook(RuleMetadataMatchBookBase):
    _metric_info = {
        "category": "Rule-Based Metadata Quality Metrics",
        "quality_dimension": "EFFECTIVENESS",
        "metric_name": "RuleMetadataMatchEbook",
        "description": "检查电子书元数据字段与基准数据的相似度匹配,阈值为0.6",
    }


@Model.rule_register("QUALITY_BAD_EFFECTIVENESS", ["sciencemetabench"])
class RuleMetadataMatchTextbook(RuleMetadataMatchBookBase):
    _metric_info = {
        "category": "Rule-Based Metadata Quality Metrics",
        "quality_dimension": "EFFECTIVENESS",
        "metric_name": "RuleMetadataMatchTextbook",
        "description": "检查教科书元数据字段与基准数据的相似度匹配,阈值为0.6",
    }
```
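As a quick sanity check (assuming the classes above are defined and importable), the shared configuration can be verified to come from the common base:

```python
# Both subclasses reuse the single EvaluatorRuleArgs instance defined on the base.
assert RuleMetadataMatchEbook.dynamic_config is RuleMetadataMatchTextbook.dynamic_config
assert RuleMetadataMatchTextbook.dynamic_config.threshold == 0.6
```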
docs/sciencemetabench/README.md (Outdated):

- **`benchmark`**: Ground truth (standard answer)
- **Source**: Obtained from [ScienceMetaBench Dataset](https://huggingface.co/datasets/opendatalab/ScienceMetaBench)
- **Included Fields**:
```python
import json
import shutil
import tempfile
from pathlib import Path

import pandas as pd
import pytest

# write_similarity_to_excel is imported from the module under test
# (the import path is not shown in this excerpt).


class TestWriteSimilarityToExcel:
    """Tests for the write_similarity_to_excel function"""

    @pytest.fixture
    def temp_output_dir(self):
        """Create a temporary output directory"""
        temp_dir = tempfile.mkdtemp()
        yield temp_dir
        # Clean up
        shutil.rmtree(temp_dir, ignore_errors=True)

    @pytest.fixture
    def sample_paper_data(self, temp_output_dir):
        """Create sample paper data"""
        data = [
            {
                "sha256": "test001",
                "benchmark": {
                    "doi": "10.1234/test001",
                    "title": "Test Paper 1",
                    "author": "Author 1",
                    "keyword": "keyword1",
                    "abstract": "Abstract 1",
                    "pub_time": "2024"
                },
                "product": {
                    "doi": "10.1234/test001",
                    "title": "Test Paper 1",
                    "author": "Author 1",
                    "keyword": "keyword1",
                    "abstract": "Abstract 1",
                    "pub_time": "2024"
                },
                "dingo_result": {
                    "eval_status": True,
                    "eval_details": {
                        "default": [
                            {
                                "metric": "RuleMetadataMatchPaper",
                                "status": True,
                                "label": ["QUALITY_GOOD"],
                                "reason": [
                                    {
                                        "similarity": {
                                            "doi": 1.0,
                                            "title": 1.0,
                                            "author": 1.0,
                                            "keyword": 1.0,
                                            "abstract": 1.0,
                                            "pub_time": 1.0
                                        }
                                    }
                                ]
                            }
                        ]
                    }
                }
            },
            {
                "sha256": "test002",
                "benchmark": {
                    "doi": "10.1234/test002",
                    "title": "Test Paper 2",
                    "author": "Author 2",
                    "keyword": "keyword2",
                    "abstract": "Abstract 2",
                    "pub_time": "2024"
                },
                "product": {
                    "doi": "",
                    "title": "Different Title",
                    "author": "Author 2",
                    "keyword": "keyword2",
                    "abstract": "Different Abstract",
                    "pub_time": "2024"
                },
                "dingo_result": {
                    "eval_status": True,
                    "eval_details": {
                        "default": [
                            {
                                "metric": "RuleMetadataMatchPaper",
                                "status": True,
                                "label": ["QUALITY_BAD_EFFECTIVENESS.RuleMetadataMatchPaper.doi"],
                                "reason": [
                                    {
                                        "similarity": {
                                            "doi": 0.0,
                                            "title": 0.5,
                                            "author": 1.0,
                                            "keyword": 1.0,
                                            "abstract": 0.45,
                                            "pub_time": 1.0
                                        }
                                    }
                                ]
                            }
                        ]
                    }
                }
            }
        ]

        # Write the jsonl file
        jsonl_file = Path(temp_output_dir) / "test_result.jsonl"
        with open(jsonl_file, 'w', encoding='utf-8') as f:
            for item in data:
                f.write(json.dumps(item, ensure_ascii=False) + '\n')

        return temp_output_dir

    def test_write_paper_excel(self, sample_paper_data):
        """Export an Excel file for the paper type"""
        output_filename = "test_paper.xlsx"

        df = write_similarity_to_excel(
            type='paper',
            output_dir=sample_paper_data,
            output_filename=output_filename
        )

        # Verify the returned DataFrame
        assert df is not None
        assert len(df) == 2
        assert 'sha256' in df.columns

        # Verify that all paper fields are present
        for field in ['doi', 'title', 'author', 'keyword', 'abstract', 'pub_time']:
            assert f'benchmark_{field}' in df.columns
            assert f'product_{field}' in df.columns
            assert f'similarity_{field}' in df.columns

        # Verify the Excel file was created
        excel_file = Path(sample_paper_data) / output_filename
        assert excel_file.exists()

        # Read the Excel file back and verify its contents
        df_from_excel = pd.read_excel(excel_file, sheet_name='相似度分析')
        assert len(df_from_excel) == 2

        # Verify the summary statistics sheet
        df_summary = pd.read_excel(excel_file, sheet_name='汇总统计')
        assert len(df_summary) == 7  # 6 fields + 1 overall accuracy row
        assert '字段' in df_summary.columns
        assert '平均相似度' in df_summary.columns
        assert df_summary.iloc[-1]['字段'] == '总体准确率'

    def test_invalid_type(self, temp_output_dir):
        """An invalid data type should raise"""
        with pytest.raises(ValueError, match="不支持的数据类型"):
            write_similarity_to_excel(
                type='invalid_type',
                output_dir=temp_output_dir
            )

    def test_nonexistent_directory(self):
        """A nonexistent directory should raise"""
        with pytest.raises(ValueError, match="输出目录不存在"):
            write_similarity_to_excel(
                type='paper',
                output_dir='/nonexistent/directory'
            )

    def test_no_jsonl_files(self, temp_output_dir):
        """A directory with no jsonl files should raise"""
        with pytest.raises(ValueError, match="未找到任何.jsonl文件"):
            write_similarity_to_excel(
                type='paper',
                output_dir=temp_output_dir
            )

    def test_default_filename(self, sample_paper_data):
        """Default output filename generation"""
        write_similarity_to_excel(
            type='paper',
            output_dir=sample_paper_data
        )

        # Look for the generated file
        output_path = Path(sample_paper_data)
        excel_files = list(output_path.glob("similarity_paper_*.xlsx"))
        assert len(excel_files) > 0

    def test_data_sorting(self, sample_paper_data):
        """Rows should be sorted by sha256"""
        df = write_similarity_to_excel(
            type='paper',
            output_dir=sample_paper_data,
            output_filename="test_sorted.xlsx"
        )

        # Verify the sort order
        sha256_list = df['sha256'].tolist()
        assert sha256_list == sorted(sha256_list)

    def test_all_string_type(self, sample_paper_data):
        """All columns should be string-typed"""
        df = write_similarity_to_excel(
            type='paper',
            output_dir=sample_paper_data,
            output_filename="test_types.xlsx"
        )

        # Verify every column is string-typed
        for col in df.columns:
            assert df[col].dtype == 'object'  # strings show up as object dtype in pandas


if __name__ == '__main__':
    pytest.main([__file__, '-v', '--tb=short'])
```
The current test cases cover many scenarios, which is great. However, two potential issues found in the `write_similarity_to_excel` function are not covered by tests (sketches follow this list):

- **Recursive file search**: The tests do not verify that the function finds `.jsonl` files recursively in subdirectories. The current `sample_paper_data` fixture only creates files in the top-level directory. Consider adding a test case that creates a `.jsonl` file in a subdirectory and verifies that `rglob` finds it correctly.
- **`eval_details` key**: The key of `eval_details` in the test data is hardcoded to `'default'`. The `write_similarity_to_excel` function also hardcodes the lookup of `'default'`, which may be a bug. Consider adding a test case that uses a key other than `'default'` (for example, when `fields` is specified in the evaluation config) to make sure the function works in that case too.

This will help ensure the robustness of the fixed code.
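A minimal sketch of the two suggested cases, written as additional methods for `TestWriteSimilarityToExcel` above (reusing its `temp_output_dir` fixture and imports). The `_make_record` helper and the non-`'default'` key name are hypothetical:

```python
def _make_record(sha256, key="default"):
    """Hypothetical helper: build a minimal paper record with all six fields."""
    fields = ['doi', 'title', 'author', 'keyword', 'abstract', 'pub_time']
    values = {f: f"{sha256}-{f}" for f in fields}
    return {
        "sha256": sha256,
        "benchmark": dict(values),
        "product": dict(values),
        "dingo_result": {
            "eval_status": True,
            "eval_details": {key: [{"reason": [{"similarity": {f: 1.0 for f in fields}}]}]},
        },
    }


def test_recursive_jsonl_search(self, temp_output_dir):
    """jsonl files in subdirectories should be found (via rglob)."""
    sub_dir = Path(temp_output_dir) / "nested"
    sub_dir.mkdir()
    with open(sub_dir / "nested_result.jsonl", 'w', encoding='utf-8') as f:
        f.write(json.dumps(_make_record("test003"), ensure_ascii=False) + '\n')

    df = write_similarity_to_excel(type='paper', output_dir=temp_output_dir)
    assert len(df) == 1


def test_non_default_eval_details_key(self, temp_output_dir):
    """A non-'default' eval_details key should still yield similarity columns."""
    # 'benchmark,product' is a hypothetical example of a fields-derived key.
    record = _make_record("test004", key="benchmark,product")
    with open(Path(temp_output_dir) / "fields_result.jsonl", 'w', encoding='utf-8') as f:
        f.write(json.dumps(record, ensure_ascii=False) + '\n')

    df = write_similarity_to_excel(type='paper', output_dir=temp_output_dir)
    assert len(df) == 1
    assert 'similarity_doi' in df.columns
```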