Bug Description
The MMLongBench-Doc benchmark loader uses eval() to parse evidence_pages and evidence_sources fields from dataset items. If the benchmark dataset is tampered with (e.g., man-in-the-middle on download, or a malicious dataset fork), arbitrary Python code will be executed.
Location
OmniSimpleMem/omni_memory/evaluation/benchmarks.py:1035,1043
Reproduction
# If an attacker modifies the benchmark dataset JSON to include:
# {"evidence_pages": "__import__('os').system('id')", ...}
# The eval() on line 1035 will execute: __import__('os').system('id')
# This runs arbitrary system commands
# To trigger, run the benchmark evaluation:
cd OmniSimpleMem
python -c "from omni_memory.evaluation.benchmarks import MMLongBenchDocBenchmark; b = MMLongBenchDocBenchmark('/path/to/malicious_data')"
Impact
Arbitrary code execution when loading benchmark data from untrusted sources.
Suggested Fix
# Replace eval() with ast.literal_eval(), which evaluates only Python literals
# (strings, numbers, tuples, lists, dicts, sets, booleans, None)
import ast
# Line 1035
evidence_pages = ast.literal_eval(evidence_pages)
# Line 1043
evidence_sources = ast.literal_eval(evidence_sources)
ast.literal_eval() safely parses Python literal expressions without executing arbitrary code.
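A minimal sketch of the behavioral difference, using the payload from the reproduction above (the list value is a made-up example of a legitimate evidence_pages field):

```python
import ast

# Legitimate dataset value: evidence pages stored as a stringified list literal.
benign = "[1, 3, 7]"
print(ast.literal_eval(benign))  # [1, 3, 7]

# Malicious payload from the reproduction above: eval() would execute this,
# but ast.literal_eval() rejects it because it is not a literal expression.
malicious = "__import__('os').system('id')"
try:
    ast.literal_eval(malicious)
except (ValueError, SyntaxError) as exc:
    print(f"rejected: {type(exc).__name__}")
```

Since tampered or malformed fields now raise ValueError/SyntaxError instead of executing, the loader may also want to catch those exceptions and skip or flag the offending item.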
Found via automated codebase analysis. Happy to submit a PR if confirmed.