Ruichuan An*, Sihan Yang*, Ziyu Guo, Wei Dai, Zijun Shen, Haodong Li
Renrui Zhang†, Xinyu Wei, Guopeng Li, Wenshan Wu, Wentao Zhang‡
PKU, CUHK, StepFun, PolyU, MSRA.
* Equal Contribution † Project Leader ‡ Corresponding Author
📄 Blog | 🚀 Quick Start | 📦 Dataset | 📜 License | 📝 Citation | 📬 Contact
- 2026.05.09: Updated the leaderboard, some details about GENIUS, and README.md.
- 2026.02.11: 🌟 Release of the evaluation code and the core test dataset.
- TBD: Integration of more model inference scripts.
We conduct a comprehensive evaluation of 14 representative open-source, proprietary, and newly proposed agentic generation models. The open-source models comprise Qwen-Image-Edit-2511, GLM-Image, FLUX.2-dev, NextStep-1, Emu3.5-Image, and Bagel. The proprietary category includes leading commercial models: Nano Banana and its Pro variant, the SeeDream series (4.0 & 4.5), and GPT-Image. The agentic generation models include Mind-Brush and Gen-Searcher, both of which use Qwen-Image as the backbone.
The dataset is available across multiple platforms for your convenience:
- Hugging Face: GENIUS
- Google Drive: Download Link
- Baidu Netdisk: Download Link (Password: iek1)
Clone the repository and prepare your local environment:
git clone https://github.com/arctanxarc/GENIUS.git
cd GENIUS
conda create -n GENIUS python==3.10
conda activate GENIUS

After downloading the dataset, ensure your directory structure matches the following:
./
├── cal_score.py # Scoring script
├── dataset/ # Test dataset
│ ├── implicit_pattern
│ ├── multi_semantic
│ ├── prior_conflicting
│ ├── symbolic_constraint
│ └── visual_constraint
├── eval_prompt.py # Prompt management
├── eval.py # Main evaluation logic
├── eval.sh # Entry script
├── GENIUS.pdf # Paper
└── README.md
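Given the layout above, each dimension folder under dataset/ carries a test_data.json with the per-sample metadata. A minimal loading sketch (assuming test_data.json is a JSON list of records; the exact schema may differ):

```python
import json
from pathlib import Path

# The five evaluation dimensions shipped with the dataset.
DIMENSIONS = [
    "implicit_pattern", "multi_semantic", "prior_conflicting",
    "symbolic_constraint", "visual_constraint",
]

def load_dimension(dataset_root: str, dim: str) -> list:
    """Load the test records for one dimension from its test_data.json."""
    path = Path(dataset_root) / dim / "test_data.json"
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```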
Place the images generated by your models into the outputs directory. Organize them using the following hierarchy: outputs/<model_name>/<task_name>/{id}.png.
Important
The {id} must correspond strictly to the id field in test_data.json (Note: IDs are unique identifiers, not necessarily a continuous sequence starting from 0).
Example Structure:
./outputs/
└── nanobanana/ # Example: Model Name
├── implicit_pattern/
│ ├── 002.png # Matches ID=002 in ./dataset/implicit_pattern/test_data.json
│ ├── 003.png
│ └── ...
├── multi_semantic/
└── ...
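Before running the evaluation, it helps to verify that every ID in test_data.json has a matching image. A hypothetical checker sketch (assumes each record in test_data.json exposes an "id" field, per the note above):

```python
import json
from pathlib import Path

def missing_outputs(dataset_root: str, outputs_root: str, model: str, dim: str) -> list:
    """Return IDs from <dataset_root>/<dim>/test_data.json that lack a
    matching {id}.png under <outputs_root>/<model>/<dim>/.
    IDs are compared as strings; they need not start from 0 or be contiguous."""
    records = json.loads(
        Path(dataset_root, dim, "test_data.json").read_text(encoding="utf-8")
    )
    expected = {str(r["id"]) for r in records}
    present = {p.stem for p in Path(outputs_root, model, dim).glob("*.png")}
    return sorted(expected - present)
```

Run it for each model and dimension you plan to evaluate; an empty list means all images are in place.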
Configure your credentials and target models in eval.sh:
- Set your API_URL and API_KEY for LMM-as-a-judge.
- Define the evaluation scope:
  DIMENSIONS=("implicit_pattern" "symbolic_constraint" "visual_constraint" "prior_conflicting" "multi_semantic")
  MODELS=("your_model_name")
- Execute the evaluation script:
  bash eval.sh

The dataset and code are released under CC-BY-NC 4.0 and are intended for academic research only. Commercial use is not permitted.
@misc{an2026geniusgenerativefluidintelligence,
title={GENIUS: Generative Fluid Intelligence Evaluation Suite},
author={Ruichuan An and Sihan Yang and Ziyu Guo and Wei Dai and Zijun Shen and Haodong Li and Renrui Zhang and Xinyu Wei and Guopeng Li and Wenshan Wu and Wentao Zhang},
year={2026},
eprint={2602.11144},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2602.11144},
}
- Issues: https://github.com/arctanxarc/GENIUS/issues
- Email: arctanxarc@gmail.com

