Ruichuan An*, Sihan Yang*, Ziyu Guo, Wei Dai, Zijun Shen, Haodong Li
Renrui Zhang†, Xinyu Wei, Guopeng Li, Wenshan Wu, Wentao Zhang‡
PKU, CUHK, StepFun, PolyU, MSRA.
* Equal Contribution † Project Leader ‡ Corresponding Author
📄 Blog | 🚀 Quick Start | 📦 Dataset | 📜 License | 📝 Citation | 📬 Contact
- 2026.05.09: Updated the leaderboard, some details about GENIUS, and README.md.
- 2026.02.11: 🌟 Release of the evaluation code and the core test dataset.
- TBD: Integration of more model inference scripts.
We conduct a comprehensive evaluation of 14 representative open-source, proprietary, and newly proposed agentic generation models. The open-source models comprise Qwen-Image-Edit-2511, GLM-Image, FLUX.2-dev, NextStep-1, Emu3.5-Image, and Bagel. The proprietary category includes leading commercial models: Nano Banana and its Pro variant, the SeeDream series (4.0 & 4.5), and GPT-Image. The agentic generation models include Mind-Brush and Gen-Searcher, both of which use Qwen-Image as the backbone.
The dataset is available across multiple platforms for your convenience:
- Hugging Face: GENIUS
- Google Drive: Download Link
- Baidu Netdisk: Download Link (Password: iek1)
Clone the repository and prepare your local environment:
git clone https://github.com/arctanxarc/GENIUS.git
cd GENIUS
conda create -n GENIUS python==3.10
conda activate GENIUS

After downloading the dataset, ensure your directory structure matches the following:
./
├── cal_score.py # Scoring script
├── dataset/ # Test dataset
│ ├── implicit_pattern
│ ├── multi_semantic
│ ├── prior_conflicting
│ ├── symbolic_constraint
│ └── visual_constraint
├── eval_prompt.py # Prompt management
├── eval.py # Main evaluation logic
├── eval.sh # Entry script
├── GENIUS.pdf # Paper
└── README.md
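Given the layout above, each dimension folder under dataset/ carries a test_data.json with the per-sample metadata. A minimal loading sketch (assuming test_data.json is a JSON list of records; the exact schema may differ):

```python
import json
from pathlib import Path

# The five evaluation dimensions shipped with the dataset.
DIMENSIONS = [
    "implicit_pattern", "multi_semantic", "prior_conflicting",
    "symbolic_constraint", "visual_constraint",
]

def load_dimension(dataset_root: str, dim: str) -> list:
    """Load the test records for one dimension from its test_data.json."""
    path = Path(dataset_root) / dim / "test_data.json"
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```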
Place the images generated by your models into the outputs directory. Organize them using the following hierarchy: outputs/<model_name>/<task_name>/{id}.png.
Important
The {id} must correspond strictly to the id field in test_data.json (Note: IDs are unique identifiers, not necessarily a continuous sequence starting from 0).
Example Structure:
./outputs/
└── nanobanana/ # Example: Model Name
├── implicit_pattern/
│ ├── 002.png # Matches ID=002 in ./dataset/implicit_pattern/test_data.json
│ ├── 003.png
│ └── ...
├── multi_semantic/
└── ...
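Before running the evaluation, it helps to verify that every ID in test_data.json has a matching image. A hypothetical checker sketch (assumes each record in test_data.json exposes an "id" field, per the note above):

```python
import json
from pathlib import Path

def missing_outputs(dataset_root: str, outputs_root: str, model: str, dim: str) -> list:
    """Return IDs from <dataset_root>/<dim>/test_data.json that lack a
    matching {id}.png under <outputs_root>/<model>/<dim>/.
    IDs are compared as strings; they need not start from 0 or be contiguous."""
    records = json.loads(
        Path(dataset_root, dim, "test_data.json").read_text(encoding="utf-8")
    )
    expected = {str(r["id"]) for r in records}
    present = {p.stem for p in Path(outputs_root, model, dim).glob("*.png")}
    return sorted(expected - present)
```

Run it for each model and dimension you plan to evaluate; an empty list means all images are in place.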
Configure your credentials and target models in eval.sh:
- Set your API_URL and API_KEY for LMM-as-a-judge.
- Define the evaluation scope:
  DIMENSIONS=("implicit_pattern" "symbolic_constraint" "visual_constraint" "prior_conflicting" "multi_semantic")
  MODELS=("your_model_name")
- Execute the evaluation script:
  bash eval.sh

The dataset and code are released under CC-BY-NC 4.0 and are intended for academic research only. Commercial use is not permitted.
@misc{an2026geniusgenerativefluidintelligence,
title={GENIUS: Generative Fluid Intelligence Evaluation Suite},
author={Ruichuan An and Sihan Yang and Ziyu Guo and Wei Dai and Zijun Shen and Haodong Li and Renrui Zhang and Xinyu Wei and Guopeng Li and Wenshan Wu and Wentao Zhang},
year={2026},
eprint={2602.11144},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2602.11144},
}
- Issues: https://github.com/arctanxarc/GENIUS/issues
- Email: arctanxarc@gmail.com

