How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning? Placing Them in An Extensible Escape Game
About the project 🧮
- This project started in June 2024. It began as a multi-image benchmark, but we found that format oversimplified the task and failed to enable the flexible interaction that human players have in a real escape game. Since August 2024, we have been designing an interactable 3D environment together with the Legent team.
About the team 👩🏻🎓🧑🏻🎓🧑🏻🎓🧑🏻🎓🧑🏻🎓🧑🏻🎓🧑🏻🏫🧑🏻🏫
- We are students from THUMT & THUNLP (Tsinghua University) and Fudan University, working part-time on this project. (This is why it took so long to release. 😣)
- As experienced escape game players, we are curious about how MLLMs would perform in such an environment.
- We are currently planning a second version. If you are interested in our project, feel free to contact us. (✉️email)
- We live to enjoy life, not just to work.
- Install the required packages of EscapeCraft as follows:

```shell
git clone https://github.com/THUNLP-MT/EscapeCraft.git
cd EscapeCraft
conda create -n mm-escape python=3.11
conda activate mm-escape
pip install -r requirements.txt
```
- Download the Legent client and environment

For detailed instructions on installing Legent, please follow Hugging Face or Tsinghua Cloud. After downloading the client and environment, unzip the files to create the following file structure:

```
src/
└── .legent/
    └── env/
        ├── client
        │   └── LEGENT-<platform>-<version>
        └── env_data/
            └── env_data-<version>
```

Please refer to LEGENT if you encounter any issues.
Our EscapeCraft is extensible and can be customized by modifying the configs in `src/config.py` according to your requirements. Please try our pre-defined settings, or customize your own settings by following the instructions below:
For direct usage:
- The MM-Escape benchmark used in our paper is provided in the `levels/` dir.
- Users can directly play with our pre-defined settings.
For customization:
- Please prepare two types of files: the level file and the scene file. Users can refer to the structure of our json files (in the `levels/` dir) to configure their own data.
- For the level file, users should define the key props and the way to get out (e.g., unlocking the door with a key, or unlocking the door with a password).
- For the scene file, users should specify the object models used in the scene. If an object is not included in our repo, please download the required object model and place it under the `prefabs/` dir.
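As a rough illustration of what a level file encodes, the sketch below builds a minimal level definition in Python and dumps it as JSON. The field names (`props`, `exit`, `unlock_with`, etc.) are our assumptions for illustration only, not the repo's actual schema — always copy the real structure from the json files in `levels/`.

```python
import json

# Hypothetical level definition: one key prop and a key-locked exit.
# All field names here are illustrative assumptions, NOT the real schema.
level = {
    "props": [
        {"name": "key", "location": "drawer"},  # key prop needed to escape
    ],
    "exit": {
        "type": "door",
        "unlock_with": "key",  # alternatively, a password-based exit
    },
}

print(json.dumps(level, indent=2))
```

A password-based variant would swap the `exit` entry accordingly; the point is simply that the level file ties the key props to the escape condition.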
```shell
cd src/scripts
python generate_scene.py --setting_path path/to/levels
```

The scene will then be saved automatically in `levels/level_name/`.
To load a generated scene:

```shell
cd src/scripts
python load_scene.py --scene_path path/to/levels
```
The options for evaluation are listed as follows:
```
usage: main.py [-h] [--level LEVEL] [--model MODEL] [--room_id ROOM_ID] [--record_path RECORD_PATH] [--history_type HISTORY_TYPE] [--hint]
               [--max_history MAX_HISTORY] [--max_retry MAX_RETRY]

options:
  -h, --help            show this help message and exit
  --level LEVEL         level name
  --model MODEL         model name
  --room_id ROOM_ID     generated room_id of level "LEVEL"
  --record_path RECORD_PATH
                        record path to load
  --history_type HISTORY_TYPE
                        history type, asserted in full, key, max
  --hint                whether to use hint
  --max_history MAX_HISTORY
                        max history length (you need to *set history_type to "max"* to enable this max history length setting)
  --max_retry MAX_RETRY
                        max retry times
```
For example, you can load the third scene generated for level3 (aka "Difficulty-3" in our paper) and evaluate the model gpt-4o with the history type `full`:

```shell
cd src
python main.py --level level3 --room_id 3 --model gpt-4o --history_type full
```
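To sweep several models or levels, the command above can be scripted. The sketch below only assembles the `main.py` invocation from the documented flags; the model names in the loop are placeholders, not a claim about which models the repo supports.

```python
# Build the evaluation command for a level/model pair using only the
# flags documented in the usage text above.
def build_command(level, room_id, model, history_type="full"):
    return [
        "python", "main.py",
        "--level", level,
        "--room_id", str(room_id),
        "--model", model,
        "--history_type", history_type,
    ]

if __name__ == "__main__":
    for model in ["gpt-4o", "gpt-4o-mini"]:  # placeholder model names
        cmd = build_command("level3", 3, model)
        print(" ".join(cmd))
```

Each list can be handed to `subprocess.run(cmd, cwd="src", check=True)` to launch the evaluation from the repo root.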
To load a recorded history, run this command:

```shell
cd src
python main.py --level level3 --room_id 3 --model record --history_type full --record_path path/to/record
```

This is for visualizing a complete escape history, or for restoring an unfinished game (to continue running it).
Coming soon!
If you find this repository useful, please cite our paper:
```bibtex
@misc{wang2025multimodallargelanguagemodels,
      title={How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning? Placing Them in An Extensible Escape Game},
      author={Ziyue Wang and Yurui Dong and Fuwen Luo and Minyuan Ruan and Zhili Cheng and Chi Chen and Peng Li and Yang Liu},
      year={2025},
      eprint={2503.10042},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.10042},
}
```