All user-facing commands go through the repository entry point:
./enacttom/run.sh <command> [options]

For full environment and asset setup, see installation.md.
Task generation, judging, and benchmarking use model APIs. Configure keys in the
shell or in a repo-root .env file:
OPENAI_API_KEY=...

Run generation after the Habitat setup is complete and the requested external task-generation agent CLI is installed and authenticated.
conda activate enacttom-habitat
./enacttom/run.sh generate --num-tasks 3 --difficulty standard
./enacttom/run.sh generate --num-tasks 3 --difficulty hard

The new-scene and generate commands require real Habitat episodes and fail if
Habitat dependencies or assets are missing.
TASK=path/to/task.json
./enacttom/run.sh validate-task --task "$TASK"
./enacttom/run.sh verify-pddl --task "$TASK"
./enacttom/run.sh verify --task "$TASK"
./enacttom/run.sh judge --task "$TASK"

./enacttom/run.sh benchmark \
--tasks-dir data/enacttom/tasks \
--model gpt-5.4-mini \
--num-times 3

Repeated benchmark runs report mean pass rate, pass-rate standard deviation,
pass@k, and pass^k for k = --num-times.
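
To make the reported metrics concrete, here is a minimal sketch of how they could be recomputed from per-task pass counts. It assumes the usual readings: pass@k counts a task as solved when at least one of its k runs passes, and pass^k only when all k runs pass. The definitions used by the benchmark command itself are not spelled out here, so treat these as assumptions. The function name and the input format (one `<task-id> <passed-runs>` line per task) are hypothetical, and the pass-rate standard deviation is left out because it needs per-run data.

```shell
# Hypothetical helper, not part of the enacttom CLI.
# stdin: one line per task, "<task-id> <passed-runs>"
# $1:    k, the number of runs per task (the value given to --num-times)
summarize_runs() {
  awk -v k="$1" '
    {
      n++
      sum_rate += $2 / k      # per-task pass rate
      if ($2 >= 1) any++      # assumed pass@k: at least one run passed
      if ($2 == k) all++      # assumed pass^k: every run passed
    }
    END {
      if (n == 0) { print "no tasks"; exit 1 }
      printf "mean_pass_rate=%.3f pass@k=%.3f pass^k=%.3f\n", sum_rate / n, any / n, all / n
    }'
}
```

For example, with three tasks that passed 3, 1, and 0 of 3 runs, `printf 'a 3\nb 1\nc 0\n' | summarize_runs 3` gives a mean pass rate of 0.444, pass@k of 0.667, and pass^k of 0.333.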
This release contains the EnactToM paper pipeline: scene exploration, task generation, validation, PDDL solvability checks, Habitat replay, ToM judging, and agent benchmarking. Supported Habitat presets are the paper-scale 2-, 3-, and 4-agent Spot robot configurations.
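
The four per-task commands shown earlier (validate-task, verify-pddl, verify, judge) can be chained so that a task is checked in one call and the first failing step is reported. This is a sketch only: it assumes each command exits non-zero on failure, which is not confirmed here, and both the function name and the `ENACTTOM_RUN` override variable are hypothetical.

```shell
# Hypothetical helper, not part of the enacttom CLI.
# ENACTTOM_RUN is an illustrative override; by default the documented
# entry point ./enacttom/run.sh is called.
check_task() {
  task=$1
  runner=${ENACTTOM_RUN:-./enacttom/run.sh}
  # Run the checks in the order shown earlier; stop at the first failure.
  for step in validate-task verify-pddl verify judge; do
    if ! "$runner" "$step" --task "$task"; then
      echo "check_task: '$step' failed for $task" >&2
      return 1
    fi
  done
}
```

Usage would look like `check_task path/to/task.json && echo "task passed all checks"`.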
Supported task mechanics are room_restriction, limited_bandwidth,
restricted_communication, remote_control, state_mirroring, and
inverse_state.
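
When scripting around task files, a mechanic name can be checked against the supported list before further processing. The sketch below only mirrors the six names listed above; the function name is hypothetical, and the validate-task command may already perform an equivalent check.

```shell
# Hypothetical helper, not part of the enacttom CLI.
# Returns 0 if $1 is one of the supported task mechanics, 1 otherwise.
is_supported_mechanic() {
  case "$1" in
    room_restriction|limited_bandwidth|restricted_communication|remote_control|state_mirroring|inverse_state)
      return 0 ;;
    *)
      return 1 ;;
  esac
}
```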