Corresponding author: Yichi Zhang
Paper | Project Page | Dataset | Model
- Clone the repo

```shell
git clone https://github.com/pro-assist/ProAssist.git
cd ProAssist
```

- (Optional) Create a virtual environment

```shell
conda create -n mm python=3.10 -y
conda activate mm
```

- Install dependencies

```shell
pip install -r requirements.txt
pip install -e .
```

- Set the data root dir in `mmassist/configs/arguments.py`, or export `DATA_ROOT_DIR` in your environment:

```shell
export DATA_ROOT_DIR=<your_data_root_dir>
```
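A minimal sketch of resolving the data root in the shell with a fallback when the variable is unset (the `$HOME/proassist_data` default here is hypothetical, not part of the repo):

```shell
# Resolve the data root, falling back to a hypothetical default if unset
DATA_ROOT_DIR="${DATA_ROOT_DIR:-$HOME/proassist_data}"
mkdir -p "$DATA_ROOT_DIR"
echo "Using data root: $DATA_ROOT_DIR"
```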
- Download the preprocessed data:

```shell
git lfs install
git clone https://huggingface.co/594zyc/ProAssist-Dataset
mv ProAssist-Dataset/processed_data $DATA_ROOT_DIR/processed_data
```
Note: the preprocessed data is 152 GB across many files, so the download is slow. To download only a subset of the data for preview, use:

```shell
git lfs install
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/594zyc/ProAssist-Dataset
git lfs pull -I "processed_data/wtag"  # download only the wtag subset
```
- Unzip the data:

```shell
for dataset in ego4d holoassist epickitchens egoexolearn wtag assembly101; do
  cd $DATA_ROOT_DIR/processed_data/$dataset
  unzip generated_dialogs.zip
  unzip prepared.zip
done
```
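After unzipping, a quick sanity check can confirm that each dataset folder exists under the data root (a hedged sketch; the exact extracted layout may differ):

```shell
# Report which dataset folders are present under the data root
DATA_ROOT_DIR="${DATA_ROOT_DIR:-.}"
found=0
for dataset in ego4d holoassist epickitchens egoexolearn wtag assembly101; do
  if [ -d "$DATA_ROOT_DIR/processed_data/$dataset" ]; then
    echo "found: $dataset"
    found=$((found + 1))
  else
    echo "missing: $dataset"
  fi
done
echo "$found of 6 dataset folders present"
```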
If you want to prepare the data from scratch using the LLM-based data generation pipeline, please see here.
```shell
cd $DATA_ROOT_DIR
mkdir -p models && cd models

# download the I=1 model (1 token per frame)
git clone https://huggingface.co/594zyc/ProAssist-Model-L4096-I1

# download the I=5 model (5 tokens per frame)
git clone https://huggingface.co/594zyc/ProAssist-Model-L4096-I5

# download the I=10 model (10 tokens per frame)
git clone https://huggingface.co/594zyc/ProAssist-Model-L4096-I10
```
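A small check that the clones completed can look for `config.json` in each model directory (a `config.json` at the repo root is standard for Hugging Face model repos, but this is an assumption about these repos' layout):

```shell
# Check that each model repo was fully cloned (config.json present)
MODELS_DIR="${DATA_ROOT_DIR:-.}/models"
for repo in ProAssist-Model-L4096-I1 ProAssist-Model-L4096-I5 ProAssist-Model-L4096-I10; do
  if [ -f "$MODELS_DIR/$repo/config.json" ]; then
    echo "complete: $repo"
  else
    echo "incomplete or missing: $repo"
  fi
done
```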
We provide several notebooks to demonstrate:
- Video and dialogue visualization (link)
- Model inference for streaming video-to-dialogue generation (link)
- LLM-based dialogue generation pipeline (link)
- LLM-as-a-judge evaluation (link)
- Dataset statistics overview (link)
This repo also includes a configurable DST generator that uses LLMs (in single or batch mode) to produce structured DST JSON for dataset items. See `PROJECT_CONTEXT.md` for a concise project summary and run instructions. The recommended entrypoint is the runner script, which activates the repository venv and runs the Hydra-driven generator:

```shell
bash custom/runner/run_dst_generator.sh
```

Note: the training and evaluation scripts currently only work on a slurm cluster.
```shell
# Train the I=1, 5, and 10 models (I = #tokens/frame)
sbatch scripts/train/I1_8n_4096_1s.sh
sbatch scripts/train/I5_12n_4096_1s.sh
sbatch scripts/train/I10_16n_4096_1s.sh

# Evaluate a trained model
sbatch scripts/eval/Aug_eval_stream.sh
```
Please consider citing our paper if you find this project helpful for your research:
```bibtex
@article{zhang2025proactive,
  title={Proactive Assistant Dialogue Generation from Streaming Egocentric Videos},
  author={Zhang, Yichi and Dong, Xin Luna and Lin, Zhaojiang and Madotto, Andrea and Kumar, Anuj and Damavandi, Babak and Chai, Joyce and Moon, Seungwhan},
  journal={arXiv preprint arXiv:2506.05904},
  year={2025}
}
```