We provide all environment configurations in `environment.txt`. To install all packages, you can create a conda environment and install the packages as follows:
conda create -n lemma python=3.8
conda activate lemma
pip install -r environment.txt
pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu113
In our experiments, we used NVIDIA CUDA 11.3 on Ubuntu 20.04. Similar CUDA versions should also work, as long as the `torch` and `torchvision` versions are adjusted accordingly.
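As a quick sanity check (not part of the original setup, just a minimal sketch), you can verify that PyTorch sees the GPU and that the CUDA build matches:

```python
# sanity_check.py -- illustrative environment check, not part of the repo
import torch
import torchvision

print("torch:", torch.__version__)               # e.g. a +cu113 build
print("torchvision:", torchvision.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("CUDA build:", torch.version.cuda)     # should report 11.3 for the cu113 wheels
    print("GPU:", torch.cuda.get_device_name(0))
```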
For the data and feature downloads, please refer to our website. Within the Google Drive, you can find features and model checkpoints under `features/` and `checkpoints/`, respectively.
After downloading, create `data/` under the current directory:
$ cd lemma_simple_model
$ mkdir data
$ mkdir data/hcrn_data
Next, put the features, data and checkpoints into subdirectories as follows:
- download and put `features/video_feature_20.h5` to `data/`.
- download and put `features/lemma-qa_appearance_feat.h5` and `features/lemma-qa_motion_feat.h5` to `data/hcrn_data/`.
- download `features/video_features.zip` and unzip it to `$FEATURE_BASE_PATH`.
- download `features/glove.840.300d.pkl` to `$GLOVE_PT_PATH` and set `glove_pt_path` to `$GLOVE_PT_PATH` in `preprocess/generate_glove_matrix.py`.
- download and put `data/train_qas.json`, `data/test_qas.json`, `data/val_qas.json`, `data/tagged_qa.json`, and `data/vid_intervals.json` to `$BASE_DATA_DIR`.
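Before preprocessing, it may help to verify that everything landed in the expected place. The following is a minimal sketch, assuming the layout above and that `BASE_DATA_DIR` is exported as an environment variable (the default of `data` is our assumption):

```python
# check_downloads.py -- illustrative sketch; file list taken from the layout above
import os

base_data_dir = os.environ.get("BASE_DATA_DIR", "data")  # assumption: defaults to ./data

expected = [
    "data/video_feature_20.h5",
    "data/hcrn_data/lemma-qa_appearance_feat.h5",
    "data/hcrn_data/lemma-qa_motion_feat.h5",
    os.path.join(base_data_dir, "train_qas.json"),
    os.path.join(base_data_dir, "test_qas.json"),
    os.path.join(base_data_dir, "val_qas.json"),
    os.path.join(base_data_dir, "tagged_qa.json"),
    os.path.join(base_data_dir, "vid_intervals.json"),
]

missing = [p for p in expected if not os.path.exists(p)]
print("All files found." if not missing else f"Missing: {missing}")
```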
After downloading all data to their correct locations, run the following for preprocessing:
$ chmod a+x PREPROCESS.sh
$ ./PREPROCESS.sh $BASE_DATA_DIR
This script runs the following preprocessing steps for features and text:
- `$ python preprocess/preprocess_vocab.py`
  This will generate `lemma-qa_vocab.json`.
- `$ python preprocess/mode_qas2mode_qas_encode.py`
  This will convert `{mode}_qas.json` and `lemma-qa_vocab.json` to `{mode}_qas_encode.json`, `answer_set.txt`, and `vocab.txt`.
- `$ python preprocess/generate_glove_matrix.py`
  Before running `PREPROCESS.sh`, please make sure that `glove_pt_path` is correctly set. This script will generate `glove.pt` (a quick check of the result is sketched after this list).
- `$ python preprocess/generate_char_vocab.py`
  This script will generate `char_vocab.txt`.
- `$ python preprocess/format_mode_qas_encode.py {mode}`
  Before running the experiments, please make sure that `max_word_len` in `preprocess/format_mode_qas_encode.py` is equal to `args.char_max_len` defined in `train_psac.py`. Similarly, make sure that `max_sentence_len` in `preprocess/format_mode_qas_encode.py` is equal to `args.max_len` in `train_psac.py`, `train_linguistic_bert.py`, and `train_visual_bert.py`.
- `$ python preprocess/reasoning_types.py`
  This will generate `all_reasoning_types.txt`.
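As a rough sanity check on the GloVe step, the sketch below loads the generated embedding matrix and compares it against the vocabulary. This is only an illustration; it assumes `glove.pt` was saved with `torch.save` and that the preprocessing outputs sit under `$BASE_DATA_DIR`, which may differ in your setup:

```python
# check_glove.py -- illustrative sketch; output paths and save format are assumptions
import json
import os

import torch

base_data_dir = os.environ.get("BASE_DATA_DIR", "data")  # assumption

# Vocabulary produced by preprocess_vocab.py / mode_qas2mode_qas_encode.py.
with open(os.path.join(base_data_dir, "lemma-qa_vocab.json")) as f:
    vocab = json.load(f)

# Embedding matrix produced by generate_glove_matrix.py.
glove = torch.load(os.path.join(base_data_dir, "glove.pt"))

print("vocab entries:", len(vocab))
print("glove matrix shape:", tuple(glove.shape))  # expect (vocab_size, 300) for GloVe-840B-300d
```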
To train the models from scratch, we provide the following training scripts:
- `train_hcrn.py`: HCRN experiment
- `train_hga.py`: HGA experiment
- `train_hme.py`: HME experiment
- `train_linguistic_bert.py`: BERT experiment
- `train_psac.py`: PSAC experiment
- `train_pure_lstm.py`: LSTM experiment (additional LSTM and CNN-LSTM experiments)
- `train_visual_bert.py`: VisualBERT experiment
Use the following command, substituting `$TRAIN_MODEL_PY` with the model you want to experiment with:
$ python $TRAIN_MODEL_PY --base_data_dir $BASE_DATA_DIR
for models `$TRAIN_MODEL_PY` in `train_hcrn.py`, `train_hme.py`, and `train_hga.py` (you might also want to change `app_feat_path`, `motion_feat_path`, and `video_feat_path` in these files to adjust the feature paths), and
$ python $TRAIN_MODEL_PY --feature_base_path $FEATURE_BASE_PATH --base_data_dir $BASE_DATA_DIR
for models `$TRAIN_MODEL_PY` in `train_psac.py`, `train_pure_lstm.py`, `train_linguistic_bert.py`, and `train_visual_bert.py`.
For the BERT-based models, you need to set `BertTokenizer_CKPT` and `BertModel_CKPT` so the models can load pretrained weights from Hugging Face:
- For `linguistic_bert`, set `BertTokenizer_CKPT="bert-base-uncased"` and `BertModel_CKPT="bert-base-uncased"`.
- For `visual_bert`, set `BertTokenizer_CKPT="bert-base-uncased"` and `VisualBertModel_CKPT="uclanlp/visualbert-vqa-coco-pre"`.
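For reference, these checkpoint names are standard Hugging Face `transformers` identifiers; a minimal sketch of what they resolve to (independent of this repo's code) looks like:

```python
# hf_checkpoints.py -- illustrative only; shows what the checkpoint strings refer to
from transformers import BertTokenizer, BertModel, VisualBertModel

# linguistic_bert: plain BERT tokenizer + encoder
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

# visual_bert: same tokenizer, VisualBERT encoder pretrained on VQA/COCO
visual_bert = VisualBertModel.from_pretrained("uclanlp/visualbert-vqa-coco-pre")
```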
To reload checkpoints and only run inference on test_qas, run the following command:
$ python $TRAIN_MODEL_PY --base_data_dir $BASE_DATA_DIR --reload_model_path $RELOAD_MODEL_PATH --test_only 1
for models `$TRAIN_MODEL_PY` in `train_hcrn.py`, `train_hme.py`, and `train_hga.py`, and
$ python $TRAIN_MODEL_PY --feature_base_path $FEATURE_BASE_PATH --base_data_dir $BASE_DATA_DIR --reload_model_path $RELOAD_MODEL_PATH --test_only 1
for models `$TRAIN_MODEL_PY` in `train_psac.py`, `train_pure_lstm.py`, `train_linguistic_bert.py`, and `train_visual_bert.py`.
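The `--reload_model_path` flag points at a saved checkpoint (for example, one of the files under `checkpoints/` from the Google Drive). As a rough sketch of what reloading typically looks like in PyTorch (the exact checkpoint layout and model class depend on the training script, so treat the details below as assumptions rather than the repo's actual code):

```python
# reload_sketch.py -- illustrative only; filename and checkpoint layout are assumptions
import torch

reload_model_path = "checkpoints/hga_best.pt"  # hypothetical filename

checkpoint = torch.load(reload_model_path, map_location="cpu")
# A checkpoint is usually either a raw state_dict or a dict wrapping one;
# inspect the keys to see which case applies before loading it into the model.
state_dict = checkpoint.get("state_dict", checkpoint) if isinstance(checkpoint, dict) else checkpoint
print("checkpoint keys:", list(state_dict)[:5])
# model.load_state_dict(state_dict)  # with the model constructed by the training script
```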
This codebase heavily uses resources from VisualBERT, HCRN, HGA, HME, and PSAC. We thank the authors for open-sourcing their awesome projects.