This document provides a brief introduction to the usage of Mask2Former.
Please see Getting Started with Detectron2 for full usage.
- Pick a model and its config file from the model zoo, for example, `configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml`.
- We provide `demo.py` that is able to demo builtin configs. Run it with:
```
cd demo/
python demo.py --config-file ../configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml \
  --input input1.jpg input2.jpg \
  [--other-options]
  --opts MODEL.WEIGHTS /path/to/checkpoint_file
```
The configs are made for training, therefore we need to specify `MODEL.WEIGHTS` to a model from the model zoo for evaluation.
This command will run the inference and show visualizations in an OpenCV window.
For details of the command line arguments, see `demo.py -h` or look at its source code to understand its behavior. Some common arguments are:
- To run on your webcam, replace `--input files` with `--webcam`.
- To run on a video, replace `--input files` with `--video-input video.mp4`.
- To run on CPU, add `MODEL.DEVICE cpu` after `--opts`.
- To save outputs to a directory (for images) or a file (for webcam or video), use `--output`.
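For example, a single command combining several of these options might look like the sketch below (assuming the flags can be combined this way; the video, output, and checkpoint paths are placeholders):

```
# Run the demo on a video file, on CPU, and save the result to a file.
cd demo/
python demo.py --config-file ../configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml \
  --video-input video.mp4 \
  --output output.mp4 \
  --opts MODEL.WEIGHTS /path/to/checkpoint_file MODEL.DEVICE cpu
```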
We provide a script `train_net.py` that is made to train all the configs provided in Mask2Former.
To train a model with `train_net.py`, first set up the corresponding datasets following datasets/README.md, then run:
```
python train_net.py --num-gpus 8 \
  --config-file configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml
```
The configs are made for 8-GPU training. Since we use the AdamW optimizer, it is not clear how to scale the learning rate with the batch size. To train on 1 GPU, you need to figure out a suitable learning rate and batch size yourself:
```
python train_net.py \
  --config-file configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml \
  --num-gpus 1 SOLVER.IMS_PER_BATCH SET_TO_SOME_REASONABLE_VALUE SOLVER.BASE_LR SET_TO_SOME_REASONABLE_VALUE
```
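Purely as an illustration (not a value recommended by this repository), one common heuristic is to scale both linearly with the number of GPUs, i.e. divide the 8-GPU batch size of 16 (the "bs16" in the config name) and the config's base learning rate by 8, and then tune from there:

```
# Illustrative single-GPU override using linear scaling; take the learning rate
# from your config, divide it by 8, and tune both values for your hardware.
python train_net.py \
  --config-file configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml \
  --num-gpus 1 SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR <config_base_lr_divided_by_8>
```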
To evaluate a model's performance, use
```
python train_net.py \
  --config-file configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml \
  --eval-only MODEL.WEIGHTS /path/to/checkpoint_file
```
For more options, see `python train_net.py -h`.
Please use `demo_video/demo.py` for the video instance segmentation demo and `train_net_video.py` to train and evaluate video instance segmentation models.
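Assuming `train_net_video.py` follows the same command-line interface as `train_net.py` (an assumption here; the config path below is a placeholder), training and evaluation would look like:

```
# Train a video instance segmentation model (the config path is a placeholder).
python train_net_video.py --num-gpus 8 \
  --config-file /path/to/video_config.yaml

# Evaluate a trained checkpoint.
python train_net_video.py \
  --config-file /path/to/video_config.yaml \
  --eval-only MODEL.WEIGHTS /path/to/checkpoint_file
```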
We adjusted the batch sizes, learning rates, iterations, and learning rate steps to our hardware resources. See more in the Model Zoo.
Find the documentation of the `CLUSTER_2_FORMER` parameters in the code, e.g. here.
To use Cluster2Former with scribble VIS datasets, set the following configuration parameters:
```
MODEL.META_ARCHITECTURE: VideoCluster2Former
INPUT.DATASET_MAPPER_NAME: cluster2former_scribble
```
To use Cluster2Former with the full mask annotated VIS datasets, set the following configuration parameters:
```
MODEL.META_ARCHITECTURE: VideoCluster2Former
```
To use MaskCluster2Former (the original Mask2Former plus a Similarity-based Clustering loss) with the full mask annotated VIS datasets, set the following configuration parameters:
```
MODEL.META_ARCHITECTURE: VideoMaskCluster2Former
```
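These parameters can either be set in a config file or, assuming they are exposed as detectron2-style overrides (an assumption here; the config path is a placeholder), passed on the command line, for example:

```
# Train Cluster2Former on a scribble-annotated VIS dataset via command-line overrides.
python train_net_video.py --num-gpus 8 \
  --config-file /path/to/video_config.yaml \
  MODEL.META_ARCHITECTURE VideoCluster2Former \
  INPUT.DATASET_MAPPER_NAME cluster2former_scribble
```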