
Vision Crowd Counting Transformers (ViCCT) for Cross-Scene Crowd Counting

This repository contains the code for a thesis project on crowd counting with fully transformer-based architectures. It provides the following functionality:

  • Standard training of ViCCT and CSRNet
  • Meta-learning with Meta-SGD (a minimal sketch of the update step follows this list) for:
    • CSRNet
    • ViCCT
    • SineNet (the sine-regression toy example discussed in the MAML and Meta-SGD papers)
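
For reference, the core idea of Meta-SGD is that, in addition to the initial weights, the model meta-learns one learning rate per parameter, and the inner-loop adaptation multiplies these rates element-wise with the task gradient. Below is a minimal, illustrative PyTorch sketch of that inner step, not the exact code in this repository:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy regression model standing in for SineNet or a counting network.
model = nn.Linear(1, 1)

# Meta-SGD meta-learns one learning rate per parameter element,
# alongside the initial weights themselves.
alphas = [torch.full_like(p, 0.01, requires_grad=True) for p in model.parameters()]

# A few-shot "support set" for a single task.
x, y = torch.randn(8, 1), torch.randn(8, 1)
loss = F.mse_loss(model(x), y)

# Inner-loop step: theta' = theta - alpha * grad(loss). create_graph=True
# lets the outer (meta) loss backpropagate through this adaptation step.
grads = torch.autograd.grad(loss, list(model.parameters()), create_graph=True)
adapted = [p - a * g for p, a, g in zip(model.parameters(), alphas, grads)]
```

The outer (meta) update would evaluate a query-set loss with the adapted weights and backpropagate it into both the initial parameters and the alphas.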

What is Crowd Counting?

As the name suggests, crowd counting involves estimating the number of people in a location. Most modern computer vision methods achieve this with density map regression: given an image of a scene (for example, surveillance footage), the model predicts a density map in which each pixel indicates the cumulative density of all people at that point. The ground truth is typically constructed by placing a Gaussian distribution around each person's head; when multiple people are close together, the distributions overlap and their values are summed. The total count for a scene is obtained by integrating over the whole density map. The following shows an example of a scene, its ground-truth density map, and a model's prediction of this map given only the image. Note that some of the density mass lies outside the image frame when a Gaussian is placed close to the edge.

Example of density map regression
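
To make the density-map formulation concrete, here is a minimal sketch, assuming head annotations are given as (row, col) pixel coordinates; the exact kernel width used by each dataset may differ:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def make_density_map(head_points, shape, sigma=4.0):
    """Place a unit impulse at each annotated head and blur with a Gaussian."""
    density = np.zeros(shape, dtype=np.float32)
    for row, col in head_points:
        density[int(row), int(col)] += 1.0
    # mode='constant' lets Gaussian mass near the border fall outside the
    # frame, which is why the integral can be slightly below the true count.
    return gaussian_filter(density, sigma=sigma, mode='constant')

heads = [(50, 60), (52, 63), (200, 300)]        # three annotated people
dm = make_density_map(heads, shape=(480, 640))
print(dm.sum())  # ~3.0: integrating the density map recovers the count
```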

What is few-shot learning in the context of scene adaptation with crowd counting?

Few-shot learning means that a model must learn from only a few training examples. Scene adaptation in crowd counting means that we have a model trained on one or more scenes (e.g. one or more surveillance cameras) and wish to adjust it to count crowds in a novel scene. Combining the two, the model must adapt to a novel scene with just a few training examples. This is a non-trivial task due to changes in perspective, lighting conditions, people's appearance, background, and so on.
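
As a rough sketch of what such adaptation amounts to in practice (the function and argument names below are illustrative, not this repository's API): starting from a model trained on the source scenes, we take the few annotated images of the novel scene and run a handful of gradient steps on them.

```python
import torch

def adapt_to_scene(model, support_images, support_density, steps=5, lr=1e-5):
    """Fine-tune a pre-trained counter on a few annotated images of a new scene."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    criterion = torch.nn.MSELoss()
    for _ in range(steps):
        optimizer.zero_grad()
        pred = model(support_images)              # predicted density maps
        loss = criterion(pred, support_density)   # pixel-wise density loss
        loss.backward()
        optimizer.step()
    return model
```

Meta-learning methods such as Meta-SGD (sketched above) aim to find an initialisation and step sizes for which such a handful of steps already generalises well to the new scene.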

Why do we need few-shot learning for scene adaptation?

The standard approach to obtaining a model for a novel scene is to manually annotate many images of that scene, usually hundreds. This is extremely tedious and labour-intensive. Should we succeed in obtaining a model that can adapt to a new scene with just a few images, we would greatly reduce the required annotation time whenever we place a new camera.

Using this repository

First of all, the environment used for this project is provided in environment.yml. One can install this environment with 'conda env create -f environment.yml'.

To train any model, specify the parameters for the run in config.py, such as the model to train and the dataset to use. Note that the dataset name must match the folder name in datasets/standard for standard training, or in datasets/meta for meta-learning. Dataset-specific parameters are set in 'settings.py' in the folder of the corresponding dataset.
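
As a purely hypothetical illustration of the kind of settings involved (the actual parameter names in config.py may differ):

```python
# Hypothetical config.py values; consult the actual file for the real names.
model = 'ViCCT'            # architecture to train, e.g. 'ViCCT' or 'CSRNet'
dataset = 'ShanghaiTech'   # must match a folder name under datasets/standard
                           # (or under datasets/meta for meta-learning runs)
```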

A model can then be trained with main_standard.py for standard training, or with main_meta.py for meta-learning.

Acknowledgements

The code in this repository is heavily inspired by, and uses parts of, the Crowd Counting Code Framework (C^3-Framework). I also use and extend code from the DeiT repository for the ViCCT models.

Code from an existing PyTorch implementation of MAML is used for 1) our Meta-SGD implementation and 2) the SineNet implementation.

Important papers for this repository:
