Automatic violence detection (AVD) through video surveillance cameras is a vital endeavor to ensure public safety. Automatically detecting violence in surveillance footage allows emergency services to be deployed to the scene sooner. However, most of the current research literature on AVD does not investigate model robustness under the various noise distortions that may be present in a surveillance video stream. We explore the effect of these distortions on several state-of-the-art (SOTA) AVD deep learning (DL) models and show that their performance degrades as the video stream becomes distorted. We propose a new AVD DL model built around video data augmentation to maintain high performance even under various video stream noise distortions.
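As a rough illustration of the idea, the following is a minimal TensorFlow sketch of applying noise-style distortions to video clips as training-time augmentation. The distortion types, parameter ranges, and the helper name `augment_clip` are illustrative assumptions, not the exact augmentation pipeline used for our reported results.

```python
import tensorflow as tf

def augment_clip(frames, max_noise_stddev=0.05, jpeg_quality=(30, 90)):
    """Apply random noise-style distortions to a video clip.

    frames: float32 tensor of shape [T, H, W, 3] with values in [0, 1].
    Distortion types and parameter ranges are illustrative choices.
    """
    # Additive Gaussian noise with a randomly drawn strength,
    # simulating sensor/transmission noise.
    stddev = tf.random.uniform([], 0.0, max_noise_stddev)
    frames = frames + tf.random.normal(tf.shape(frames), stddev=stddev)

    # Random brightness shift, simulating exposure/lighting changes.
    frames = tf.image.random_brightness(frames, max_delta=0.2)
    frames = tf.clip_by_value(frames, 0.0, 1.0)

    # Per-frame JPEG re-encoding, simulating compression artifacts.
    quality = tf.random.uniform([], jpeg_quality[0], jpeg_quality[1], dtype=tf.int32)
    frames = tf.map_fn(lambda f: tf.image.adjust_jpeg_quality(f, quality), frames)

    return tf.clip_by_value(frames, 0.0, 1.0)

# Example usage inside a tf.data pipeline of (clip, label) pairs:
# dataset = dataset.map(lambda clip, label: (augment_clip(clip), label),
#                       num_parallel_calls=tf.data.AUTOTUNE)
```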
To run the experiments in the `notebooks` folder, update the directory paths to match your local setup. The primary installation requirement is TensorFlow, which can be installed with `pip install tensorflow[and-cuda]`. We trained on JarvisLabs.ai using 4x A5000 GPUs. To generate the skeleton data, we used the OpenPose Windows demo.
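For reference, below is a minimal sketch of loading the per-frame JSON files that the OpenPose demo writes (when run with `--write_json`) into a NumPy array. The file-name pattern, the `BODY_25` keypoint count, and the helper name `load_openpose_keypoints` are assumptions about the default demo output rather than code from this repository.

```python
import glob
import json
import os

import numpy as np

def load_openpose_keypoints(json_dir, num_keypoints=25):
    """Load per-frame OpenPose JSON output into an array.

    Assumes the default demo output: one JSON file per frame, each with a
    "people" list whose entries hold "pose_keypoints_2d" as a flat
    [x1, y1, c1, x2, y2, c2, ...] list (25 keypoints for BODY_25).

    Returns an array of shape [num_frames, max_people, num_keypoints, 3],
    zero-padded for frames with fewer detected people.
    """
    frames = []
    for path in sorted(glob.glob(os.path.join(json_dir, "*_keypoints.json"))):
        with open(path) as f:
            people = json.load(f).get("people", [])
        kps = [np.asarray(p["pose_keypoints_2d"], dtype=np.float32)
                 .reshape(num_keypoints, 3)
               for p in people]
        frames.append(kps)

    max_people = max((len(kps) for kps in frames), default=0)
    out = np.zeros((len(frames), max_people, num_keypoints, 3), dtype=np.float32)
    for t, kps in enumerate(frames):
        for i, kp in enumerate(kps):
            out[t, i] = kp
    return out
```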
Our training scripts are based on, and take inspiration from, https://github.com/atmguille/Violence-Detection-With-Human-Skeletons.