BOLLD employs a multi-modal approach, integrating body language analysis, lip transcriptions, and reinforcement learning to detect threats in real-time using computer vision and natural language processing.
- Nicole Sorokin (Team Lead)
- Julia Brzustowski (Team Member)
- Zuhair Qureshi (Team Member)
- Grady Rueffer (Team Member)
- Sophia Shantharupan (Team Member)
- Detecting potential threats or violent language when audio is corrupted or unavailable during meetings.
- Aimed at enhancing safety by providing an alternative threat detection system that doesn't rely on sound.
- Can be applied in public safety scenarios, such as campus surveillance, to alert authorities of potential threats in real time.
- Assistive technology: can be implemented in camera-equipped glasses to help people with disabilities, such as blindness, by notifying them of threats they might not visually perceive.
Run the app using the following command:

```shell
streamlit run app.py
```
Currently, app.py contains the body language training code; details can be found in the body_lang_decoder folder. Details about the lip transcription model can be found in the lip_to_text folder, where each key word is compared against a list of threatening words and a threat level is calculated. The threat level determines the state of the system; the state is passed into the Q-learning table, which selects the action to take using reinforcement learning.
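The state-to-action lookup described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the state discretization, action set, and epsilon value are all assumptions.

```python
import numpy as np

N_STATES = 3    # assumed discretization: 0 = low, 1 = medium, 2 = high threat
N_ACTIONS = 2   # assumed actions: 0 = "all clear", 1 = "de-escalate"

# Q-table mapping each state to the expected value of each action.
q_table = np.zeros((N_STATES, N_ACTIONS))

def threat_to_state(threat_level: float) -> int:
    """Bucket a 0-1 threat level into a discrete state."""
    if threat_level < 0.33:
        return 0
    if threat_level < 0.66:
        return 1
    return 2

def choose_action(threat_level: float, epsilon: float = 0.1) -> int:
    """Epsilon-greedy action selection from the Q-table."""
    state = threat_to_state(threat_level)
    if np.random.rand() < epsilon:
        return np.random.randint(N_ACTIONS)   # explore
    return int(np.argmax(q_table[state]))     # exploit
```

During training, the table entries would be updated with the standard Q-learning rule after each observed reward.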
- Use a trained body language model and lip reading (via Mediapipe landmarks) to compute a numerical threat probability (0-1) for each.
- Combine both values into a single combined threat score.
- Based on the two inputs from the first stage, train a reinforcement learning model to recognize sequences of actions and lip movements that suggest malicious behavior.
- Output: 0 → non-malicious, 1 → malicious, plus a 0-1 scale representing the threat level of key words (0 = non-threatening, 1 = threatening).
- The model will influence the environment state:
- De-escalate if the threat is correctly identified.
- All clear if the threat is incorrectly identified.
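The score combination in the first stage can be sketched as a weighted average; the equal weights and the 0.5 decision threshold below are assumptions for illustration, not values from the project.

```python
def combined_threat(body_score: float, lip_score: float,
                    w_body: float = 0.5, w_lip: float = 0.5) -> float:
    """Combine the two 0-1 threat probabilities into one score.

    Equal weighting is an assumption; the real weights may be tuned.
    """
    score = w_body * body_score + w_lip * lip_score
    return min(max(score, 0.0), 1.0)  # clamp to [0, 1]

def classify(score: float, threshold: float = 0.5) -> int:
    """Map the combined score to the 0 (non-malicious) / 1 (malicious) output."""
    return 1 if score >= threshold else 0
```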
- Using the EMOLIPS model (CNN-LSTM) to detect emotions from lip movement based on face details.
- Negative emotions (e.g., anger, disgust) can assist in identifying potential threats.
- Oct 27: Shifted to a facial emotion recognition model using DeepFace due to better performance.
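A DeepFace-based emotion check could look like the sketch below. `DeepFace.analyze` is the library's real entry point (recent versions return a list of result dicts); the `NEGATIVE_EMOTIONS` set and both helper names are our assumptions.

```python
# Emotions we treat as potential threat cues (assumption, not from the project).
NEGATIVE_EMOTIONS = {"angry", "disgust", "fear"}

def emotion_threat_cue(dominant_emotion: str) -> bool:
    """True if the dominant emotion counts as a threat cue."""
    return dominant_emotion.lower() in NEGATIVE_EMOTIONS

def dominant_emotion(frame_bgr):
    """Run DeepFace facial emotion recognition on a BGR frame."""
    from deepface import DeepFace  # deferred import: heavy dependency
    results = DeepFace.analyze(frame_bgr, actions=["emotion"],
                               enforce_detection=False)
    return results[0]["dominant_emotion"]
```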
- Integrating body language into a threat vs. non-threat classification using Mediapipe. The model trains on coordinates from landmarks in frames with associated labels.
- Jan 13: Decided to use one body language model (Mediapipe) after facing multiprocessing conflicts when running two models simultaneously (the initial goal was to average two models).
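Extracting the landmark coordinates for training can be sketched as below. The per-landmark `(x, y, z, visibility)` fields match Mediapipe's pose output, but the helper names and the flattening order are our own assumptions, not the project's code.

```python
def flatten_landmarks(landmarks):
    """Flatten landmark objects into one feature row for the classifier."""
    row = []
    for lm in landmarks:
        row.extend([lm.x, lm.y, lm.z, lm.visibility])
    return row

def frame_to_row(frame_bgr):
    """Extract a feature row (33 pose landmarks x 4 values) from a BGR frame."""
    import cv2
    import mediapipe as mp  # deferred import: heavy dependency
    with mp.solutions.pose.Pose(static_image_mode=True) as pose:
        rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
        results = pose.process(rgb)
    if results.pose_landmarks is None:
        return None  # no person detected in the frame
    return flatten_landmarks(results.pose_landmarks.landmark)
```

Each row, paired with a threat/non-threat label, would form one training example.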
- Closely following the methods of LipNet, as it's proven and well-documented.
- Methodology: uses Dlib for facial landmark detection and preprocesses the GRID dataset, followed by a CNN architecture with bidirectional GRUs; CTC loss is used for training.
- Jan 13: Switching models, as the previous one couldn't handle live video streams. Transitioning to a more suitable approach (e.g., the Whisper model) to transcribe lip movement to text, then applying custom models to detect violence levels.
- Jan 21: Exploring a new technique using lip/mouth landmarks to detect phonemes and then identify key words stored in a dictionary with associated threat levels.
- Jan 27: Enhanced the LipNet model to process live video streams and detect the mouth region with Dlib + ShapePredictor68.
- Jan 29: Added an algorithm to detect key words and produce a violence value.
- Jan 31: Integrated into app.py.
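The keyword-to-violence-value step (Jan 29) can be sketched as a dictionary lookup plus averaging. The word list and threat levels below are invented placeholders, and averaging is one plausible aggregation rule, not necessarily the project's.

```python
# Hypothetical dictionary of key words and their 0-1 threat levels.
THREAT_WORDS = {
    "attack": 0.9,
    "gun": 1.0,
    "stop": 0.3,
    "hello": 0.0,
}

def violence_value(transcript: str) -> float:
    """Average the threat levels of recognized key words (0 if none found)."""
    hits = [THREAT_WORDS[w] for w in transcript.lower().split()
            if w in THREAT_WORDS]
    return sum(hits) / len(hits) if hits else 0.0
```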
- Project kickoff: set up environment and tools
- Task assignment
- Define goals and objectives
- Data exploration and preparation
- Create basic frontend & backend
- Set up OpenCV for video processing
- Split into lip reading and reinforcement learning (RL) stages
- Research different models and methods for both stages
- Start implementation
- Finish body language part of stage 1
- Set up RL environment
- Finish preprocessing for lip-to-text part of stage 1
- Continue implementation of lip-to-text training
- Finish training lip-to-text part of stage 1
- Complete RL stage 2
- Create a demo video
- Connect stages 1 and 2
- Continue reinforcement learning model training
- Frontend & backend integration with ML scripts
- Finalize body language model
- Finalize lip-to-text model
- Continue working on RL
- Finish lip-to-text model
- Integrate lip-to-text into the main app.py
- Final touches
- Improve accuracy and fine-tuning
- Test the model with webcam integration
