This repository contains the source code for a WebRTC-based video conference platform that is compatible with Computer Vision Python APIs.
- Real-time video chat using WebRTC and Python APIs, connected through frame capture, Socket.IO, and XMLHttpRequest.
- Object detection and pose estimation using Google MediaPipe and a custom Keras model
- Cross-platform support (tested on Chrome, Firefox, and Safari)
- Node.js (>= 14.x.x)
- npm (>= 6.x.x)
- Python (>= 3.6)
- Flask
- ngrok (optional)
- Run the Python Flask server (`/computer_vision/server.py`):

  ```bash
  python computer_vision/server.py
  ```
- Move to the `webRTC` folder, install dependencies, and start the development server:

  ```bash
  cd webRTC
  npm install
  npm start
  ```
- Expose the server using ngrok (optional):

  ```bash
  ngrok http 3012
  ```
- Alternatively, you can open `https://localhost:3012` in Google Chrome.
- Check for an `/image 200` HTTP status code in the Flask server output to confirm that frames are reaching the server; a quick way to exercise the endpoint is sketched below.
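If the video page cannot reach the backend, the `/image` endpoint can be tested directly. The following is a minimal sketch, assuming the Flask server is running on `http://127.0.0.1:5000`, that the `requests` library is installed, and that a sample JPEG (here called `test.jpg`, a placeholder name) is available; the multipart field name `image` is also an assumption and should be adjusted to match what `postFile` actually sends.

```python
import requests

# Post a sample JPEG to the Flask /image endpoint and print the HTTP status.
# "test.jpg" is a placeholder file name; any small JPEG will do.
with open("test.jpg", "rb") as f:
    response = requests.post(
        "http://127.0.0.1:5000/image",
        files={"image": ("test.jpg", f, "image/jpeg")},
    )

print(response.status_code)  # expect 200 when the server accepts the frame
print(response.text)         # detection results, if the server returns JSON
```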
This project is designed to provide real-time video chat with integrated computer vision capabilities. The key components and their interactions are described below:
- **Browser Execution**: The `webrtc/public/` directory contains the frontend components executed in the browser. The `js/objDetect.js` file plays a crucial role by extracting video frames and sending them to the server for processing.
- **Image Transmission**: The `postFile` function in the `webrtc/public/js/objDetect.js` file sends JPEG images to the Flask server at `http://127.0.0.1:5000/image`.
- **Performance**: Throughput depends on the hardware: approximately 70 fps on an Apple M1 and around 15 fps on an Intel i5.
- **Server-side Processing**: The Flask server, located in the `/flask/server.py` file, listens for POST requests. Upon receiving an image, it forwards it to the `object_detection_api.py` file for further processing, including object detection and pose estimation (see the sketches after this list).
- **Result Communication**: After processing the image, the server can send the results back as JSON data to the JavaScript code for display in the browser. Alternatively, the data can be transmitted directly to a teacher using HTTP or Socket.IO.
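To illustrate the server-side flow, the sketch below shows a minimal Flask `/image` endpoint that decodes an incoming JPEG and returns JSON. It is not the project's actual `server.py`: the multipart field name `image` is an assumption, and `run_detection` is a stub standing in for whatever entry point `object_detection_api.py` exposes.

```python
# Minimal sketch of the server-side flow, NOT the project's actual server.py.
# Assumptions: the browser posts the JPEG as a multipart field named "image",
# and object_detection_api.py exposes a detection entry point (stubbed here).
import cv2
import numpy as np
from flask import Flask, request, jsonify

app = Flask(__name__)


def run_detection(frame):
    # Stand-in for the call into object_detection_api.py; it only reports the
    # frame size so this sketch stays runnable on its own.
    height, width = frame.shape[:2]
    return {"width": width, "height": height, "detections": []}


@app.route("/image", methods=["POST"])
def image():
    # Decode the JPEG bytes sent by objDetect.js into an OpenCV BGR frame.
    raw = np.frombuffer(request.files["image"].read(), dtype=np.uint8)
    frame = cv2.imdecode(raw, cv2.IMREAD_COLOR)
    if frame is None:
        return jsonify({"error": "could not decode image"}), 400
    # Return the results as JSON so the JavaScript side can display them.
    return jsonify(run_detection(frame))


if __name__ == "__main__":
    app.run(host="127.0.0.1", port=5000)
```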
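The pose-estimation step can be sketched with MediaPipe's Pose solution, as below. This is only an illustration assuming `mediapipe` and `opencv-python` are installed; the project's own `object_detection_api.py` may structure this differently and additionally apply the custom Keras model. `sample.jpg` is a placeholder path for a local test image.

```python
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose


def estimate_pose(frame_bgr):
    """Run MediaPipe Pose on one BGR frame and return its landmarks as a list."""
    with mp_pose.Pose(static_image_mode=True) as pose:
        results = pose.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if results.pose_landmarks is None:
        return []
    return [
        {"x": lm.x, "y": lm.y, "z": lm.z, "visibility": lm.visibility}
        for lm in results.pose_landmarks.landmark
    ]


if __name__ == "__main__":
    # "sample.jpg" is a placeholder test image; replace with any local JPEG.
    image = cv2.imread("sample.jpg")
    if image is not None:
        print(estimate_pose(image))
```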
- Grant permission to access your camera and microphone when prompted.
- Share the generated URL with another participant to establish a video chat connection.
- Enjoy real-time object detection and pose estimation during the video chat.
We welcome contributions! If you'd like to contribute to this project, please follow these steps:
- Fork the repository.
- Create a new branch for your feature or bugfix.
- Commit your changes and push to your fork.
- Create a pull request with a clear description of your changes.
This project is licensed under the MIT License. See the `LICENSE` file for details.