This project provides a video similarity detection system based on deep learning and vector search. It was originally developed as part of the TokenUs platform, but is released here as a standalone open-source component.
The system checks video originality by comparing each uploaded video against an index of previously processed content. This helps prevent duplicate uploads and supports downstream applications such as copyright protection and content verification.
- Frame-based feature extraction using a pretrained ResNet-50 model.
- Efficient vector similarity search powered by FAISS.
- Cosine similarity scoring with configurable thresholds.
- **AWS S3 integration** for video storage and retrieval.
- REST API (Flask) for easy integration with external services.
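To illustrate how cosine similarity relates to vector search, here is a minimal, dependency-free sketch. It relies on the general fact that the inner product of two L2-normalized vectors equals their cosine similarity, which is why an inner-product index over normalized embeddings can return cosine scores. Whether this project configures FAISS that way is an assumption, and the helper names below are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def inner_product(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine similarity computed as the inner product of normalized vectors."""
    return inner_product(l2_normalize(a), l2_normalize(b))


# Toy 3-dimensional vectors standing in for real frame embeddings.
query_embedding = [1.0, 2.0, 3.0]
indexed_embedding = [2.0, 4.0, 6.0]
score = cosine_similarity(query_embedding, indexed_embedding)
```

Because the two toy vectors point in the same direction, `score` is 1.0 up to floating-point error; orthogonal vectors would score 0.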
- Download the video from S3 using its URL.
- Extract frames and compute feature embeddings via ResNet-50.
- Search the FAISS index and calculate cosine similarities. The duplicate criteria are configurable.
- Return the result via the REST API and update the storage and index accordingly.
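The last two steps can be sketched as a small aggregation function. This is an illustration under stated assumptions, not the project's actual logic: it assumes each frame yields one best-match cosine score, that a video counts as a duplicate when the maximum score reaches the threshold, and that `0.9` is a reasonable default. The function name and the threshold value are hypothetical; the output keys mirror the fields of the API response documented in this README.

```python
def summarize_similarities(frame_scores, threshold=0.9):
    """Aggregate per-frame similarity scores into a duplicate verdict.

    frame_scores: best-match cosine similarity for each sampled frame.
    threshold: assumed duplicate cutoff (hypothetical default).
    """
    if not frame_scores:
        # Nothing to compare against, e.g. an empty index.
        return {"is_duplicate": False, "max_similarity": 0.0, "avg_similarity": 0.0}

    max_sim = max(frame_scores)
    avg_sim = sum(frame_scores) / len(frame_scores)
    return {
        # Assumption: the maximum score drives the decision.
        "is_duplicate": max_sim >= threshold,
        "max_similarity": round(max_sim, 2),
        "avg_similarity": round(avg_sim, 2),
    }


result = summarize_similarities([0.62, 0.28])
```

Lowering `threshold` makes the check stricter about near-duplicates at the cost of more false positives, so the value is best tuned against known duplicate and non-duplicate pairs.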
git clone https://github.com/Growth-and-Start/TokenUs_ML.git
cd TokenUs_ML
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_S3_BUCKET=
AWS_DEFAULT_REGION=
AWS_ACCOUNT_ID=
MYSQL_ROOT_PASSWORD=
MYSQL_DATABASE=
MYSQL_USER=
MYSQL_PASSWORD=
BACKEND_URL=
API_PATH=
CONTAINER_NAME=
IMAGE_NAME=
NETWORK_NAME=
VOLUME_NAME=
FLASK_PORT=
CPU_LIMIT=
MEMORY_LIMIT=
FAISS_INDEX_PATH=
DOWNLOAD_FOLDER=
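Since every value in the environment file is left blank, it can help to fail fast when a setting is missing. The sketch below shows one way a service could validate these variables at startup. Which variables are strictly required is an assumption, and `load_settings` and `REQUIRED_VARS` are hypothetical names, not part of this repository.

```python
import os

# Assumption: these are the settings the Flask service needs at runtime.
REQUIRED_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_S3_BUCKET",
    "AWS_DEFAULT_REGION",
    "FAISS_INDEX_PATH",
    "DOWNLOAD_FOLDER",
    "FLASK_PORT",
)


def load_settings(environ=None):
    """Read required settings from a mapping (defaults to os.environ).

    Raises RuntimeError listing every variable that is unset or empty.
    """
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))

    settings = {name: env[name] for name in REQUIRED_VARS}
    settings["FLASK_PORT"] = int(settings["FLASK_PORT"])  # ports are numeric
    return settings
```

Calling `load_settings()` with no argument reads the real process environment; passing a dictionary makes the function easy to test.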
cd docker/deploy
docker-compose up -d
Request:
POST /similarity-check
Content-Type: application/json
{
  "s3_url": "https://s3.amazonaws.com/bucket/video.mp4"
}
Response:
{
  "is_duplicate": false,
  "max_similarity": 0.62,
  "avg_similarity": 0.45
}
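For callers integrating with the endpoint above, a minimal client might look like the following sketch. It uses only the Python standard library. The base URL (host and port) is an assumption that depends on your deployment, and the function names are hypothetical.

```python
import json
import urllib.request


def build_similarity_request(base_url, s3_url):
    """Build the POST request for /similarity-check without sending it."""
    payload = json.dumps({"s3_url": s3_url}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/similarity-check",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def check_similarity(base_url, s3_url, timeout=60):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_similarity_request(base_url, s3_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running, `check_similarity("http://localhost:5000", "https://s3.amazonaws.com/bucket/video.mp4")` would return a dictionary with the `is_duplicate`, `max_similarity`, and `avg_similarity` fields; the `localhost:5000` address is only a placeholder for wherever `FLASK_PORT` exposes the API.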
MIT License © 2025 TokenUs Team
| Heejae An | Jimin Seo | Wonyoung Kim |
|---|---|---|
| @AnyJae | @SeoJimin1234 | @lasagna10 |