Short video platforms produce a large number of videos every day, and some creators misappropriate other people's videos, so a technical means of identifying similar videos is needed.
Traditional identification typically relies on manual review or on simple checks such as comparing video titles or descriptions. At massive data volumes, these methods cannot process content efficiently or without human involvement. This project addresses the problem by implementing automatic video similarity detection with an efficient, unified identification model.
- Video Frame Extraction: Use OpenCV for uniform frame extraction from videos
- Feature Extraction: Use a pre-trained ResNet50 model to convert each frame into a 1000-dimensional feature vector
- Vector Matrix Calculation: Form matrices from feature vectors of all frames
- Similarity Calculation: Calculate similarity scores between two videos using algorithms like cosine similarity
- Vector Storage: Store video feature vectors in Amazon OpenSearch, used as the vector database
- Approximate Retrieval: Use the k-NN algorithm for efficient vector similarity retrieval
- Secondary Sorting: Perform precise similarity calculation on retrieval results for final ranking
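The steps above can be illustrated with a minimal, dependency-free sketch. It is not the project's implementation. The function names are hypothetical, and the way per-frame scores are aggregated into one video score (best match per frame, then the mean) is an assumption, since the steps above only name cosine similarity. In practice the frame vectors would come from ResNet50 and the candidates from the OpenSearch k-NN query.

```python
import math


def uniform_frame_indices(total_frames, num_samples):
    """Return evenly spaced frame indices for uniform frame extraction."""
    if total_frames <= 0 or num_samples <= 0:
        return []
    num_samples = min(num_samples, total_frames)
    step = total_frames / num_samples
    return [int(i * step) for i in range(num_samples)]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def video_similarity(frames_a, frames_b):
    """Score video A against video B.

    Assumed aggregation: for each frame vector of A take its best cosine
    match among the frames of B, then average those best scores.
    """
    if not frames_a or not frames_b:
        return 0.0
    best = [max(cosine_similarity(fa, fb) for fb in frames_b) for fa in frames_a]
    return sum(best) / len(best)


def rerank(query_frames, candidates):
    """Secondary sorting: rescore retrieved candidates and sort by score, descending.

    `candidates` maps a video URL to its list of frame vectors.
    """
    scored = [(url, video_similarity(query_frames, frames))
              for url, frames in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Note that this aggregation is asymmetric: scoring A against B can differ from scoring B against A when the videos have different frame counts. A symmetric variant could average the two directions.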
Subscription link: ResNet50 Subscription
- After deployment, note the SageMaker endpoint name.
- Install the AWS CDK: refer to the CDK Installation Guide.
- Clone the project locally and deploy it with CDK:
cd src/cdk
# Example: Specify SageMaker endpoint
cdk deploy --parameters sagemaker_endpoint=ResNet50
- Get the API Gateway endpoint URL.
- Create OpenSearch index:
curl --location 'https://{{apigateway.endpoint.url}}/create_opensearch_index' \
--header 'Content-Type: text/plain' \
--data '{}'
- Path:
/get_video_vector
- Method: POST
- Request Params:
{
"video_url": "s3://your_bucket/test.mp4"
}
- Response:
{
"video_vectors": {
"image_001": [0.2212321, 0.2212321...],
"image_002": [0.2212321, 0.2212321...],
...
}
}
- Path:
/insert_video_vector
- Method: POST
- Request Params:
{
"video_url": "s3://your_bucket/test.mp4"
}
- Response:
{"result": 136}
- Path:
/search_similarity_videos
- Method: GET
- Request Params:
{
"video_url": "s3://your_bucket/test.mp4",
"size": 10
}
- Response:
{
"videos": [
{
"video_url": "s3://your_bucket/test.mp4",
"score": 0.99
},
{
"video_url": "s3://your_bucket/test.mp4",
"score": 0.99
},
...
]
}
- Path:
/video_similarity
- Method: POST
- Request Params:
{
"video_url_1": "s3://your_bucket/test.mp4",
"video_url_2": "s3://your_bucket/test.mp4"
}
- Response:
{
"score": 0.92
}
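As a rough illustration of calling the POST endpoints above, here is a small client sketch that uses only the Python standard library. The helper names and the base URL are hypothetical, and the `application/json` content type is an assumption (the index-creation example sends `text/plain`). The `/search_similarity_videos` endpoint is left out because the documentation lists it as GET without saying how the parameters are transmitted.

```python
import json
import urllib.request


def build_post_request(base_url, path, payload):
    """Build a POST request with a JSON body for one of the API paths."""
    url = base_url.rstrip("/") + path
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


def call_api(base_url, path, payload, timeout=30):
    """Send the request and decode the JSON response body."""
    request = build_post_request(base_url, path, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (replace BASE_URL with your own API Gateway endpoint URL):
#
# BASE_URL = "https://{{apigateway.endpoint.url}}"
# call_api(BASE_URL, "/insert_video_vector",
#          {"video_url": "s3://your_bucket/test.mp4"})
# result = call_api(BASE_URL, "/video_similarity", {
#     "video_url_1": "s3://your_bucket/test.mp4",
#     "video_url_2": "s3://your_bucket/test.mp4",
# })
# print(result["score"])
```

Separating request construction from sending keeps the payload format easy to inspect without making a network call.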
- Why choose the ResNet50 model? ResNet50 is a well-proven, balanced model with strong image classification and feature extraction capabilities, which makes it suitable for video vectorization tasks.
- Does OpenSearch support encrypted search? Yes. It protects data through encrypted channels (HTTPS) and access control features.
- How should original video files and vector data be maintained? We recommend storing videos in Amazon S3 and using Lambda to invoke the API in real time and add the vectors.
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.