An end-to-end MLOps pipeline to develop, train, and deploy an Image Caption model that automatically generates captions for images based on diverse datasets

Narius2030/MLOps-Image-Captioning

📌 Tech Stack

| Technology | Purpose |
| --- | --- |
| Spark Streaming | Stream processing for data ingestion |
| Apache Kafka | Event and data streaming |
| Apache Airflow | Workflow orchestration and task scheduling |
| MinIO & MongoDB | Data storage and catalog |
| Trino | Federated SQL queries and integration with BI tools |
| Superset | BI tool for business analytics |

🖥️ Infrastructure

| Resource | Specification |
| --- | --- |
| VPS OS | Ubuntu 24.0.2 |
| CPU | 4-core Intel Xeon |
| GPU | ❌ No GPU |
| RAM | 10 GB |
| Storage | 200 GB SSD |
| Networking | 1 Gbps bandwidth |

General Architecture of MLOps

  • Built a Data Lake following the Medallion architecture, with a catalog layer and a storage layer for storing images and their metadata
  • Streamed file-upload events and images captured by the mobile app (sent through the API) into the raw storage area, which diversifies the data available for AI training
  • Integrated NLP and image processing into the ETL pipeline to periodically normalize images and metadata
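
The ingestion and normalization steps above could look roughly like the following sketch. It uses only the Python standard library. The zone names (`bronze`, `silver`), the event fields, the `.jpg` extension, and the helpers `raw_object_key`, `normalize_caption`, and `to_silver_record` are illustrative assumptions, not the repository's actual schema or code.

```python
import re
import unicodedata
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class UploadEvent:
    """Hypothetical shape of an upload event arriving from the API."""
    image_id: str
    source: str          # e.g. "mobile_app" or "file_upload" (assumed values)
    caption: str
    uploaded_at: datetime


def raw_object_key(event: UploadEvent) -> str:
    """Build a date-partitioned object key for the raw (bronze) zone."""
    day = event.uploaded_at.strftime("%Y/%m/%d")
    # The .jpg extension is an assumption made for this sketch.
    return f"bronze/{event.source}/{day}/{event.image_id}.jpg"


def normalize_caption(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub(r"[^\w\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def to_silver_record(event: UploadEvent) -> dict:
    """Produce a cleaned metadata document for the silver zone."""
    return {
        "image_id": event.image_id,
        "raw_key": raw_object_key(event),
        "caption": normalize_caption(event.caption),
        "uploaded_at": event.uploaded_at.isoformat(),
    }


event = UploadEvent(
    image_id="abc123",
    source="mobile_app",
    caption="  A Dog, running!  ",
    uploaded_at=datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc),
)
record = to_silver_record(event)
```

In a real deployment the raw key would point at an object in MinIO and the record would be written to the MongoDB catalog. Both are left out here so the sketch stays self-contained.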

[Image: general architecture]

Metadata Layer

[Image: metadata layer]

Real-time Monitoring & Scheduling

Monitoring Dashboard for Data Lake

[Image: Data Lake monitoring dashboard]

Schedule tasks on Airflow

[Image: scheduled tasks on Airflow]

FastAPI-based Microservice

More details are in this repo.

  • Query Data Service: APIs that retrieve the metadata and images normalized in the Data Lake, for use by the automated incremental learning process.
  • Model Deploying Service: APIs that serve the model running on the VPS and stream images and metadata captured by the mobile app into the Data Lake for incremental learning.
  • Nginx routes and load-balances requests across the API service containers to reduce latency and avoid overloading any single service.

image
