https://github.com/narius2030/mlops-image-captioning
An end-to-end MLOps pipeline to develop, train, and deploy an image-captioning model that automatically generates captions for images from diverse datasets
apache-airflow apache-kafka batch-processing lakehouse mlflow-tracking mlops polars spark-streaming stream-processing
- Host: GitHub
- URL: https://github.com/narius2030/mlops-image-captioning
- Owner: Narius2030
- Created: 2025-01-27T15:20:22.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-03-21T08:50:24.000Z (7 months ago)
- Last Synced: 2025-03-21T09:33:17.939Z (7 months ago)
- Topics: apache-airflow, apache-kafka, batch-processing, lakehouse, mlflow-tracking, mlops, polars, spark-streaming, stream-processing
- Language: Python
- Homepage:
- Size: 2.45 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# General Architecture of the Data Lake
- Built a Data Lake following the Medallion architecture, with a `catalog layer` and a `storage layer` for storing images and their metadata
- Streamed `file upload` and `captured image` events from the mobile app (sent via API) into the raw storage area, making the data more diverse for AI training
- Integrated NLP and image processing into the ETL pipeline to periodically normalize images and metadata

Real-time dashboard for the ingested data

## Detailed Architecture

## MLOps Cycle

# FastAPI-based Microservice
> More detail in this [Repo](https://github.com/Narius2030/FastAPI-Microservice-IMCP.git)
- Developed an API to retrieve the metadata and images normalized in the Data Lake for the automated incremental-learning process.
- Developed an API to upload a user's captured image and metadata to the storage system for later use, and then trigger the model.
- Utilized Nginx to route and load-balance across the API service containers, **_reducing latency_** and **_avoiding overload_** on each service.