https://github.com/narius2030/mlops-image-captioning
An end-to-end MLOps pipeline to develop, train, and deploy an image-captioning model that automatically generates captions for images from diverse datasets
apache-airflow apache-kafka batch-processing lakehouse mlflow-tracking mlops polars spark-streaming stream-processing
- Host: GitHub
- URL: https://github.com/narius2030/mlops-image-captioning
- Owner: Narius2030
- Created: 2025-01-27T15:20:22.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-03-21T08:50:24.000Z (7 months ago)
- Last Synced: 2025-03-21T09:33:17.939Z (7 months ago)
- Topics: apache-airflow, apache-kafka, batch-processing, lakehouse, mlflow-tracking, mlops, polars, spark-streaming, stream-processing
- Language: Python
- Homepage:
- Size: 2.45 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# General Architecture of the Data Lake
- Built a Data Lake following the Medallion architecture, with a `catalog layer` and a `storage layer` for storing images and their metadata
- Streamed `file upload` and `captured image` events from the mobile app (sent via API) into the raw storage area, making the data more diverse for AI training
- Integrated NLP and image processing into the ETL pipeline to periodically normalize images and metadata

Real-time dashboard for the ingested data

## Detailed Architecture

## MLOps Cycle

# FastAPI-based Microservice
> More detail in this [Repo](https://github.com/Narius2030/FastAPI-Microservice-IMCP.git)
- Developed an API to retrieve the metadata and images normalized in the Data Lake for the automated incremental-learning process.
- Developed an API to upload a user's captured image and metadata to the storage system for later use, and then trigger the model.
- Utilized Nginx to route and load-balance across the API service containers, **_reducing latency_** and **_avoiding overload_** on each service.