https://github.com/quangduy201/captionify
An AI Image Caption Generator
cnn fastapi huggingface-spaces image-captioning kaggle pytorch rnn
- Host: GitHub
- URL: https://github.com/quangduy201/captionify
- Owner: quangduy201
- License: apache-2.0
- Created: 2024-05-07T12:03:46.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2025-08-27T20:40:51.000Z (10 months ago)
- Last Synced: 2025-08-28T05:15:53.144Z (10 months ago)
- Topics: cnn, fastapi, huggingface-spaces, image-captioning, kaggle, pytorch, rnn
- Language: TypeScript
- Homepage: https://captionify-app.vercel.app
- Size: 2.25 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Captionify - An AI Image Caption Generator
**Captionify** is a simple web application that generates English captions from images using a deep learning model.
It combines a **FastAPI backend** (for model inference) and a responsive **HTML/CSS/JS frontend**.
> The model is trained on datasets like **Flickr8k** with a **CNN + RNN architecture**.
## Features
- Upload images via drag & drop or the file browser.
- Generate descriptive English captions from your images.
- Live typing effect for generated captions.
- Trained on the Flickr8k dataset with a custom PyTorch model.
- API endpoint `POST /reload-model` to dynamically reload the latest model from Kaggle.
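
As a rough illustration of the `POST /reload-model` endpoint, the sketch below builds and sends the request with Python's standard library. It assumes the backend is running locally on port 8000 and that the endpoint needs no request body; the response format is not documented here, so it is returned as raw text. The helper names `build_reload_request` and `reload_model` are hypothetical.

```python
import urllib.request

# Assumption: the backend is running locally on port 8000.
BASE_URL = "http://localhost:8000"


def build_reload_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for /reload-model (assumes no request body is needed)."""
    url = base_url.rstrip("/") + "/reload-model"
    return urllib.request.Request(url, method="POST")


def reload_model(base_url: str = BASE_URL, timeout: float = 120.0) -> str:
    """Send the request and return the raw response body as text."""
    request = build_reload_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server running, `print(reload_model())` shows whatever the endpoint returns. The generous timeout is a guess, since reloading may involve downloading the model again.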
## Setup
### 1. Clone the repository:
```shell
git clone https://github.com/quangduy201/captionify.git
cd captionify
```
### 2. Create and activate a virtual environment:
```shell
# Create a virtual environment named '.venv'
python -m venv .venv
# Activate the virtual environment (run only the command for your OS)
# On Windows
.venv\Scripts\activate
# On macOS/Linux
source .venv/bin/activate
```
### 3. Install dependencies:
```shell
pip install -r requirements.txt
```
### 4. Run:
```shell
uvicorn run:app --reload
```
This will:
- Automatically download the latest model from [`Kaggle Model Hub`](https://www.kaggle.com/models/quangduy201/image-captioning).
- Load the model and vocabulary into memory.
- Start FastAPI on [`http://localhost:8000`](http://localhost:8000).
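
The download step could look roughly like the sketch below, which uses `kagglehub` (listed under Dependencies). This is not necessarily how the application does it. `kagglehub.model_download` expects a handle of the form `owner/model/framework/variation`; the owner and model come from the Kaggle link above, while the framework and variation values here are placeholders that must be checked on the model page. The helper names are hypothetical.

```python
def build_model_handle(owner: str, model: str, framework: str, variation: str) -> str:
    """Join the four parts of a Kaggle model handle."""
    return f"{owner}/{model}/{framework}/{variation}"


def download_model(handle: str) -> str:
    """Download the model files and return the local directory path."""
    # Imported lazily so build_model_handle works even without kagglehub installed.
    import kagglehub

    return kagglehub.model_download(handle)


# "pyTorch" and "default" are placeholder values, not confirmed by this README.
HANDLE = build_model_handle("quangduy201", "image-captioning", "pyTorch", "default")
```

Calling `download_model(HANDLE)` requires network access and may require Kaggle credentials, depending on the model's visibility.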
### 5. Access the application:
Open a web browser and go to [`http://localhost:8000`](http://localhost:8000) to access the application.
## Train your own model
You can train a custom captioning model using the [provided Kaggle notebook](https://www.kaggle.com/code/quangduy201/image-captioning-pytorch).
### 1. Open the notebook:
Visit the provided Kaggle link and create a copy of the notebook.
### 2. Choose a dataset and session settings:
Pick the dataset that best suits your model.
The default dataset is [flickr8k](https://www.kaggle.com/datasets/quangduy201/flickr8k).
In the notebook's session settings, select a GPU accelerator (GPU T4 x2 or GPU P100) for training.
### 3. Run the notebook:
Simply press `Run all` in the notebook to train your custom model.
### 4. Download the trained model:
After training, download the trained model checkpoint (`/kaggle/working/training/output/checkpoint.pth.tar`).
### 5. Place the trained model:
Place the downloaded `checkpoint.pth.tar` file in the `training/output` directory of the repository.
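
If you prefer to script this step, a minimal sketch follows. It copies the downloaded file into `training/output` under the repository root and creates the directory if it is missing. The function name `place_checkpoint` and both path arguments are illustrative, not part of the repository.

```python
import shutil
from pathlib import Path


def place_checkpoint(downloaded: Path, repo_root: Path) -> Path:
    """Copy a downloaded checkpoint to <repo_root>/training/output/checkpoint.pth.tar."""
    target_dir = repo_root / "training" / "output"
    # Create training/output if the repository does not have it yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "checkpoint.pth.tar"
    # copy2 keeps the file's timestamps and overwrites an existing checkpoint.
    shutil.copy2(downloaded, target)
    return target
```

For example, `place_checkpoint(Path.home() / "Downloads" / "checkpoint.pth.tar", Path("."))` would work when run from the repository root, assuming the file was saved to your downloads folder.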
### 6. Improve the trained model (Optional):
To improve the trained model further, upload your current checkpoint
and use it as an input to the notebook.
## Dependencies
- fastapi
- uvicorn
- torch
- torchvision
- spacy
- tqdm
- Pillow
- python-multipart
- tensorboard
- kagglehub
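
To confirm that the packages above are installed in the active environment, a small standard-library check such as the following may help. The list mirrors the names above; the helper name `missing_distributions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as listed in the Dependencies section.
REQUIRED = [
    "fastapi", "uvicorn", "torch", "torchvision", "spacy",
    "tqdm", "Pillow", "python-multipart", "tensorboard", "kagglehub",
]


def missing_distributions(names: list[str]) -> list[str]:
    """Return the names that have no installed distribution metadata."""
    missing = []
    for name in names:
        try:
            version(name)  # raises PackageNotFoundError if not installed
        except PackageNotFoundError:
            missing.append(name)
    return missing
```

Running `print(missing_distributions(REQUIRED))` after `pip install -r requirements.txt` should print an empty list if everything was installed.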