# Captionify
Captionify is a web application that generates a descriptive caption for an image using an encoder-decoder architecture. It pairs a pre-trained Transformer-based vision model (ViT) as the encoder with a pre-trained language model (GPT-2) as the decoder to caption uploaded images or image URLs.
![Screenshot (437)](https://user-images.githubusercontent.com/90026724/236234966-71693df3-f3a7-45f2-8a90-109c85315d6e.png)
## Usage
To use Captionify, upload an image or enter an image URL in the web interface. The app then runs the image through the pre-trained models and returns a caption describing its contents.
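The repository's actual app.py isn't reproduced here, but a minimal sketch of such a Streamlit interface could look like the following (the Hugging Face image-to-text pipeline and the nlpconnect/vit-gpt2-image-captioning checkpoint are assumptions, not confirmed by this README):
```
from io import BytesIO

import requests
import streamlit as st
from PIL import Image
from transformers import pipeline


@st.cache_resource
def load_captioner():
    # Image-to-text pipeline wrapping a ViT encoder + GPT-2 decoder.
    # Assumed checkpoint; the README does not name one.
    return pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")


captioner = load_captioner()

uploaded = st.file_uploader("Upload an image", type=["png", "jpg", "jpeg"])
url = st.text_input("...or enter an image URL")

image = None
if uploaded is not None:
    image = Image.open(uploaded).convert("RGB")
elif url:
    image = Image.open(BytesIO(requests.get(url, timeout=10).content)).convert("RGB")

if image is not None:
    st.image(image)
    st.write(captioner(image)[0]["generated_text"])
```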
## Getting Started
To install Captionify, first clone this repository:
```
git clone https://github.com/rafaykhattak/captionify.git
cd captionify
```
Install the required dependencies using the following command:
```
pip install -r requirements.txt
```
Then launch the app with the following command:
```
streamlit run app.py
```
This will launch the application on your local machine. You can then upload an image or enter an image URL to generate a descriptive caption.
## Architecture
Captionify uses an encoder-decoder architecture to generate captions for images. The encoder is a pre-trained Transformer-based vision model (ViT) that encodes the input image into a sequence of feature vectors. The decoder is a pre-trained language model (GPT-2) that generates a descriptive caption conditioned on the encoded features.

![vit_architecture](https://user-images.githubusercontent.com/90026724/236233200-745dae6a-569f-4558-9a12-3a56b0b8a872.jpg)
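With the Hugging Face Transformers library, this ViT-encoder / GPT-2-decoder pairing is exposed through the VisionEncoderDecoderModel class. A minimal sketch of the two stages, assuming the widely used nlpconnect/vit-gpt2-image-captioning checkpoint (the README does not pin a specific one):
```
from PIL import Image
from transformers import AutoTokenizer, ViTImageProcessor, VisionEncoderDecoderModel

checkpoint = "nlpconnect/vit-gpt2-image-captioning"  # assumed checkpoint
model = VisionEncoderDecoderModel.from_pretrained(checkpoint)
processor = ViTImageProcessor.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

image = Image.open("example.jpg").convert("RGB")

# Encoder: ViT splits the image into patches and encodes them as feature vectors.
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# Decoder: GPT-2 attends to those features via cross-attention and generates text,
# here with beam search.
output_ids = model.generate(pixel_values, max_length=32, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```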
## Dependencies
- streamlit
- requests
- Pillow
- transformers
- torch
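A requirements.txt matching this list would look like the following (unpinned; the repository's actual file may pin versions):
```
streamlit
requests
Pillow
transformers
torch
```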
## References
- This project is based on the encoder-decoder architecture and uses pre-trained models from the Hugging Face Transformers library.
- The application was developed using Streamlit, an open-source app framework for Machine Learning and Data Science projects.