https://github.com/rafaykhattak/captionify
With Captionify, users can upload an image or enter an image URL to generate a descriptive caption that accurately describes the contents of the image.
- Host: GitHub
- URL: https://github.com/rafaykhattak/captionify
- Owner: RafayKhattak
- Created: 2023-05-04T12:07:07.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2023-05-04T14:17:11.000Z (about 2 years ago)
- Last Synced: 2024-11-13T02:32:20.734Z (6 months ago)
- Topics: deep-neural-networks, encoder-decoder-model, gpt-2, transfer-learning, transformer, vision-transformer
- Language: Python
- Homepage: https://rafaykhattak-captionify-app-btvva7.streamlit.app/
- Size: 16.6 KB
- Stars: 5
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Captionify
Captionify is a web application that generates a descriptive caption for an image using an encoder-decoder architecture. The application uses a pre-trained Transformer-based vision model (ViT) as an encoder and a pre-trained language model (GPT-2) as a decoder to generate accurate captions for uploaded images or image URLs.

## Usage
To use Captionify, simply upload an image or enter an image URL on the web interface. The tool will then use the pre-trained models to generate a descriptive caption that accurately describes the contents of the image.
## Getting Started
To install Captionify, first clone this repository:
```
git clone https://github.com/rafaykhattak/captionify.git
cd captionify
```
Install the required dependencies using the following command:
```
pip install -r requirements.txt
```
Then run the app.py file using the following command:
```
streamlit run app.py
```
This will launch the application on your local machine. You can then upload an image or enter an image URL to generate a descriptive caption.
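The README does not show how the two input paths (file upload vs. image URL) are handled. A minimal sketch, assuming the URL path is fetched with `requests` and decoded with `Pillow` (both listed in the dependencies); the helper name `load_image` is hypothetical, not taken from the repository:

```python
import io

import requests
from PIL import Image


def load_image(source) -> Image.Image:
    """Accept either raw image bytes (e.g. from an upload widget) or an http(s) URL.

    Returns a PIL image converted to RGB, ready for the captioning model.
    """
    if isinstance(source, (bytes, bytearray)):
        # Uploaded file: the widget hands us raw bytes.
        return Image.open(io.BytesIO(source)).convert("RGB")
    # Otherwise treat the source as a URL and download it.
    response = requests.get(source, timeout=10)
    response.raise_for_status()
    return Image.open(io.BytesIO(response.content)).convert("RGB")
```

In a Streamlit app, the bytes would typically come from `st.file_uploader(...).getvalue()` and the URL from `st.text_input(...)`.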
## Architecture
Captionify uses an encoder-decoder architecture to generate captions for images. The encoder is a pre-trained Transformer-based vision model (ViT) that encodes the input image into a sequence of feature vectors. The decoder is a pre-trained language model (GPT-2) that generates a descriptive caption conditioned on those encoded features.
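This ViT-encoder/GPT-2-decoder pipeline maps directly onto Hugging Face's `VisionEncoderDecoderModel`. A minimal inference sketch, assuming the widely used `nlpconnect/vit-gpt2-image-captioning` checkpoint (the repository's exact checkpoint is not stated here, so treat the model ID as an illustrative choice):

```python
from PIL import Image
from transformers import AutoTokenizer, ViTImageProcessor, VisionEncoderDecoderModel

# Assumed checkpoint: a ViT encoder paired with a GPT-2 decoder,
# fine-tuned for captioning (not confirmed as this repo's choice).
MODEL_ID = "nlpconnect/vit-gpt2-image-captioning"

model = VisionEncoderDecoderModel.from_pretrained(MODEL_ID)
processor = ViTImageProcessor.from_pretrained(MODEL_ID)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)


def caption(image: Image.Image) -> str:
    """Encode the image with ViT, then decode a caption with GPT-2."""
    pixel_values = processor(images=image.convert("RGB"), return_tensors="pt").pixel_values
    output_ids = model.generate(pixel_values, max_length=32, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

The same pattern works for any ViT + GPT-2 pair wrapped in `VisionEncoderDecoderModel`; only the checkpoint ID changes.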
## Dependencies
- streamlit
- requests
- Pillow
- transformers
- torch
## References
- This project is based on the Encoder-Decoder architecture and uses pre-trained models from the Hugging Face Transformers library.
- The application was developed using Streamlit, an open-source app framework for Machine Learning and Data Science projects.