https://github.com/mahmood-anaam/bit-imagecaptioning-mult-gpus
BiT-ImageCaptioning-Mult-GPUs is a Python package for generating Arabic image captions using Bidirectional Transformers (BiT). This library is designed to provide high-quality and accurate captions for Arabic datasets by leveraging pre-trained deep learning models.
- Host: GitHub
- URL: https://github.com/mahmood-anaam/bit-imagecaptioning-mult-gpus
- Owner: Mahmood-Anaam
- License: MIT
- Created: 2024-12-17T18:03:29.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-12-31T05:07:52.000Z (9 months ago)
- Last Synced: 2024-12-31T05:27:31.633Z (9 months ago)
- Topics: bertforimagecaptioning, berttokenizer, conda-environment, features-extraction, image-captioning, pytorch, pytorch-transformers
- Language: Python
- Homepage: https://github.com/Mahmood-Anaam/BiT-ImageCaptioning-Mult-GPUs.git
- Size: 2.8 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# BiT-ImageCaptioning-Mult-GPUs
**BiT-ImageCaptioning-Mult-GPUs** is a Python package for generating **Arabic image captions** using **Bidirectional Transformers (BiT)**. This library is designed to provide high-quality and accurate captions for Arabic datasets by leveraging pre-trained deep learning models.
## Installation
Clone the repository:
```bash
git clone https://github.com/Mahmood-Anaam/BiT-ImageCaptioning-Mult-GPUs.git
cd BiT-ImageCaptioning-Mult-GPUs
```

Create a `.env` file for environment variables:
```env
HF_TOKEN="hugging_face_token"
```
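The README does not say where `HF_TOKEN` is read; a common pattern is to load the `.env` file at runtime and authenticate with the Hugging Face Hub. The sketch below assumes `python-dotenv` and `huggingface_hub` are installed; neither is a documented dependency of this repository.

```python
# Hedged sketch: load HF_TOKEN from .env and authenticate with the Hugging Face Hub.
# python-dotenv and huggingface_hub are assumptions, not documented dependencies.
import os

from dotenv import load_dotenv      # pip install python-dotenv
from huggingface_hub import login   # pip install huggingface_hub

load_dotenv()                       # reads the .env file in the current directory
token = os.getenv("HF_TOKEN")
if token:
    login(token=token)              # enables authenticated model downloads
```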
Create the conda environment:

```bash
conda env create -f environment.yml
conda activate sg_benchmark
```
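Since the package targets multi-GPU use, it is worth confirming that the new environment actually sees your GPUs before building anything. The check below uses only standard PyTorch calls:

```python
# Sanity check: confirm the sg_benchmark environment can see the GPUs.
import torch

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(f"  cuda:{i} -> {torch.cuda.get_device_name(i)}")
```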
Install Scene Graph Detection for feature extraction:

```bash
cd src/scene_graph_benchmark
python setup.py build develop
```

Download the image-captioning model:
```bash
cd ..
git lfs install
git clone https://huggingface.co/jontooy/AraBERT32-Flickr8k bit_image_captioning/pretrained_model
```
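If Git LFS is not installed or configured, the clone above can leave tiny pointer files instead of the actual weights. A quick directory listing (the path matches the clone target above; the individual file names are not documented here) makes that easy to spot:

```python
# List the downloaded checkpoint files and flag suspiciously small ones,
# which usually means Git LFS left pointer files instead of real weights.
from pathlib import Path

model_dir = Path("bit_image_captioning/pretrained_model")
for p in sorted(model_dir.glob("*")):
    size_kb = p.stat().st_size / 1024
    note = "  <-- possibly an un-fetched LFS pointer" if p.is_file() and size_kb < 1 else ""
    print(f"{p.name:40s} {size_kb:12.1f} KB{note}")
```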
Install BiT-ImageCaptioning for image captioning:

```bash
cd ..
python setup.py build develop
```
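A quick import check confirms that the build step put the package on your path; the import paths below are the same ones used in the Quick Start example.

```python
# Smoke test: these imports should succeed after `python setup.py build develop`.
from bit_image_captioning.feature_extractors.vinvl import VinVLFeatureExtractor
from bit_image_captioning.pipelines.bert_pipeline import BiTImageCaptioningPipeline
from bit_image_captioning.modeling.bert_config import BiTConfig

print("BiT-ImageCaptioning imports OK")
```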
## Quick Start

```python
import torch
from bit_image_captioning.feature_extractors.vinvl import VinVLFeatureExtractor
from bit_image_captioning.pipelines.bert_pipeline import BiTImageCaptioningPipeline
# Optional dataset / dataloader helpers (not used in this minimal example)
from bit_image_captioning.datasets.ok_vqa_dataset import OKVQADataset
from bit_image_captioning.datasets.ok_vqa_dataloader import OKVQADataLoader
from bit_image_captioning.modeling.bert_config import BiTConfig

# Config
cfg = BiTConfig
cfg.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
cfg.add_od_labels = True

# Extract image features
feature_extractor = VinVLFeatureExtractor(device=cfg.device, add_od_labels=cfg.add_od_labels)
# img can be a file path, URL, PIL.Image, numpy array, or tensor
image_features = feature_extractor([img])
# Returns List[dict]: one dict of extracted features per image, e.g.
# [{"boxes", "classes", "scores", "img_feats", "od_labels", "spatial_features"}, ...]

# Generate a caption
pipeline = BiTImageCaptioningPipeline(cfg)
features, captions = pipeline([img])
print("Generated Caption:", caption)
```
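The pipeline call already accepts a list of images, so captioning a whole folder is mostly a matter of building that list. The sketch below continues from the Quick Start (`pipeline` is the object created above) and assumes PIL images are accepted, as the comment on `img` suggests; the folder name and file pattern are placeholders.

```python
# Hedged sketch: caption every image in a folder by reusing the pipeline above.
# Assumes PIL images are valid inputs, per the Quick Start comment on `img`.
from pathlib import Path
from PIL import Image

image_dir = Path("images")                      # hypothetical folder of .jpg files
paths = sorted(image_dir.glob("*.jpg"))
images = [Image.open(p).convert("RGB") for p in paths]

features, captions = pipeline(images)           # same (features, captions) call as above
for path, caption in zip(paths, captions):
    print(f"{path.name}: {caption}")
```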