https://github.com/mahmood-anaam/bit-imagecaptioning-mult-gpus
BiT-ImageCaptioning-Mult-GPUs is a Python package for generating Arabic image captions using Bidirectional Transformers (BiT). This library is designed to provide high-quality and accurate captions for Arabic datasets by leveraging pre-trained deep learning models.
- Host: GitHub
- URL: https://github.com/mahmood-anaam/bit-imagecaptioning-mult-gpus
- Owner: Mahmood-Anaam
- License: MIT
- Created: 2024-12-17T18:03:29.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-12-31T05:07:52.000Z (9 months ago)
- Last Synced: 2024-12-31T05:27:31.633Z (9 months ago)
- Topics: bertforimagecaptioning, berttokenizer, conda-environment, features-extraction, image-captioning, pytorch, pytorch-transformers
- Language: Python
- Homepage: https://github.com/Mahmood-Anaam/BiT-ImageCaptioning-Mult-GPUs.git
- Size: 2.8 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# BiT-ImageCaptioning-Mult-GPUs
**BiT-ImageCaptioning-Mult-GPUs** is a Python package for generating **Arabic image captions** using **Bidirectional Transformers (BiT)**. This library is designed to provide high-quality and accurate captions for Arabic datasets by leveraging pre-trained deep learning models.
## Installation
Clone the repository:
```bash
git clone https://github.com/Mahmood-Anaam/BiT-ImageCaptioning-Mult-GPUs.git
cd BiT-ImageCaptioning-Mult-GPUs
```

Create a `.env` file for environment variables:
```env
HF_TOKEN="hugging_face_token"
```
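The README does not say where `HF_TOKEN` is read; a common pattern is to load the `.env` file at runtime and authenticate with the Hugging Face Hub. The sketch below assumes `python-dotenv` and `huggingface_hub` are installed; neither is a documented dependency of this repository.

```python
# Hedged sketch: load HF_TOKEN from .env and authenticate with the Hugging Face Hub.
# python-dotenv and huggingface_hub are assumptions, not documented dependencies.
import os

from dotenv import load_dotenv      # pip install python-dotenv
from huggingface_hub import login   # pip install huggingface_hub

load_dotenv()                       # reads the .env file in the current directory
token = os.getenv("HF_TOKEN")
if token:
    login(token=token)              # enables authenticated model downloads
```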
Create the conda environment:

```bash
conda env create -f environment.yml
conda activate sg_benchmark
```
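Since the package targets multi-GPU use, it is worth confirming that the new environment actually sees your GPUs before building anything. The check below uses only standard PyTorch calls:

```python
# Sanity check: confirm the sg_benchmark environment can see the GPUs.
import torch

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(f"  cuda:{i} -> {torch.cuda.get_device_name(i)}")
```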
Install Scene Graph Detection for feature extraction:

```bash
cd src/scene_graph_benchmark
python setup.py build develop
```

Download the image-captioning model:
```bash
cd ..
git lfs install
git clone https://huggingface.co/jontooy/AraBERT32-Flickr8k bit_image_captioning/pretrained_model
```
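If Git LFS is not installed or configured, the clone above can leave tiny pointer files instead of the actual weights. A quick directory listing (the path matches the clone target above; the individual file names are not documented here) makes that easy to spot:

```python
# List the downloaded checkpoint files and flag suspiciously small ones,
# which usually means Git LFS left pointer files instead of real weights.
from pathlib import Path

model_dir = Path("bit_image_captioning/pretrained_model")
for p in sorted(model_dir.glob("*")):
    size_kb = p.stat().st_size / 1024
    note = "  <-- possibly an un-fetched LFS pointer" if p.is_file() and size_kb < 1 else ""
    print(f"{p.name:40s} {size_kb:12.1f} KB{note}")
```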
Install BiT-ImageCaptioning for image captioning:

```bash
cd ..
python setup.py build develop
```
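A quick import check confirms that the build step put the package on your path; the import paths below are the same ones used in the Quick Start example.

```python
# Smoke test: these imports should succeed after `python setup.py build develop`.
from bit_image_captioning.feature_extractors.vinvl import VinVLFeatureExtractor
from bit_image_captioning.pipelines.bert_pipeline import BiTImageCaptioningPipeline
from bit_image_captioning.modeling.bert_config import BiTConfig

print("BiT-ImageCaptioning imports OK")
```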
## Quick Start

```python
import torch
from bit_image_captioning.feature_extractors.vinvl import VinVLFeatureExtractor
from bit_image_captioning.pipelines.bert_pipeline import BiTImageCaptioningPipeline
# Optional dataset / dataloader helpers (not used in this minimal example)
from bit_image_captioning.datasets.ok_vqa_dataset import OKVQADataset
from bit_image_captioning.datasets.ok_vqa_dataloader import OKVQADataLoader
from bit_image_captioning.modeling.bert_config import BiTConfig

# Config
cfg = BiTConfig
cfg.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
cfg.add_od_labels = True

# Extract image features
feature_extractor = VinVLFeatureExtractor(device=cfg.device, add_od_labels=cfg.add_od_labels)
# img can be a file path, URL, PIL.Image, numpy array, or tensor
image_features = feature_extractor([img])
# Returns List[dict]: one dict of extracted features per image, e.g.
# [{"boxes", "classes", "scores", "img_feats", "od_labels", "spatial_features"}, ...]

# Generate a caption
pipeline = BiTImageCaptioningPipeline(cfg)
features, captions = pipeline([img])
print("Generated Caption:", caption)
```
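The pipeline call already accepts a list of images, so captioning a whole folder is mostly a matter of building that list. The sketch below continues from the Quick Start (`pipeline` is the object created above) and assumes PIL images are accepted, as the comment on `img` suggests; the folder name and file pattern are placeholders.

```python
# Hedged sketch: caption every image in a folder by reusing the pipeline above.
# Assumes PIL images are valid inputs, per the Quick Start comment on `img`.
from pathlib import Path
from PIL import Image

image_dir = Path("images")                      # hypothetical folder of .jpg files
paths = sorted(image_dir.glob("*.jpg"))
images = [Image.open(p).convert("RGB") for p in paths]

features, captions = pipeline(images)           # same (features, captions) call as above
for path, caption in zip(paths, captions):
    print(f"{path.name}: {caption}")
```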