Violet is a Python-based library designed for generating Arabic image captions. The pipeline leverages state-of-the-art transformer models, providing an easy-to-use interface for researchers and developers working on tasks such as image captioning and visual question answering (VQA).
- Host: GitHub
- URL: https://github.com/mahmood-anaam/violet2
- Owner: Mahmood-Anaam
- License: MIT
- Created: 2024-12-08T15:59:38.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2025-01-03T20:10:00.000Z (10 months ago)
- Last Synced: 2025-01-03T20:31:31.501Z (10 months ago)
- Topics: image-captioning, okvqa, python3, pytorch, transformers, vqa, vqav2
- Language: Jupyter Notebook
- Homepage: https://github.com/Mahmood-Anaam/Violet.git
- Size: 12.5 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Metadata Files:
  - Readme: README.md
  - License: LICENSE
 
# Violet: Arabic Image Captioning
**Violet** is a Python-based library designed for generating **Arabic image captions**. The pipeline leverages state-of-the-art transformer models, providing an easy-to-use interface for researchers and developers working on tasks such as image captioning and visual question answering (VQA).
## Features
1. **Arabic Image Captioning**: Generate high-quality captions for images in Arabic.
2. **Visual Feature Extraction**: Extract image features for integration into vision-language models or downstream tasks.
3. **Customizable for VQA**: Use extracted features and captions to build Arabic visual question-answering systems.
4. **Mixed Input Support**: Handle batches of images in various formats, such as URLs, file paths, NumPy arrays, PyTorch tensors, and PIL Image objects.
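To illustrate how such a mixed batch might be dispatched, here is a minimal sketch (not Violet's internal code; `normalize_inputs` is a hypothetical helper) that wraps a single input into a batch and tags each item with its detected kind:

```python
# Hypothetical sketch of mixed-input dispatch; NOT Violet's actual
# implementation, just an illustration of the idea.
def normalize_inputs(images):
    """Wrap a single input in a list and tag each item with its kind."""
    if not isinstance(images, (list, tuple)):
        images = [images]  # a single image becomes a batch of one
    batch = []
    for img in images:
        if isinstance(img, str):
            # strings are either remote URLs or local file paths
            kind = "url" if img.startswith(("http://", "https://")) else "path"
        else:
            # e.g. "ndarray" (NumPy), "Tensor" (PyTorch), "Image" (PIL)
            kind = type(img).__name__
        batch.append((kind, img))
    return batch

print(normalize_inputs("http://images.cocodataset.org/val2017/000000039769.jpg"))
```

A real pipeline would then convert each tagged item to a common representation (for example, a PIL image) before preprocessing.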
## How to Use Violet
### Installation
Clone the repository and install Violet in editable mode:
```bash
git clone https://github.com/Mahmood-Anaam/Violet.git
cd Violet
pip install -e .
```
In a Jupyter or Colab notebook, prefix the shell commands with `!` and use `%cd` to change directories.
### Example Usage in Google Colab
Interactive Jupyter notebooks are provided to demonstrate Violet's capabilities. You can open these notebooks in Google Colab:
- [Image Captioning Demo](https://github.com/Mahmood-Anaam/Violet/blob/main/notebooks/inference_demo.ipynb) ([Open in Colab](https://colab.research.google.com/github/Mahmood-Anaam/Violet/blob/main/notebooks/inference_demo.ipynb))
- [Feature Extraction Demo](https://github.com/Mahmood-Anaam/Violet/blob/main/notebooks/features_extraction_demo.ipynb) ([Open in Colab](https://colab.research.google.com/github/Mahmood-Anaam/Violet/blob/main/notebooks/features_extraction_demo.ipynb))
### Pipeline Overview
The Violet pipeline supports three main functionalities:
1. **Generate Captions for Images**
   The pipeline accepts a variety of input formats:
   ```python
   import numpy as np
   import torch
   from PIL import Image

   from violet.pipeline import VioletImageCaptioningPipeline
   from violet.configuration import VioletConfig

   pipeline = VioletImageCaptioningPipeline(VioletConfig)

   # Single image captioning
   captions = pipeline("http://images.cocodataset.org/val2017/000000039769.jpg")
   print(captions)

   # Batch image captioning with mixed formats
   images = [
       "http://images.cocodataset.org/val2017/000000039769.jpg",
       "/path/to/local/image.jpg",
       np.random.rand(224, 224, 3),           # NumPy array
       torch.randn(3, 224, 224),              # PyTorch tensor
       Image.open("/path/to/pil/image.jpg"),  # PIL Image
   ]

   captions = pipeline(images)
   for caption in captions:
       print(caption)
   ```
2. **Extract Features from Images**
Extract visual features for downstream tasks like VQA. The pipeline supports mixed input formats in a single batch.
   ```python
   # Single image feature extraction
   features = pipeline.generate_features("http://images.cocodataset.org/val2017/000000039769.jpg")
   print(features.shape)
   # Batch feature extraction with mixed formats
   features = pipeline.generate_features(images)
   print(features.shape)
   ```
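   Feature extraction is the expensive step, so when the same images are processed repeatedly it can be worth caching features to disk. A minimal sketch using NumPy (the `cache_features` helper and its file layout are assumptions for illustration, not part of Violet's API):

   ```python
   import hashlib
   import os

   import numpy as np

   def cache_features(key, compute, cache_dir):
       """Return cached features for `key`, computing and saving on a miss."""
       os.makedirs(cache_dir, exist_ok=True)
       # hash the key (e.g. an image URL or path) into a stable filename
       name = hashlib.sha1(key.encode("utf-8")).hexdigest() + ".npy"
       path = os.path.join(cache_dir, name)
       if os.path.exists(path):
           return np.load(path)
       feats = np.asarray(compute(key))
       np.save(path, feats)
       return feats
   ```

   Here `compute` would wrap a call such as `pipeline.generate_features(key)`.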
3. **Generate Captions from Features**
Generate captions based on precomputed visual features.
   ```python
   captions = pipeline.generate_captions_from_features(features)
   for caption in captions:
     print(caption)
   ```
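Decoupling extraction from captioning also makes it easy to reuse the same features for VQA. A sketch of pairing precomputed features with questions into training records (the record schema below is illustrative, not a Violet format):

```python
def build_vqa_records(image_ids, features, questions):
    """Zip aligned lists of ids, feature arrays, and questions into records."""
    if not (len(image_ids) == len(features) == len(questions)):
        raise ValueError("image_ids, features, and questions must be aligned")
    return [
        {"image_id": i, "features": f, "question": q}
        for i, f, q in zip(image_ids, features, questions)
    ]
```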
## Contributions
Contributions are welcome! Please open issues or pull requests on the [GitHub Repository](https://github.com/Mahmood-Anaam/Violet).