https://github.com/yasho191/SwiftAnnotate

Auto labelling tool for Text, Image, Video

automation computer-vision data-labeling llms nlp vlms

Host: GitHub
URL: https://github.com/yasho191/SwiftAnnotate
Owner: yasho191
License: apache-2.0
Created: 2025-01-18T02:59:50.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2025-03-18T07:04:10.000Z (over 1 year ago)
Last Synced: 2025-11-28T01:34:42.742Z (8 months ago)
Topics: automation, computer-vision, data-labeling, llms, nlp, vlms
Language: Python
Homepage: https://yasho191.github.io/SwiftAnnotate/
Size: 2.26 MB
Stars: 3
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# SwiftAnnotate 🚀

![swiftannotate](https://github.com/yasho191/SwiftAnnotate/blob/main/assets/swiftannotate-high-resolution-logo.png?raw=True)

SwiftAnnotate is a comprehensive auto-labeling tool designed for Text, Image, and Video data. It leverages state-of-the-art (SOTA) Vision Language Models (VLMs) and Large Language Models (LLMs) through a robust annotator-validator pipeline, ensuring high-quality, grounded annotations while minimizing hallucinations. SwiftAnnotate also supports annotation tasks such as Object Detection and Segmentation through SOTA CV models like `SAM2`, `YOLOWorld`, and `OWL-ViT`.

## Key Features 🎯

1. **Text Processing 📝**  

Perform **classification**, **summarization**, and **text generation** with state-of-the-art NLP models. Solve real-world problems like spam detection, sentiment analysis, and content creation.

2. **Image Analysis 🖼️**  

Generate **captions** for images to provide meaningful descriptions. Classify images into predefined categories with high precision. Detect objects in images using models like **YOLOWorld**. Produce pixel-level segmentation masks with **SAM2** and **OWL-ViT**.

3. **Video Processing 🎥**  

Generate captions for videos with **frame-level analysis** and **temporal understanding**. Understand video content by detecting scenes and actions.

4. **Quality Assurance ✅**  

Use a **two-stage pipeline** for annotation and validation to ensure high data quality. Validate outputs rigorously to maintain reliability before deployment.  

5. **Multi-modal Support 🌐**  

Seamlessly process **text**, **images**, and **videos** within a unified framework. Combine data types for powerful multi-modal insights and applications.  

6. **Customization 🛠️**

Easily extend and adapt the framework to suit specific project needs. Integrate new models and tasks seamlessly with modular architecture.

7. **Developer-Friendly 👩‍💻👨‍💻**

Easy-to-use package and detailed documentation to get started quickly.

## Installation Guide  

To install **SwiftAnnotate** from PyPI and set up the project environment, follow these steps:  

1. **Install from PyPI**  

    Run the following command to install the package directly:  

    ```bash
    pip install swiftannotate
    ```

2. **For Development (Using Poetry)**  

    If you want to contribute or explore the project codebase, make sure you have Poetry installed, then follow the steps below:

    ```bash
    git clone https://github.com/yasho191/SwiftAnnotate
    cd SwiftAnnotate
    poetry install
    ```

    You're now ready to explore and develop SwiftAnnotate!  
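
To confirm that the install succeeded, you can query the installed distribution from Python. This is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, and the distribution name `swiftannotate` is taken from the `pip install` command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string if the package is installed in this environment.
print(installed_version("swiftannotate") or "swiftannotate is not installed")
```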

## Annotator-Validator Pipeline for LLMs and VLMs

![Annotation Pipeline](https://github.com/yasho191/SwiftAnnotate/blob/main/assets/SwiftAnnotatePipeline.png?raw=True)

The annotator-validator pipeline ensures high-quality annotations through a two-stage process:

**Stage 1: Annotation**

- Primary LLM/VLM generates initial annotations

- Configurable model selection (OpenAI, Google Gemini, Anthropic, Mistral, Qwen-VL)

**Stage 2: Validation**

- Secondary model validates initial annotations

- Cross-checks for hallucinations and factual accuracy

- Provides confidence scores and correction suggestions

- Option to regenerate annotations if validation fails

- Structured output format for consistency

**Benefits**

- Reduced hallucinations through two-stage verification

- Higher annotation quality and consistency

- Automated quality control

- Traceable annotation process

The pipeline can be customized with different model combinations and validation thresholds based on specific use cases.
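
The control flow described above can be sketched in plain Python. This is an illustrative sketch only, not SwiftAnnotate's actual implementation: `ValidationResult`, `annotate_with_validation`, the `threshold` and `max_retries` parameters, and the toy annotator and validator are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ValidationResult:
    confidence: float   # validator's score in [0, 1]
    feedback: str = ""  # suggested correction, if any


def annotate_with_validation(
    item: str,
    annotate: Callable[[str, Optional[str]], str],
    validate: Callable[[str, str], ValidationResult],
    threshold: float = 0.8,
    max_retries: int = 3,
) -> dict:
    """Annotate, validate, and regenerate until the score passes or retries run out."""
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    feedback: Optional[str] = None
    history = []
    for attempt in range(1, max_retries + 1):
        # Stage 1: the annotator sees the item plus any feedback from the last round.
        annotation = annotate(item, feedback)
        # Stage 2: a second model scores the annotation.
        result = validate(item, annotation)
        history.append(
            {"attempt": attempt, "annotation": annotation, "confidence": result.confidence}
        )
        if result.confidence >= threshold:
            return {"annotation": annotation, "validated": True, "history": history}
        feedback = result.feedback
    # Retries exhausted: return the last attempt, flagged as not validated.
    return {"annotation": annotation, "validated": False, "history": history}


# Toy stand-ins for the two models (hypothetical; real ones would call an LLM/VLM).
def toy_annotate(item: str, feedback: Optional[str]) -> str:
    return "a cat on a sofa" if feedback else "a dog"


def toy_validate(item: str, annotation: str) -> ValidationResult:
    ok = "cat" in annotation
    return ValidationResult(0.9 if ok else 0.2, "" if ok else "The animal is a cat.")


result = annotate_with_validation("cat.jpg", toy_annotate, toy_validate)
print(result["validated"], result["annotation"])
```

Keeping the per-attempt `history` is one way to make the process traceable: every rejected annotation and its score remain available for later inspection.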

## Supported Modalities and Tasks

### Text

### Images

#### Captioning

Image captioning currently supports OpenAI, Google Gemini, Ollama, and Qwen2-VL. Because Qwen2-VL is not yet available on Ollama, it is supported through Hugging Face. To get started quickly, refer to the code snippets below.

**OpenAI**

```python
import os

from swiftannotate.image import OpenAIForImageCaptioning

# Model that writes the captions, and model that validates them
caption_model = "gpt-4o"
validation_model = "gpt-4o-mini"
api_key = ""  # your OpenAI API key

# Directory containing the images to caption
BASE_DIR = ""
image_paths = [os.path.join(BASE_DIR, image) for image in os.listdir(BASE_DIR)]

image_captioning_pipeline = OpenAIForImageCaptioning(
    caption_model=caption_model,
    validation_model=validation_model,
    api_key=api_key,
    output_file="image_captioning_output.json",
)

results = image_captioning_pipeline.generate(image_paths=image_paths)
```
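
Note that `os.listdir` returns every entry in the directory, including subdirectories and non-image files. If the folder is mixed, a small filter like the sketch below may help. `list_image_paths` and the `IMAGE_EXTENSIONS` set are hypothetical, not part of SwiftAnnotate; adjust the extensions to whatever your chosen model accepts.

```python
from pathlib import Path

# Assumed set of image extensions; extend or trim as needed.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}


def list_image_paths(base_dir: str) -> list[str]:
    """Return sorted paths of regular files in base_dir that have an image extension."""
    return sorted(
        str(path)
        for path in Path(base_dir).iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )
```

The returned list can be passed wherever the snippets in this section use `image_paths`.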

**Qwen2-VL**

You can use any Qwen2-VL variant (7B, 72B), depending on the available resources. vLLM inference is not currently supported, but it will be available soon.

```python
import os

from transformers import AutoProcessor, AutoModelForImageTextToText
from transformers import BitsAndBytesConfig
from swiftannotate.image import Qwen2VLForImageCaptioning

# Load the images
BASE_DIR = ""
image_paths = [os.path.join(BASE_DIR, image) for image in os.listdir(BASE_DIR)]

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16",
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForImageTextToText.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct",
    device_map="auto",
    torch_dtype="auto",
    quantization_config=quantization_config,
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

# Load the Caption Model
captioning_pipeline = Qwen2VLForImageCaptioning(
    model=model,
    processor=processor,
    output_file="image_captioning_output.json",
)

results = captioning_pipeline.generate(image_paths)
```
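
When choosing a variant, a rough estimate of weight memory can help. The sketch below uses the simple rule "parameters times bits per parameter, divided by 8" and nominal parameter counts (7e9 and 72e9), which are assumptions; real checkpoints differ somewhat. It covers weights only, so activations, the KV cache, and quantization overhead come on top. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


# Nominal parameter counts (assumed); compare 16-bit weights with 4-bit quantization.
for name, params in {"7B": 7e9, "72B": 72e9}.items():
    fp16 = weight_memory_gb(params, 16)
    int4 = weight_memory_gb(params, 4)
    print(f"{name}: 16-bit ~ {fp16:.1f} GB, 4-bit ~ {int4:.1f} GB")
```

Under these assumptions, 4-bit loading as configured above reduces the weight footprint to about a quarter of the 16-bit size.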

### Videos

## Contributing to SwiftAnnotate 🤝

We welcome contributions to SwiftAnnotate! There are several ways you can help improve the project:

- **Enhanced Prompts**: Contribute better validation and annotation prompts for improved accuracy

- **File Support**: Add support for additional input/output file formats

- **Cloud Integration**: Implement AWS S3 storage support and other cloud services

- **Validation Strategies**: Develop new validation approaches for different annotation tasks

- **Model Support**: Integrate additional LLMs and VLMs

- **Documentation**: Improve guides and examples

Please submit a pull request with your contributions or open an issue to discuss new features.
