https://github.com/yasho191/SwiftAnnotate
Auto labelling tool for Text, Image, Video
automation computer-vision data-labeling llms nlp vlms
- Host: GitHub
- URL: https://github.com/yasho191/SwiftAnnotate
- Owner: yasho191
- License: apache-2.0
- Created: 2025-01-18T02:59:50.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-03-18T07:04:10.000Z (over 1 year ago)
- Last Synced: 2025-11-28T01:34:42.742Z (7 months ago)
- Topics: automation, computer-vision, data-labeling, llms, nlp, vlms
- Language: Python
- Homepage: https://yasho191.github.io/SwiftAnnotate/
- Size: 2.26 MB
- Stars: 3
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-auto-annotation - yasho191/SwiftAnnotate
README
# SwiftAnnotate 🚀

SwiftAnnotate is a comprehensive auto-labeling tool designed for Text, Image, and Video data. It leverages state-of-the-art (SOTA) Vision Language Models (VLMs) and Large Language Models (LLMs) through a robust annotator-validator pipeline, ensuring high-quality, grounded annotations while minimizing hallucinations. SwiftAnnotate also supports annotation tasks like Object Detection and Segmentation through SOTA CV models like `SAM2`, `YOLOWorld`, and `OWL-ViT`.
## Key Features 🎯
1. **Text Processing 📝**
Perform **classification**, **summarization**, and **text generation** with state-of-the-art NLP models. Solve real-world problems like spam detection, sentiment analysis, and content creation.
2. **Image Analysis 🖼️**
Generate **captions** for images to provide meaningful descriptions. Classify images into predefined categories. Detect objects in images using models like **YOLOWorld** and **OWL-ViT**. Segment objects at the pixel level with **SAM2**.
3. **Video Processing 🎥**
Generate captions for videos with **frame-level analysis** and **temporal understanding**. Understand video content by detecting scenes and actions.
4. **Quality Assurance ✅**
Use a **two-stage pipeline** for annotation and validation to ensure high data quality. Validate outputs rigorously to maintain reliability before deployment.
5. **Multi-modal Support 🌐**
Seamlessly process **text**, **images**, and **videos** within a unified framework. Combine data types for powerful multi-modal insights and applications.
6. **Customization 🛠️**
Easily extend and adapt the framework to suit specific project needs. Integrate new models and tasks seamlessly with modular architecture.
7. **Developer-Friendly 👩‍💻👨‍💻**
Easy-to-use package and detailed documentation to get started quickly.
## Installation Guide
To install **SwiftAnnotate** from PyPI and set up the project environment, follow these steps:
1. **Install from PyPI**
Run the following command to install the package directly:
```bash
pip install swiftannotate
```
2. **For Development (Using Poetry)**
If you want to contribute or explore the project codebase, make sure you have Poetry installed, then run:
```bash
git clone https://github.com/yasho191/SwiftAnnotate
cd SwiftAnnotate
poetry install
```
You're now ready to explore and develop SwiftAnnotate!
## Annotator-Validator Pipeline for LLMs and VLMs

The annotator-validator pipeline ensures high-quality annotations through a two-stage process:
**Stage 1: Annotation**
- Primary LLM/VLM generates initial annotations
- Configurable model selection (OpenAI, Google Gemini, Anthropic, Mistral, Qwen-VL)
**Stage 2: Validation**
- Secondary model validates initial annotations
- Cross-checks for hallucinations and factual accuracy
- Provides confidence scores and correction suggestions
- Option to regenerate annotations if validation fails
- Structured output format for consistency
**Benefits**
- Reduced hallucinations through two-stage verification
- Higher annotation quality and consistency
- Automated quality control
- Traceable annotation process
The pipeline can be customized with different model combinations and validation thresholds based on specific use cases.
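
The two stages above can be sketched as a small retry loop. This is a minimal illustration of the annotate, validate, and regenerate pattern, not SwiftAnnotate's actual implementation: the names `Validation`, `AnnotationResult`, `annotate_with_validation`, `threshold`, and `max_retries` are hypothetical, and the stub callables stand in for real LLM/VLM calls.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Validation:
    """Verdict returned by the (hypothetical) validator model."""
    passed: bool
    confidence: float
    feedback: str = ""


@dataclass
class AnnotationResult:
    annotation: str
    validation: Validation
    attempts: int


def annotate_with_validation(
    item: str,
    annotate: Callable[[str, Optional[str]], str],
    validate: Callable[[str, str], Validation],
    threshold: float = 0.8,
    max_retries: int = 2,
) -> AnnotationResult:
    """Run stage 1 (annotate) and stage 2 (validate), regenerating on failure."""
    feedback: Optional[str] = None
    for attempt in range(1, max_retries + 2):  # first try plus max_retries retries
        annotation = annotate(item, feedback)
        verdict = validate(item, annotation)
        if verdict.passed and verdict.confidence >= threshold:
            break
        # Feed the validator's correction suggestion into the next attempt.
        feedback = verdict.feedback
    return AnnotationResult(annotation, verdict, attempt)


# Stub models for illustration only; a real setup would call an LLM/VLM here.
def fake_annotate(item: str, feedback: Optional[str]) -> str:
    return "a dog on a beach" if feedback else "a cat on a beach"


def fake_validate(item: str, annotation: str) -> Validation:
    ok = "dog" in annotation
    return Validation(
        passed=ok,
        confidence=0.9 if ok else 0.3,
        feedback="" if ok else "The animal is a dog.",
    )


result = annotate_with_validation("dog.jpg", fake_annotate, fake_validate)
print(result.annotation, result.attempts)
```

With these stubs the first caption fails validation, the validator's feedback is passed to the second attempt, and the loop stops as soon as the verdict clears the threshold. If every attempt fails, the last annotation is returned together with its failing verdict so the caller can decide how to handle it.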
## Supported Modalities and Tasks
### Text
### Images
#### Captioning
Currently, we support OpenAI, Google Gemini, Ollama, and Qwen2-VL for image captioning. As Qwen2-VL is not yet available on Ollama, it is supported through Hugging Face. To get started quickly, refer to the code snippets below.
**OpenAI**
```python
import os

from swiftannotate.image import OpenAIForImageCaptioning

caption_model = "gpt-4o"
validation_model = "gpt-4o-mini"
api_key = ""   # your OpenAI API key
BASE_DIR = ""  # directory containing the images to caption

# Collect the path of every file in BASE_DIR
image_paths = [os.path.join(BASE_DIR, image) for image in os.listdir(BASE_DIR)]

image_captioning_pipeline = OpenAIForImageCaptioning(
    caption_model=caption_model,
    validation_model=validation_model,
    api_key=api_key,
    output_file="image_captioning_output.json",
)

results = image_captioning_pipeline.generate(image_paths=image_paths)
```
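
Note that `os.listdir` returns every entry in the directory, including subdirectories and non-image files. If the folder is mixed, a small filter can build `image_paths` instead. The sketch below is only a suggestion: `list_image_paths` and `IMAGE_EXTENSIONS` are hypothetical names, and the extension set is an assumption rather than a list of formats SwiftAnnotate is documented to accept.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to the formats you actually use.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}


def list_image_paths(base_dir: str) -> list[str]:
    """Return sorted paths of regular files in base_dir with an image extension."""
    root = Path(base_dir)
    return sorted(
        str(path)
        for path in root.iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )
```

The returned list can be passed as `image_paths` in place of the `os.listdir` comprehension.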
**Qwen2-VL**
You can use any size of Qwen2-VL (7B, 72B), depending on the available resources. vLLM inference is not currently supported, but it will be available soon.
```python
import os

from transformers import AutoModelForImageTextToText, AutoProcessor, BitsAndBytesConfig

from swiftannotate.image import Qwen2VLForImageCaptioning

# Load the images
BASE_DIR = ""  # directory containing the images to caption
image_paths = [os.path.join(BASE_DIR, image) for image in os.listdir(BASE_DIR)]

# 4-bit NF4 quantization to reduce GPU memory usage
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16",
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForImageTextToText.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct",
    device_map="auto",
    torch_dtype="auto",
    quantization_config=quantization_config,
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

# Load the Caption Model
captioning_pipeline = Qwen2VLForImageCaptioning(
    model=model,
    processor=processor,
    output_file="image_captioning_output.json",
)

results = captioning_pipeline.generate(image_paths)
```
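
When choosing between the 7B and 72B checkpoints, a rough lower bound on the memory needed for the weights alone is parameters × bits per parameter ÷ 8. The helper below is a back-of-the-envelope sketch: `estimate_weight_memory_gb` is a hypothetical name, the parameter counts are the nominal sizes from the model names, and the estimate ignores activations, the KV cache, and quantization overhead, so real usage will be higher.

```python
def estimate_weight_memory_gb(num_params_billion: float, bits_per_param: int) -> float:
    """Approximate memory for model weights only, in gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = num_params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


# Nominal sizes taken from the model names; actual parameter counts differ slightly.
for size in (7, 72):
    for bits in (4, 16):
        gb = estimate_weight_memory_gb(size, bits)
        print(f"Qwen2-VL-{size}B at {bits}-bit: ~{gb:.1f} GB of weights")
```

By this estimate, 4-bit weights for the 7B model take about 3.5 GB, while the 72B model needs about 36 GB before any runtime overhead.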
### Videos
## Contributing to SwiftAnnotate 🤝
We welcome contributions to SwiftAnnotate! There are several ways you can help improve the project:
- **Enhanced Prompts**: Contribute better validation and annotation prompts for improved accuracy
- **File Support**: Add support for additional input/output file formats
- **Cloud Integration**: Implement AWS S3 storage support and other cloud services
- **Validation Strategies**: Develop new validation approaches for different annotation tasks
- **Model Support**: Integrate additional LLMs and VLMs
- **Documentation**: Improve guides and examples
Please submit a pull request with your contributions or open an issue to discuss new features.