https://github.com/ayushexel/trolo
An SDK for Transformers + YOLO and other SSD family models
- Host: GitHub
- URL: https://github.com/ayushexel/trolo
- Owner: AyushExel
- License: apache-2.0
- Created: 2024-11-19T21:34:57.000Z (5 months ago)
- Default Branch: master
- Last Pushed: 2025-01-24T20:25:18.000Z (3 months ago)
- Last Synced: 2025-01-24T21:24:15.840Z (3 months ago)
- Topics: computer-vision, object-detection, opencv, segmentation, transformers, yolo
- Language: Jupyter Notebook
- Homepage: https://ayushexel.github.io/trolo/intro
- Size: 14.3 MB
- Stars: 56
- Watchers: 3
- Forks: 11
- Open Issues: 8
Metadata Files:
- Readme: README.md
- License: LICENSE
# Trolo
A framework for harnessing the power of transformers with YOLO models and other single-shot detectors!
> **Note**: This is an early release. The package is under active development. Please report any issues and I'll try to fix them ASAP.
Quickstart on Colab: [Open in Colab](https://colab.research.google.com/github/ayushexel/trolo/blob/master/recipes/quickstart.ipynb)
🔥 **NEW** 🔥 D-FINE models are now available. Inspired by RT-DETR, they outperform all real-time detectors, including YOLO-series models.
## Installation
```bash
pip install trolo
```

## Features
- 🔥 Transformer-enhanced object detection
- 🎯 Single-shot detection capabilities
- ⚡ High performance inference
- 🛠️ Easy to use CLI interface
- 🚀 Fast video stream inference
- 🧠 Automatic DDP handling

## Available Models
### D-FINE
The D-FINE model redefines regression tasks in DETR-based detectors using Fine-grained Distribution Refinement (FDR).
[Official Paper](https://arxiv.org/abs/2410.13842) | [Official Repo](https://github.com/Peterande/D-FINE)
(All models are downloaded automatically when you pass the model name for any task; see the sketch below the table.)
| Model | Dataset | AP<sup>val</sup> | #Params | Latency | GFLOPs |
| :---: | :---: | :---: | :---: | :---: | :---: |
| `dfine-n` | COCO | **42.8** | 4M | 2.12ms | 7 |
| `dfine-s` | COCO | **48.5** | 10M | 3.49ms | 25 |
| `dfine-m` | COCO | **52.3** | 19M | 5.62ms | 57 |
| `dfine-l` | COCO | **54.0** | 31M | 8.07ms | 91 |
| `dfine-x` | COCO | **55.8** | 62M | 12.89ms | 202 |

- RT-DETR v3 (Coming Soon)
- RT-DETR v2 (Coming Soon)
- Trolo-2024 (WIP)
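As a minimal sketch of the auto-download behaviour (using the `DetectionPredictor` API shown in the Quick Start below), passing one of the model names above is enough; the weights are fetched from the trolo model hub on first use:

```python
from trolo.inference import DetectionPredictor

# Passing a model name such as "dfine-m" fetches the weights from the
# trolo model hub on first use; a local .pth path can be passed instead,
# as with the CLI.
predictor = DetectionPredictor(model="dfine-m")
```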
## Quick Start
The CLI command structure is:
```bash
trolo [command] [options]
```
For detailed help:
```bash
trolo --help # for general help
trolo [command] --help # for command-specific help
```

### Inference
Example inference command:
```bash
trolo predict --model dfine-n # automatically downloads model from trolo model hub
```
Single images, image folders, video files, and webcam streams are all supported as input:
```bash
trolo predict --model dfine-n.pth --input img.jpg # folder/ or video.mp4 or 0 (for webcam)
```

🔥 Smart video stream inference: videos are processed in streaming mode, so you never have to worry about memory issues!
Python API:
```python
from trolo.inference import DetectionPredictor

predictor = DetectionPredictor(model="dfine-n")
predictions = predictor.predict() # get predictions
plotted_preds = predictor.visualize(show=True, save=True) # or get visualized outputs
```
Visit the Inference Docs for more details.
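For video, folder, or webcam sources from Python, the predictor can presumably be pointed at an input the same way the CLI is. The `input` keyword below mirrors the CLI's `--input` flag and is an assumption, so check the Inference Docs for the exact parameter name.

```python
from trolo.inference import DetectionPredictor

predictor = DetectionPredictor(model="dfine-n")

# ASSUMPTION: an `input` keyword mirroring the CLI's --input flag
# (img.jpg, folder/, video.mp4, or 0 for webcam). The actual parameter
# name may differ; see the Inference Docs.
predictions = predictor.predict(input="video.mp4")
predictor.visualize(show=True, save=True)
```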
### Model export
Example export command:
```bash
trolo export --model dfine-n --export_format onnx --input_size 640
```
Python API:
```python
from trolo.inference import ModelExporter

model_path = "/path/to/model"
input_size = 640  # inference resolution
export_format = "onnx"

exporter = ModelExporter(model=model_path)
exporter.export(input_size=input_size, export_format=export_format)
```
Visit Export Docs for more details.
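As a quick sanity check on the exported file, you can open it with `onnxruntime` and list the graph's inputs and outputs before wiring up real pre- and post-processing. The file name below is illustrative, and the exact input names, shapes, and dtypes depend on the export.

```python
import onnxruntime as ort

# Open the ONNX file produced by `trolo export` (file name is illustrative).
session = ort.InferenceSession("dfine-n.onnx", providers=["CPUExecutionProvider"])

# Inspect expected inputs/outputs; names, shapes, and dtypes are model-specific.
for inp in session.get_inputs():
    print("input:", inp.name, inp.shape, inp.type)
for out in session.get_outputs():
    print("output:", out.name, out.shape, out.type)
```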
Please check [deployment](https://github.com/AyushExel/trolo/tree/master/recipes/deployment) for inference scripts for various deployment targets.

### Training
Example training command:
```bash
trolo train --config dfine_n # automatically finds the config file
```

🔥 DDP is handled automatically; simply pass the GPU IDs to the CLI:
```bash
trolo train --config dfine_n --device 0,1,2,3
```
That's it!

Python API:
```python
from trolo.trainers import DetectionTrainer

trainer = DetectionTrainer(config="dfine_n") # or pass a custom config path
trainer.train() # pass device = 0,1,2,3 to automatically handle DDP
```

Visit the Training Docs for more details.
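For multi-GPU training from Python, the comment in the snippet above suggests passing the device list to enable DDP. The exact argument form below (a comma-separated string passed to `train()`) is an assumption, so double-check the Training Docs.

```python
from trolo.trainers import DetectionTrainer

trainer = DetectionTrainer(config="dfine_n")

# ASSUMPTION: the devices are given as a comma-separated string to train(),
# per the README's note that passing device = 0,1,2,3 handles DDP
# automatically. The actual parameter form may differ; see the Training Docs.
trainer.train(device="0,1,2,3")
```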
## Docker 🐳 Usage Guide
Build and Run Options
### Build Commands
```bash
# Standard build
docker build -t trolo .

# Build with a specific tag
docker build -t trolo:v1 .

# Build with build arguments (if needed)
docker build --build-arg SOME_ARG=value -t trolo .
```

### Run Commands
#### 1. Basic Run (No GPU, No Volume)
```bash
docker run -it --name container_name trolo
```

#### 2. Run with GPU Mounting
```bash
docker run -it --gpus all --name container_name trolo
```

#### 3. Run with Volume Mounting
```bash
docker run -it -v /local/path/on/host:/workspace/app/trolo --name container_name trolo
```

#### 4. Comprehensive Run (GPU and Volume)
```bash
docker run -it --gpus all -v /local/path/on/host:/workspace/app/trolo --name container_name trolo
```

#### 5. Additional Run Options
```bash
# Run with custom entrypoint
docker run -it --entrypoint /bin/bash trolo

# Run with environment variables
docker run -it -e CUSTOM_ENV=value trolo

# Run in detached mode
docker run -d trolo
```

#### Notes:
- Replace `/local/path/on/host` with your actual host path
- `--gpus all` requires NVIDIA Container Toolkit
- Volume mounting allows persistent data and code modifications

## Totally open source and free
TL;DR: This is a non-profit project. Use it, modify it, copy it, do whatever you want with it. If something doesn't allow you to do that, please open an issue.
More details
* Apache 2.0
* The license text has simply been copied from the official Apache repo. Please open an issue if something in it doesn't allow you to use the project.
* This project is built on top of open licensed projects as mentioned below.
* I intend to keep this project free and open source FOREVER. There are no plans for direct/indirect monetization of this project. I only accept sponsorships for compute resources to train models and perform independent research.

## Credits
This project builds upon several excellent open source projects:
- [D-FINE](https://github.com/Peterande/D-FINE): Original D-FINE model implementation
- [RT-DETR](https://github.com/lyuwenyu/RT-DETR): Real-time DETR architecture
- [PaddlePaddle](https://github.com/PaddlePaddle/PaddleDetection): Detection framework

More details:
- The original trainer is based on D-FINE, with major modifications for handling pre-trained weights, DDP, and other features.
- The D-FINE architecture is the same as in the original paper and repo.

## Contributing
Contributions are most welcome! Please feel free to submit a Pull Request.
---
**Note**: This is an early work in progress. Many features are still under development.
### Immediate TODOs
- [ ] Docusaurus documentation