https://github.com/nvidia-ai-iot/nanoowl
A project that optimizes OWL-ViT for real-time inference with NVIDIA TensorRT.
- Host: GitHub
- URL: https://github.com/nvidia-ai-iot/nanoowl
- Owner: NVIDIA-AI-IOT
- License: apache-2.0
- Created: 2023-09-12T02:57:34.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2025-02-06T01:01:22.000Z (8 months ago)
- Last Synced: 2025-02-06T02:24:34.136Z (8 months ago)
- Topics: detect, fast, inference, jetson, jetson-agx-orin, jetson-orin-nano, nvidia, real-time, tensorrt, transformers, tree, zero-shot
- Language: Python
- Homepage:
- Size: 21.4 MB
- Stars: 293
- Watchers: 6
- Forks: 50
- Open Issues: 29
- Metadata Files:
  - Readme: README.md
  - License: LICENSE.md
## README
# NanoOWL

👍 Usage - ⏱️ Performance - 🛠️ Setup - 🤸 Examples - 👏 Acknowledgment - 🔗 See also

NanoOWL is a project that optimizes [OWL-ViT](https://huggingface.co/docs/transformers/model_doc/owlvit) to run 🔥 ***real-time*** 🔥 on [NVIDIA Jetson Orin Platforms](https://store.nvidia.com/en-us/jetson/store) with [NVIDIA TensorRT](https://developer.nvidia.com/tensorrt). NanoOWL also introduces a new "tree detection" pipeline that combines OWL-ViT and CLIP to enable nested detection and classification of anything, at any level, simply by providing text.
> Interested in detecting object masks as well? Try combining NanoOWL with
> [NanoSAM](https://github.com/NVIDIA-AI-IOT/nanosam) for zero-shot open-vocabulary
> instance segmentation.

You can use NanoOWL in Python like this:

```python3
import PIL.Image

from nanoowl.owl_predictor import OwlPredictor

predictor = OwlPredictor(
    "google/owlvit-base-patch32",
    image_encoder_engine="data/owlvit-base-patch32-image-encoder.engine"
)

image = PIL.Image.open("assets/owl_glove_small.jpg")

output = predictor.predict(image=image, text=["an owl", "a glove"], threshold=0.1)

print(output)
```

Or better yet, to use OWL-ViT in conjunction with CLIP to detect and classify anything,
at any level, check out the tree predictor example below!

> See [Setup](#setup) for instructions on how to build the image encoder engine.
## ⏱️ Performance

NanoOWL runs real-time on Jetson Orin Nano.

| Model † | Image Size | Patch Size | ⏱️ Jetson Orin Nano (FPS) | ⏱️ Jetson AGX Orin (FPS) | 🎯 Accuracy (mAP) |
|---------|------------|------------|---------------------------|--------------------------|-------------------|
| OWL-ViT (ViT-B/32) | 768 | 32 | TBD | 95 | 28 |
| OWL-ViT (ViT-B/16) | 768 | 16 | TBD | 25 | 31.7 |
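As a quick sanity check on these figures, throughput in FPS converts to per-frame latency as 1000 / FPS. The sketch below (not part of NanoOWL) does the arithmetic for the Jetson AGX Orin numbers above:

```python
def fps_to_latency_ms(fps: float) -> float:
    """Convert throughput (frames per second) to per-frame latency (ms)."""
    return 1000.0 / fps

# ViT-B/32 at 95 FPS leaves ~10.5 ms per frame; ViT-B/16 at 25 FPS, 40 ms.
print(round(fps_to_latency_ms(95), 1))  # 10.5
print(fps_to_latency_ms(25))            # 40.0
```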
## 🛠️ Setup

1. Install the dependencies

    1. Install PyTorch
    2. Install [torch2trt](https://github.com/NVIDIA-AI-IOT/torch2trt)
    3. Install NVIDIA TensorRT
    4. Install the Transformers library

        ```bash
        python3 -m pip install transformers
        ```
    5. (optional) Install NanoSAM (for the instance segmentation example)

2. Install the NanoOWL package.

    ```bash
    git clone https://github.com/NVIDIA-AI-IOT/nanoowl
    cd nanoowl
    python3 setup.py develop --user
    ```

3. Build the TensorRT engine for the OWL-ViT vision encoder

    ```bash
    mkdir -p data
    python3 -m nanoowl.build_image_encoder_engine \
        data/owl_image_encoder_patch32.engine
    ```

4. Run an example prediction to ensure everything is working

    ```bash
    cd examples
    python3 owl_predict.py \
        --prompt="[an owl, a glove]" \
        --threshold=0.1 \
        --image_encoder_engine=../data/owl_image_encoder_patch32.engine
    ```

That's it! If everything is working properly, you should see a visualization saved to ``data/owl_predict_out.jpg``.
## 🤸 Examples
### Example 1 - Basic prediction
This example demonstrates how to use the TensorRT optimized OWL-ViT model to
detect objects by providing text descriptions of the object labels.

To run the example, first navigate to the examples folder

```bash
cd examples
```

Then run the example
```bash
python3 owl_predict.py \
--prompt="[an owl, a glove]" \
--threshold=0.1 \
--image_encoder_engine=../data/owl_image_encoder_patch32.engine
```

By default the output will be saved to ``data/owl_predict_out.jpg``.
You can also use this example to profile inference. Simply set the flag ``--profile``.
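The threshold acts as a simple confidence cutoff: any detection scoring below it is discarded before visualization. The snippet below is a conceptual illustration with made-up scores, not NanoOWL's actual post-processing code:

```python
def filter_by_threshold(labels, scores, boxes, threshold=0.1):
    """Keep only the detections whose confidence meets the threshold."""
    return [(label, score, box)
            for label, score, box in zip(labels, scores, boxes)
            if score >= threshold]

# Three candidate detections; the 0.04-score owl falls below threshold=0.1.
kept = filter_by_threshold(
    labels=["an owl", "a glove", "an owl"],
    scores=[0.92, 0.45, 0.04],
    boxes=[(10, 10, 120, 140), (200, 30, 260, 90), (5, 5, 20, 20)],
)
print([label for label, _, _ in kept])  # ['an owl', 'a glove']
```

Raising the threshold trades recall for precision; 0.1 is simply the value used in these examples.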
### Example 2 - Tree prediction
This example demonstrates how to use the tree predictor class to detect and
classify objects at any level.

To run the example, first navigate to the examples folder

```bash
cd examples
```

To detect all owls, and then detect all wings and eyes in each detected owl region
of interest, type

```bash
python3 tree_predict.py \
--prompt="[an owl [a wing, an eye]]" \
--threshold=0.15 \
--image_encoder_engine=../data/owl_image_encoder_patch32.engine
```

By default the output will be saved to ``data/tree_predict_out.jpg``.
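The tree prompt syntax nests naturally: square brackets ``[...]`` denote detection, parentheses ``(...)`` denote classification, and a group that follows a label applies within that label's region. As a rough illustration of how such a prompt decomposes (a hypothetical sketch, not NanoOWL's actual parser), a recursive parse might look like:

```python
def parse_tree_prompt(prompt):
    """Parse prompts like "[an owl [a wing, an eye]]" into nested dicts.

    "[...]" groups become detection nodes and "(...)" groups become
    classification nodes; each comma-separated label may carry its own
    nested group, stored under "children".
    """
    pos = 0

    def parse_group():
        nonlocal pos
        op = "detect" if prompt[pos] == "[" else "classify"
        close_ch = "]" if prompt[pos] == "[" else ")"
        pos += 1
        labels, children, label = [], {}, ""
        while pos < len(prompt):
            ch = prompt[pos]
            if ch in "[(":
                # A nested group applies within the label preceding it.
                children[label.strip()] = parse_group()
            elif ch == ",":
                labels.append(label.strip())
                label = ""
                pos += 1
            elif ch == close_ch:
                if label.strip():
                    labels.append(label.strip())
                pos += 1
                return {"op": op, "labels": labels, "children": children}
            else:
                label += ch
                pos += 1
        raise ValueError("unbalanced prompt")

    return parse_group()

tree = parse_tree_prompt("[an owl [a wing, an eye]]")
print(tree["labels"])  # ['an owl']
```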
To classify the image as indoors or outdoors, type
```bash
python3 tree_predict.py \
--prompt="(indoors, outdoors)" \
--threshold=0.15 \
--image_encoder_engine=../data/owl_image_encoder_patch32.engine
```

To classify the image as indoors or outdoors, and if it's outdoors then detect
all owls, type

```bash
python3 tree_predict.py \
--prompt="(indoors, outdoors [an owl])" \
--threshold=0.15 \
--image_encoder_engine=../data/owl_image_encoder_patch32.engine
```

### Example 3 - Tree prediction (Live Camera)
This example demonstrates the tree predictor running on a live camera feed with
live-edited text prompts. To run the example

1. Ensure you have a camera device connected
2. Launch the demo
```bash
cd examples/tree_demo
python3 tree_demo.py ../../data/owl_image_encoder_patch32.engine
```
3. Open your browser to ``http://<ip address>:7860``
4. Type whatever prompt you like to see what works! Here are some examples
- Example: [a face [a nose, an eye, a mouth]]
- Example: [a face (interested, yawning / bored)]
- Example: (indoors, outdoors)

## 👏 Acknowledgment

Thanks to the authors of [OWL-ViT](https://huggingface.co/docs/transformers/model_doc/owlvit) for the great open-vocabulary detection work.

## 🔗 See also
- [NanoSAM](https://github.com/NVIDIA-AI-IOT/nanosam) - A real-time Segment Anything (SAM) model variant for NVIDIA Jetson Orin platforms.
- [Jetson Introduction to Knowledge Distillation Tutorial](https://github.com/NVIDIA-AI-IOT/jetson-intro-to-distillation) - For an introduction to knowledge distillation as a model optimization technique.
- [Jetson Generative AI Playground](https://nvidia-ai-iot.github.io/jetson-generative-ai-playground/) - For instructions and tips for using a variety of LLMs and transformers on Jetson.
- [Jetson Containers](https://github.com/dusty-nv/jetson-containers) - For a variety of easily deployable and modular Jetson Containers