# NanoOWL

👍 Usage - ⏱️ Performance - 🛠️ Setup - 🤸 Examples - 👏 Acknowledgement - 🔗 See also

NanoOWL is a project that optimizes [OWL-ViT](https://huggingface.co/docs/transformers/model_doc/owlvit) to run 🔥 ***real-time*** 🔥 on [NVIDIA Jetson Orin Platforms](https://store.nvidia.com/en-us/jetson/store) with [NVIDIA TensorRT](https://developer.nvidia.com/tensorrt). NanoOWL also introduces a new "tree detection" pipeline that combines OWL-ViT and CLIP to enable nested detection and classification of anything, at any level, simply by providing text.
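For example, the tree prompt ``[an owl [a wing, an eye]]`` (used in the examples below) detects each owl in the image, then detects wings and eyes within each detected owl region.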


> Interested in detecting object masks as well? Try combining NanoOWL with
> [NanoSAM](https://github.com/NVIDIA-AI-IOT/nanosam) for zero-shot open-vocabulary
> instance segmentation.


## 👍 Usage

You can use NanoOWL in Python like this:

```python3
import PIL.Image

from nanoowl.owl_predictor import OwlPredictor

predictor = OwlPredictor(
    "google/owlvit-base-patch32",
    image_encoder_engine="data/owl_image_encoder_patch32.engine"
)

image = PIL.Image.open("assets/owl_glove_small.jpg")

output = predictor.predict(image=image, text=["an owl", "a glove"], threshold=0.1)

print(output)
```
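
The returned object bundles the detections for all prompts. Below is a minimal sketch of iterating over them; the attribute names (``boxes``, ``labels``, ``scores``) are assumptions for illustration, so check the return type of ``predict`` in your installed version.

```python3
# Attribute names here are assumptions for illustration; inspect the
# object returned by predictor.predict() for the exact fields.
text = ["an owl", "a glove"]
for box, label, score in zip(output.boxes, output.labels, output.scores):
    # Each label is assumed to index into the text prompts passed to predict().
    print(f"{text[int(label)]}: score={float(score):.2f}, box={box}")
```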

Or better yet, to use OWL-ViT in conjunction with CLIP to detect and classify anything,
at any level, check out the tree predictor example below!
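
If you prefer to stay in Python, here is a minimal sketch of the tree pipeline. The ``TreePredictor`` class comes from the package's example scripts, but the constructor and ``predict`` signatures shown are assumptions; see ``examples/tree_predict.py`` for the authoritative usage.

```python3
import PIL.Image

from nanoowl.owl_predictor import OwlPredictor
from nanoowl.tree_predictor import TreePredictor

# TreePredictor wraps the TensorRT-accelerated OWL-ViT predictor and adds
# the CLIP stage used by "(...)" classification nodes in tree prompts.
predictor = TreePredictor(
    owl_predictor=OwlPredictor(
        "google/owlvit-base-patch32",
        image_encoder_engine="data/owl_image_encoder_patch32.engine",
    )
)

image = PIL.Image.open("assets/owl_glove_small.jpg")

# Assumed call shape: "[...]" nodes are detected with OWL-ViT, and nested
# nodes run inside each parent region.
output = predictor.predict(image=image, text="[an owl [a wing, an eye]]", threshold=0.15)

print(output)
```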

> See [Setup](#setup) for instructions on how to build the image encoder engine.


## ⏱️ Performance

NanoOWL runs real-time on Jetson Orin Nano.



| Model †            | Image Size | Patch Size | ⏱️ Jetson Orin Nano (FPS) | ⏱️ Jetson AGX Orin (FPS) | 🎯 Accuracy (mAP) |
|--------------------|------------|------------|---------------------------|--------------------------|-------------------|
| OWL-ViT (ViT-B/32) | 768        | 32         | TBD                       | 95                       | 28                |
| OWL-ViT (ViT-B/16) | 768        | 16         | TBD                       | 25                       | 31.7              |


## 🛠️ Setup

1. Install the dependencies

    1. Install PyTorch

    2. Install [torch2trt](https://github.com/NVIDIA-AI-IOT/torch2trt)

    3. Install NVIDIA TensorRT

    4. Install the Transformers library

        ```bash
        python3 -m pip install transformers
        ```

    5. (optional) Install NanoSAM (for the instance segmentation example)

2. Install the NanoOWL package

    ```bash
    git clone https://github.com/NVIDIA-AI-IOT/nanoowl
    cd nanoowl
    python3 setup.py develop --user
    ```

3. Build the TensorRT engine for the OWL-ViT vision encoder

    ```bash
    mkdir -p data
    python3 -m nanoowl.build_image_encoder_engine \
        data/owl_image_encoder_patch32.engine
    ```

4. Run an example prediction to ensure everything is working

    ```bash
    cd examples
    python3 owl_predict.py \
        --prompt="[an owl, a glove]" \
        --threshold=0.1 \
        --image_encoder_engine=../data/owl_image_encoder_patch32.engine
    ```

That's it! If everything is working properly, you should see a visualization saved to ``data/owl_predict_out.jpg``.


## 🤸 Examples

### Example 1 - Basic prediction

This example demonstrates how to use the TensorRT optimized OWL-ViT model to
detect objects by providing text descriptions of the object labels.

To run the example, first navigate to the examples folder

```bash
cd examples
```

Then run the example

```bash
python3 owl_predict.py \
    --prompt="[an owl, a glove]" \
    --threshold=0.1 \
    --image_encoder_engine=../data/owl_image_encoder_patch32.engine
```

By default the output will be saved to ``data/owl_predict_out.jpg``.

You can also use this example to profile inference. Simply set the flag ``--profile``.
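
For example, to measure inference timing for the command above:

```bash
python3 owl_predict.py \
    --prompt="[an owl, a glove]" \
    --threshold=0.1 \
    --image_encoder_engine=../data/owl_image_encoder_patch32.engine \
    --profile
```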

### Example 2 - Tree prediction

This example demonstrates how to use the tree predictor class to detect and
classify objects at any level.

To run the example, first navigate to the examples folder

```bash
cd examples
```

To detect all owls, and then detect all wings and eyes in each detected owl region
of interest, type

```bash
python3 tree_predict.py \
    --prompt="[an owl [a wing, an eye]]" \
    --threshold=0.15 \
    --image_encoder_engine=../data/owl_image_encoder_patch32.engine
```

By default the output will be saved to ``data/tree_predict_out.jpg``.

To classify the image as indoors or outdoors, type

```bash
python3 tree_predict.py \
    --prompt="(indoors, outdoors)" \
    --threshold=0.15 \
    --image_encoder_engine=../data/owl_image_encoder_patch32.engine
```

To classify the image as indoors or outdoors, and if it's outdoors then detect
all owls, type

```bash
python3 tree_predict.py \
    --prompt="(indoors, outdoors [an owl])" \
    --threshold=0.15 \
    --image_encoder_engine=../data/owl_image_encoder_patch32.engine
```
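
The two constructs compose: square brackets request detection, parentheses request classification, and nesting restricts the inner step to each region matched by the outer one. For instance, combining the prompts above into one pipeline (assuming nesting composes as in these examples):

```bash
python3 tree_predict.py \
    --prompt="(indoors, outdoors [an owl [a wing, an eye]])" \
    --threshold=0.15 \
    --image_encoder_engine=../data/owl_image_encoder_patch32.engine
```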

### Example 3 - Tree prediction (Live Camera)

This example demonstrates the tree predictor running on a live camera feed with
live-edited text prompts. To run the example

1. Ensure you have a camera device connected

2. Launch the demo

    ```bash
    cd examples/tree_demo
    python3 tree_demo.py ../../data/owl_image_encoder_patch32.engine
    ```

3. Open your browser to ``http://<ip address>:7860``

4. Type whatever prompt you like to see what works! Here are some examples

    - Example: [a face [a nose, an eye, a mouth]]
    - Example: [a face (interested, yawning / bored)]
    - Example: (indoors, outdoors)


## 👏 Acknowledgement

Thanks to the authors of [OWL-ViT](https://huggingface.co/docs/transformers/model_doc/owlvit) for the great open-vocabulary detection work.


## 🔗 See also

- [NanoSAM](https://github.com/NVIDIA-AI-IOT/nanosam) - A real-time Segment Anything (SAM) model variant for NVIDIA Jetson Orin platforms.
- [Jetson Introduction to Knowledge Distillation Tutorial](https://github.com/NVIDIA-AI-IOT/jetson-intro-to-distillation) - For an introduction to knowledge distillation as a model optimization technique.
- [Jetson Generative AI Playground](https://nvidia-ai-iot.github.io/jetson-generative-ai-playground/) - For instructions and tips for using a variety of LLMs and transformers on Jetson.
- [Jetson Containers](https://github.com/dusty-nv/jetson-containers) - For a variety of easily deployable and modular Jetson containers.