https://github.com/replicate/cog-owlvit
Cog wrapper for OWL-ViT
- Host: GitHub
- URL: https://github.com/replicate/cog-owlvit
- Owner: replicate
- License: apache-2.0
- Created: 2023-10-25T12:54:07.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-10-25T13:04:10.000Z (over 2 years ago)
- Last Synced: 2025-02-25T18:15:46.102Z (11 months ago)
- Language: Python
- Size: 1010 KB
- Stars: 0
- Watchers: 4
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
README
# Cog-OWL-ViT
This is an implementation of Google's [OWL-ViT (v1)](https://github.com/google-research/scenic/tree/main/scenic/projects/owl_vit) as a [Cog](https://github.com/replicate/cog) model. OWL-ViT uses a CLIP backbone to perform text-guided, open-vocabulary object detection. To use the model, simply provide the image you'd like to query and enter the objects you'd like to detect as comma-separated text. For more details, see this [Replicate model](https://replicate.com/alaradirik/owlvit-base-patch32).
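If you'd rather call the hosted model from Python than run Cog locally, the official `replicate` client can do it. The snippet below is a minimal sketch: the model identifier comes from the Replicate page linked above, and the input names (`image`, `query`) mirror the `cog predict` example further down; check the model page for the exact version and output schema.

```python
# pip install replicate, and set REPLICATE_API_TOKEN in your environment.
import replicate

# A minimal sketch: the model name is the hosted model linked above; the
# input names mirror the `cog predict` example below and are assumptions.
output = replicate.run(
    "alaradirik/owlvit-base-patch32",
    input={
        "image": open("data/astronaut.png", "rb"),
        "query": "human face, rocket, star-spangled banner, nasa badge",
    },
)
print(output)
```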
## Development
Follow the [model pushing guide](https://replicate.com/docs/guides/push-a-model) to push your own fork of OWL-ViT to [Replicate](https://replicate.com).
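In a fork, predictions are implemented in a `predict.py` that subclasses Cog's `BasePredictor`. The sketch below shows the general shape of that interface with this model's two inputs; it is not the repo's actual implementation, and the method bodies and output type are placeholders.

```python
# A minimal sketch of Cog's predictor interface, not this repo's predict.py.
from cog import BasePredictor, Input, Path


class Predictor(BasePredictor):
    def setup(self):
        # The real implementation would load the OWL-ViT weights here,
        # so the model is in memory before the first request arrives.
        ...

    def predict(
        self,
        image: Path = Input(description="Input image to query"),
        query: str = Input(description="Comma-separated object names to detect"),
    ) -> Path:
        # The real implementation runs text-guided detection; returning an
        # annotated image as `Path` is an assumption for illustration.
        ...
```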
## Basic Usage
To run a prediction:
```bash
cog predict -i image=@data/astronaut.png -i query="human face, rocket, star-spangled banner, nasa badge"
```
To build the Cog image and launch the HTTP API server locally:
```bash
cog run -p 5000 python -m cog.server.http
```
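Once the server is up, Cog serves predictions at `POST /predictions`. Here is a minimal sketch of calling it from Python, assuming the server runs on port 5000 as above; file inputs are sent as base64 data URIs:

```python
# A minimal sketch of calling the local Cog HTTP API started above.
import base64

import requests

# File inputs are sent over the wire as base64 data URIs.
with open("data/astronaut.png", "rb") as f:
    image_uri = "data:image/png;base64," + base64.b64encode(f.read()).decode()

resp = requests.post(
    "http://localhost:5000/predictions",
    json={
        "input": {
            "image": image_uri,
            "query": "human face, rocket, star-spangled banner, nasa badge",
        }
    },
)
resp.raise_for_status()
print(resp.json()["output"])
```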
## References
```
@inproceedings{minderer2022simple,
  title={Simple Open-Vocabulary Object Detection with Vision Transformers},
  author={Minderer, Matthias and Gritsenko, Alexey and Stone, Austin and Neumann, Maxim and Weissenborn, Dirk and Dosovitskiy, Alexey and Mahendran, Aravindh and Arnab, Anurag and Dehghani, Mostafa and Shen, Zhuoran and Wang, Xiao and Zhai, Xiaohua and Kipf, Thomas and Houlsby, Neil},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2022},
}
```