https://github.com/replicate/cog-vila
Cog wrapper for VILA
- Host: GitHub
- URL: https://github.com/replicate/cog-vila
- Owner: replicate
- License: apache-2.0
- Created: 2024-03-13T13:55:09.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-03-13T14:05:04.000Z (about 2 years ago)
- Last Synced: 2025-06-06T05:06:00.363Z (9 months ago)
- Language: Python
- Size: 3.2 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
README
## VILA
Cog wrapper for VILA, a visual language model (VLM) pretrained on interleaved image-text data. See the [paper](https://arxiv.org/abs/2312.07533), [official repo](https://github.com/Efficient-Large-Model/VILA) and Replicate [demos](https://replicate.com/adirik/vila-13b) for details.
## How to use the API
You need to have Cog and Docker installed to run this model locally. To build the Docker container with Cog and run a prediction:
```
cog predict -i image=@sample_images/1.jpg -i prompt="Can you describe this image?"
```
To start a server and send requests to your locally or remotely deployed API:
```
cog run -p 5000 python -m cog.server.http
```
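Once the server is up, predictions are requested over HTTP. Below is a minimal request-building sketch in Python, assuming the standard Cog prediction endpoint (`POST /predictions`) on port 5000; the dummy bytes stand in for a real image file:

```python
import base64

def image_to_data_uri(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    """Encode raw image bytes as a data URI; Cog's HTTP API accepts
    file inputs as either URLs or data URIs."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime};base64,{encoded}"

# Replace the dummy bytes with real data, e.g.
# open("sample_images/1.jpg", "rb").read().
payload = {
    "input": {
        "image": image_to_data_uri(b"<raw image bytes>"),
        "prompt": "Can you describe this image?",
    }
}

# Send the payload to the running server, e.g.:
# requests.post("http://localhost:5000/predictions", json=payload)
```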
To use VILA, provide an image and a text prompt. The response is generated by decoding the model's output using beam search with the specified parameters. The input arguments to the API are as follows:
- **image:** The image to discuss.
- **prompt:** The query to generate a response for.
- **top_p:** When decoding text, samples from the smallest set of tokens whose cumulative probability exceeds top_p; lower values ignore less likely tokens.
- **temperature:** When decoding text, higher values make the output more random and creative; lower values make it more deterministic.
- **num_beams:** Number of beams to use for beam search when decoding text; higher values are slower but can yield better responses.
- **max_tokens:** Maximum number of tokens to generate.
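To make the sampling parameters concrete, here is a rough illustration of top-p filtering and temperature scaling over a toy token distribution (this is not the model's actual implementation, and the probabilities are made up):

```python
def top_p_filter(probs: dict[str, float], top_p: float) -> dict[str, float]:
    """Keep the smallest set of tokens whose cumulative probability
    reaches top_p, then renormalize. Illustrative only."""
    items = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = [], 0.0
    for tok, p in items:
        kept.append((tok, p))
        total += p
        if total >= top_p:
            break
    return {tok: p / total for tok, p in kept}

def apply_temperature(probs: dict[str, float], temperature: float) -> dict[str, float]:
    """Re-weight probabilities: temperature > 1 flattens the distribution
    (more creative), temperature < 1 sharpens it. Illustrative only."""
    weights = {tok: p ** (1.0 / temperature) for tok, p in probs.items()}
    z = sum(weights.values())
    return {tok: w / z for tok, w in weights.items()}

# Toy distribution: with top_p=0.9, "zebra" is filtered out.
probs = {"cat": 0.5, "dog": 0.3, "axolotl": 0.15, "zebra": 0.05}
print(top_p_filter(probs, top_p=0.9))
print(apply_temperature(probs, temperature=2.0))
```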
## References
```
@misc{lin2023vila,
  title={VILA: On Pre-training for Visual Language Models},
  author={Ji Lin and Hongxu Yin and Wei Ping and Yao Lu and Pavlo Molchanov and Andrew Tao and Huizi Mao and Jan Kautz and Mohammad Shoeybi and Song Han},
  year={2023},
  eprint={2312.07533},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```