## VILA
Cog wrapper for VILA, a visual language model (VLM) pretrained with interleaved image-text data. See the [paper](https://arxiv.org/abs/2312.07533), [official repo](https://github.com/Efficient-Large-Model/VILA) and Replicate [demos](https://replicate.com/adirik/vila-13b) for details.

## How to use the API

You need Cog and Docker installed to run this model locally. To build the Docker image with Cog and run a prediction:

```
cog predict -i image=@sample_images/1.jpg -i prompt="Can you describe this image?"
```

To start an HTTP server and send requests to your locally deployed API:
```
cog run -p 5000 python -m cog.server.http
```
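Once the server is running, predictions can be requested over HTTP via Cog's `POST /predictions` endpoint, with file inputs passed as data URIs. Below is a minimal client sketch; the helper names, the sample file path, and the choice of `urllib` are illustrative, not part of this repo:

```python
import base64
import json
import urllib.request


def image_to_data_uri(path, mime="image/jpeg"):
    # Encode a local image file as a data URI, which Cog accepts for file inputs.
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"


def predict(image_path, prompt, host="http://localhost:5000"):
    # Build the JSON payload expected by Cog's /predictions endpoint
    # and return the decoded JSON response.
    payload = json.dumps({
        "input": {
            "image": image_to_data_uri(image_path),
            "prompt": prompt,
        }
    }).encode("utf-8")
    req = urllib.request.Request(
        f"{host}/predictions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


# Example usage (requires the server started with `cog run -p 5000 ...` above):
# result = predict("sample_images/1.jpg", "Can you describe this image?")
# print(result["output"])
```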

To use VILA, provide an image and a text prompt. The response is generated by decoding the model's output using beam search with the specified parameters. The input arguments to the API are as follows:

- **image:** The image to discuss.
- **prompt:** The query to generate a response for.
- **top_p:** When decoding text, restricts sampling to the smallest set of most likely tokens whose cumulative probability reaches top_p; lower values ignore less likely tokens.
- **temperature:** When decoding text, higher values make the model more creative.
- **num_beams:** Number of beams to use when decoding text; higher values explore more candidate sequences and are slower, but can yield better responses.
- **max_tokens:** Maximum number of tokens to generate.
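To build intuition for what `top_p` and `temperature` control, here is a small standalone sketch of temperature scaling and nucleus (top-p) filtering over a toy logit vector. This is not VILA's or Cog's decoding code, just an illustration of the two parameters' semantics:

```python
import math
import random


def sample_token(logits, top_p=0.9, temperature=1.0):
    # Temperature scaling: higher values flatten the distribution,
    # making less likely tokens more competitive ("more creative").
    scaled = [l / temperature for l in logits]

    # Softmax (numerically stabilized) to turn logits into probabilities.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Nucleus filtering: keep the smallest set of most likely tokens
    # whose cumulative probability reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break

    # Renormalize over the kept tokens and sample one index.
    kept_probs = [probs[i] for i in kept]
    s = sum(kept_probs)
    return random.choices(kept, weights=[p / s for p in kept_probs])[0]
```

With a very low temperature or a small `top_p`, the nucleus shrinks to the single most likely token and decoding becomes effectively greedy.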

## References
```
@misc{lin2023vila,
  title={VILA: On Pre-training for Visual Language Models},
  author={Ji Lin and Hongxu Yin and Wei Ping and Yao Lu and Pavlo Molchanov and Andrew Tao and Huizi Mao and Jan Kautz and Mohammad Shoeybi and Song Han},
  year={2023},
  eprint={2312.07533},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```