https://github.com/deeflect/vasted
GPU inference in one command. Auto-picks a cheap Vast.ai GPU, loads any GGUF model, gives you an OpenAI-compatible endpoint.
https://github.com/deeflect/vasted
gguf gpu inference llm openai python vast-ai
Last synced: 2 days ago
JSON representation
GPU inference in one command. Auto-picks a cheap Vast.ai GPU, loads any GGUF model, gives you an OpenAI-compatible endpoint.
- Host: GitHub
- URL: https://github.com/deeflect/vasted
- Owner: deeflect
- License: mit
- Created: 2026-03-01T22:35:25.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2026-04-04T22:50:55.000Z (2 months ago)
- Last Synced: 2026-04-05T00:40:22.442Z (2 months ago)
- Topics: gguf, gpu, inference, llm, openai, python, vast-ai
- Language: Python
- Homepage: https://github.com/deeflect/vasted
- Size: 20.1 MB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: SECURITY.md
- Agents: AGENTS.md
Awesome Lists containing this project
README
# vasted
[](https://github.com/deeflect/vasted/actions/workflows/ci.yml)
[](./LICENSE)
`vasted` is a CLI that launches on-demand Vast.ai GPU workers for `llama.cpp` GGUF inference and exposes a stable OpenAI-compatible `/v1` endpoint.
## Demo

## Why `vasted`
- Stable client endpoint while worker URLs rotate.
- Setup wizard for local machine and VPS deployments.
- Non-interactive automation mode for agents/CI.
- OpenAI-compatible proxy for tools that expect `/v1` APIs.
- Session usage and cost tracking.
- Optional Telegram bot control commands.
## Requirements
- Python `3.12+`
- [`uv`](https://docs.astral.sh/uv/)
- Vast.ai account + API key
- Optional: Telegram bot token (`telegram` extra)
## Install
### From PyPI (recommended)
```bash
uv tool install vasted
vasted --version
```
Upgrade:
```bash
uv tool upgrade vasted
```
### From source (development)
```bash
git clone https://github.com/deeflect/vasted.git
cd vasted
uv sync --extra dev
```
Run CLI commands from the repo:
```bash
uv run vasted --help
```
### Git install (latest main)
```bash
uv tool install "git+https://github.com/deeflect/vasted.git"
```
## Quick Start
If installed as a tool:
```bash
vasted setup
vasted up
vasted status --verbose
```
From source checkout:
```bash
uv run vasted setup
uv run vasted up
uv run vasted status --verbose
```
Client connection values after setup:
- Base URL: `http://:/v1`
- Auth header: `Authorization: Bearer `
When `proxy_host` is `0.0.0.0`, use your real machine/VPS IP or domain in clients.
## Automation / Unattended Mode
Use non-interactive commands to avoid prompts:
```bash
uv run vasted setup --non-interactive \
--vast-api-key "$VASTED_API_KEY" \
--bearer-token "$VASTED_BEARER_TOKEN" \
--client openclaw \
--deployment-mode local_pc \
--model qwen3-coder-30b \
--quality balanced \
--gpu-mode auto
uv run vasted up --non-interactive --yes --jinja --model qwen3-coder-30b --quality balanced --gpu-mode auto --no-serve
uv run vasted status --verbose
uv run vasted usage
uv run vasted down --force
```
Environment variables accepted by `setup --non-interactive`:
- `VASTED_API_KEY`
- `VASTED_BEARER_TOKEN`
- `VASTED_CLIENT` (`openclaw`, `opencode`, `custom`)
- `VASTED_LLAMA_JINJA` (`true`/`false`)
- `VASTED_MODEL`, `VASTED_QUALITY`, `VASTED_GPU_MODE`, `VASTED_GPU_PRESET`
- `VASTED_DEPLOYMENT_MODE`, `VASTED_PROXY_HOST`, `VASTED_PROXY_PORT`, `VASTED_PUBLIC_HOST`
## Client Profiles and Jinja Behavior
`setup` supports client presets that define default `llama.cpp --jinja` behavior:
- `--client openclaw`: jinja on by default
- `--client opencode`: jinja off by default
- `--client custom`: keep/manual behavior
Per launch override is still available:
```bash
uv run vasted up --jinja
uv run vasted up --no-jinja
```
## Command Reference
```bash
vasted setup [--non-interactive] [--manual] [--client openclaw|opencode|custom]
vasted up [--model ...] [--quality ...] [--gpu-mode auto|manual] [--gpu-preset ...] [--profile ...] [--max-price ...] [--jinja|--no-jinja] [--yes] [--non-interactive] [--serve|--no-serve]
vasted down [--force]
vasted status [--verbose]
vasted logs [--instance-id N] [--tail N]
vasted usage
vasted token show [--full]
vasted token rotate
vasted rotate-token
vasted config show
vasted profile list|add|use|remove
vasted completions
```
## Telegram Bot (Optional)
Install telegram extra and run:
```bash
uv sync --extra telegram
uv run python bot.py
```
## Development
```bash
uv run ruff check .
uv run mypy app tests bot.py
uv run pytest -q
```
## Project Layout
- `app/commands/*`: CLI command handlers
- `app/service.py`: worker lifecycle + launch policy
- `app/proxy.py`: OpenAI-compatible reverse proxy
- `app/vast.py`: Vast API integration + startup script generation
- `app/usage.py`: token/time/cost accounting
- `app/user_config.py`: persistent config + keyring integration
- `app/state.py`: runtime state persistence
- `bot.py`: optional Telegram control plane
## Security
- Keep Vast API keys and bearer tokens private.
- Prefer localhost binds unless remote access is required.
- See [SECURITY.md](./SECURITY.md) for disclosure policy.
## Contributing
See [CONTRIBUTING.md](./CONTRIBUTING.md) and run the validation commands before opening a PR.
## License
MIT — see [LICENSE](./LICENSE).
---
### Built by
Built by [Dee](https://deeflect.com). Started as "just spin up a GPU for an hour" and grew a control plane, a Telegram bot, and an OpenAI-compatible proxy. Such is life.
Star if vasted spared you the AWS GPU pricing calculator. [Open an issue](https://github.com/deeflect/vasted/issues) if Vast.ai changes its API again (they will).
[deeflect.com](https://deeflect.com) · [Wikidata](https://www.wikidata.org/entity/Q138828544) · [LinkedIn](https://www.linkedin.com/in/dkargaev/) · [X](https://x.com/deeflectcom)