# Soup


Fine-tune LLMs in one command. No SSH, no config hell.


Website · Quick Start · Config · Docs · Commands · Models



---

Soup turns the pain of LLM fine-tuning into a simple workflow. One config, one command, done.

```bash
pip install 'soup-cli[train]' # add [train] to fine-tune; bare `soup-cli` is the light CLI
soup init --template chat
soup train
```

## Why Soup?

Training LLMs is still painful. Even experienced teams spend 30-50% of their time fighting
infrastructure instead of improving models. Soup fixes that.

- **Zero SSH.** Never SSH into a broken GPU box again.
- **One config.** A simple YAML file is all you need.
- **Auto everything.** Batch size, GPU detection, quantization — handled.
- **Works locally.** Train on your own GPU with QLoRA. No cloud required.

## What's New

**v0.71.0 — Lighter install.** The heavy training stack (PyTorch, Transformers, PEFT, TRL,
datasets, bitsandbytes, accelerate) moved into a `[train]` extra. `pip install soup-cli` is now a
light CLI + data-tools install with no PyTorch; `pip install 'soup-cli[train]'` adds everything you
need to fine-tune. **Breaking:** existing users who train must reinstall with `[train]`.

Full history: [CHANGELOG.md](CHANGELOG.md) · [GitHub Releases](https://github.com/MakazhanAlpamys/Soup/releases).

## Quick Start

### 1. Install

```bash
pip install soup-cli # light: CLI + config + data tools (no PyTorch)
pip install 'soup-cli[train]' # add the training stack (torch, transformers, peft, trl, …)
pip install git+https://github.com/MakazhanAlpamys/Soup.git # latest dev
```

`soup init`, `soup data …`, and the other data/inspection commands work on the light install.
Fine-tuning (`soup train`) needs the `[train]` extra.
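If you are not sure which install an environment has, a minimal sketch like the one below asks a Python interpreter (default `python3`, an assumption) whether a subset of the training packages named in the v0.71.0 note is importable. `soup_install_kind` is a hypothetical helper, not part of Soup; `soup doctor` remains the supported dependency check.

```bash
# Hypothetical helper, not part of Soup: prints "train" when the core training
# packages are importable, "light" when any is missing, "no-python" otherwise.
soup_install_kind() {
  py=${1:-python3}
  if ! command -v "$py" >/dev/null 2>&1; then
    echo "no-python"
    return 0
  fi
  # importlib.util.find_spec locates a top-level module without importing it,
  # so this check does not initialise CUDA or load any model weights.
  "$py" - <<'EOF'
import importlib.util

stack = ["torch", "transformers", "peft", "trl", "datasets", "accelerate"]
missing = [name for name in stack if importlib.util.find_spec(name) is None]
print("light" if missing else "train")
EOF
}

soup_install_kind
```

A result of `light` means `soup train` will need `pip install 'soup-cli[train]'` in that environment first.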

### 2. Create a config

```bash
soup init # interactive wizard
soup init --template chat # or start from a template
```

Templates: `chat`, `code`, `tool-calling`, `medical`, `reasoning`, `vision`, `kto`, `orpo`,
`simpo`, `ipo`, `bco`, `rlhf`, `pretrain`, `moe`, `longcontext`, `embedding`, `audio`.

### 3. Train, test, ship

```bash
soup train --config soup.yaml # LoRA, quantization, batching — all handled
soup chat --model ./output # talk to your model
soup push --model ./output --repo you/my-model

soup merge --adapter ./output # merge LoRA into the base
soup export --model ./output --format gguf --quant q4_k_m # GGUF for Ollama / llama.cpp
```

More export targets (ONNX, TensorRT, AWQ, GPTQ, BitNet) and deployment options live in
[`docs/serving-and-export.md`](docs/serving-and-export.md).

## Configuration

A complete `soup.yaml`:

```yaml
base: meta-llama/Llama-3.1-8B-Instruct
task: sft
# backend: unsloth # 2-5x faster, pip install 'soup-cli[fast]'

data:
  train: ./data/train.jsonl
  format: alpaca
  val_split: 0.1

training:
  epochs: 3
  lr: 2e-5
  batch_size: auto
  lora:
    r: 64
    alpha: 16
  quantization: 4bit

output: ./output
```

`config/schema.py` is the single source of truth for every field. Advanced data, training,
and PEFT options are documented under [Documentation](#documentation).
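The nesting in `soup.yaml` (for example `lora:` under `training:`) is carried entirely by indentation, and YAML does not allow tab characters there, so a stray tab is an easy way to break a hand-edited config. A small pre-flight sketch follows; `check_yaml_indent` is hypothetical and not a Soup command.

```bash
# Hypothetical pre-flight check, not a Soup command. YAML forbids tabs in
# indentation, so report any line whose leading whitespace contains one.
check_yaml_indent() {
  file=$1
  if [ ! -r "$file" ]; then
    echo "error: cannot read $file" >&2
    return 2
  fi
  tab=$(printf '\t')
  # Pattern: start of line, zero or more spaces, then a tab.
  # grep -n prints each offending line with its line number.
  if grep -n "^ *${tab}" "$file"; then
    echo "error: tab used for indentation in $file" >&2
    return 1
  fi
  return 0
}
```

Usage: `check_yaml_indent soup.yaml && soup train --config soup.yaml`.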

## Documentation

The full feature reference lives in [`docs/`](docs/). Start here:

| Guide | Covers |
|---|---|
| [Training tasks & methods](docs/training.md) | SFT, DPO/GRPO/PPO/KTO/ORPO/SimPO/IPO/BCO, tool-calling, PRM, pre-training, distillation, classification, vision/audio/TTS, unlearning, RAFT/RA-DIT, loop-hardening detectors |
| [PEFT, long context & efficiency](docs/peft-and-efficiency.md) | DoRA, LoRA+, rsLoRA, VeRA, OLoRA, NEFTune, PiSSA, ReLoRA, optimizer & PEFT zoo, LLaMA Pro, GaLore, YaRN/LongLoRA, packing, curriculum, auto-tuning |
| [Performance & quantization](docs/performance-and-quantization.md) | QAT, FP8, Quant Menu (I + II), KV-cache, NVFP4, save formats, Cut Cross-Entropy, gradient checkpointing, kernels, activation offloading, multi-GPU / DeepSpeed / FSDP |
| [Data engineering](docs/data.md) | Formats, the Axolotl/LF-parity pipeline, data tools, synthetic generation & forge, quality scorecards, trace tooling, remote datasets, mixing, recipe DAGs |
| [Evaluation & probes](docs/evaluation.md) | Eval design/gate, eval-gated training, benchmarks, NLG metrics, calibration, Elo arena, diagnose, post-train X-ray probes, A/B, drift, tunability, `soup advise` |
| [Serving & export](docs/serving-and-export.md) | OpenAI-compatible server, batch inference, benchmarking, merge/export, Anthropic Messages endpoint, speculative decoding, deploy autopilot, Web UI, Agent Forge |
| [Adapters, registry & governance](docs/adapters-and-governance.md) | Adapter lifecycle/management, model registry, Soup Cans, the data flywheel (`soup loop`), knowledge editing, steering, supply-chain controls (scan/sign/BOM/attest/audit/airgap) |
| [Backends, platform & ops](docs/backends-and-ops.md) | MLX/Unsloth backends, alternative hubs, HF Hub integration, autopilot, experiment tracking, plan/apply, env lockfiles, hardware-fit, completions, plugins, utility commands |
| [Command reference](docs/commands.md) | The full `soup` command list |
| [Supported models & extras](docs/models.md) | Recommended model families, the VRAM size guide, the pip extras matrix |

## Data Formats

Soup reads JSONL, JSON, CSV, Parquet, and TXT files and auto-detects the record format:

- **alpaca** — `{"instruction": ..., "input": ..., "output": ...}`
- **sharegpt** — `{"conversations": [{"from": "human", "value": ...}, ...]}`
- **chatml** — `{"messages": [{"role": "user", "content": ...}, ...]}`
- **dpo / orpo / simpo / ipo** — `{"prompt": ..., "chosen": ..., "rejected": ...}`
- **kto** — `{"prompt": ..., "completion": ..., "label": true}`
- **llava / sharegpt4v** (vision), **audio**, **plaintext** (pre-training), **embedding**,
**prm**, **pre_tokenized**, **video**, **multimodal**

Full schemas and the Axolotl/LlamaFactory-parity data pipeline (remote URIs, streaming,
sharding, interleaving, vocab expansion, document ingestion) are in
[`docs/data.md`](docs/data.md).
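As a concrete starting point, the sketch below writes a two-record `alpaca` file and a one-record preference file using exactly the field names listed above. The example text is placeholder data, and `data/prefs.jsonl` is an illustrative path; `data/train.jsonl` matches the sample config.

```bash
set -e
# Placeholder records; field names follow the alpaca and dpo schemas above.
mkdir -p data

# alpaca: one JSON object per line with instruction / input / output.
cat > data/train.jsonl <<'EOF'
{"instruction": "Translate to French.", "input": "Good morning", "output": "Bonjour"}
{"instruction": "Name the capital of Japan.", "input": "", "output": "Tokyo"}
EOF

# dpo / orpo / simpo / ipo: a prompt plus a preferred and a rejected answer.
cat > data/prefs.jsonl <<'EOF'
{"prompt": "What is 2 + 2?", "chosen": "4", "rejected": "5"}
EOF

wc -l data/train.jsonl data/prefs.jsonl
```

`soup data inspect ./data/train.jsonl` then reports stats for the file before you train on it.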

## Common Commands

```bash
soup train --config soup.yaml # train (SFT/DPO/GRPO/PPO/KTO/ORPO/SimPO/IPO/...)
soup infer --model ./output --input prompts.jsonl # batch inference
soup chat --model ./output # interactive chat
soup serve --model ./output # OpenAI-compatible API server
soup merge --adapter ./output # merge LoRA into the base model
soup export --model ./output --format gguf # export for deployment
soup eval benchmark --model ./output # evaluate
soup data inspect ./data/train.jsonl # dataset stats
soup recipes list # 100+ ready-made model recipes
soup autopilot --model <model-id> --data d.jsonl --goal chat # zero-config
soup doctor # check GPU / deps / environment
```

The complete command list is in [`docs/commands.md`](docs/commands.md).

## Supported Models

Soup works with **any** text-generation model on the
[HuggingFace Hub](https://huggingface.co/models?pipeline_tag=text-generation) — if it loads with
`AutoModelForCausalLM`, it works, zero config changes. Llama 3.x/4, Qwen 2.5/3, Gemma 3, Mistral,
Mixtral, DeepSeek R1/V3, Phi-4, and 100+ others ship as ready-made recipes (`soup recipes list`).

| VRAM | Max model (QLoRA 4-bit) | Example |
|---|---|---|
| 8 GB | ~7B | Llama-3.1-8B, Mistral-7B |
| 16 GB | ~14B | Phi-4-14B, Qwen2.5-14B |
| 24 GB | ~34B | CodeLlama-34B, Yi-1.5-34B |
| 48 GB | ~70B | Llama-3.3-70B |
| 80 GB+ | 70B+ (full) or MoE | Mixtral-8x22B, DeepSeek-V3 |

Full model + vision tables and the optional-extras matrix are in [`docs/models.md`](docs/models.md).
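The table follows from back-of-envelope arithmetic: a 4-bit weight takes half a byte, so an N-billion-parameter base model needs roughly N/2 GB for its frozen weights, and the rest of the VRAM goes to LoRA parameters, optimizer state, activations, and quantization overhead. The sketch below computes only that weight term; it is an approximation for intuition, not how Soup sizes batches, and `weight_gb` is a hypothetical helper.

```bash
# Approximate memory for model weights alone, using 1 GB = 1e9 bytes:
#   params (billions) * bits per weight / 8 = GB
# LoRA weights, optimizer state, activations and quantization metadata come
# on top of this and are NOT estimated here.
weight_gb() {
  awk -v p="$1" -v b="${2:-4}" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

for size in 8 14 34 70; do
  echo "${size}B at 4-bit: $(weight_gb "$size") GB of weights"
done
```

For example, `weight_gb 8 16` gives 16.0, which is why an 8B model in 16-bit precision does not fit the 8 GB row without quantization.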

## Docker

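Run Soup without installing CUDA or PyTorch locally (the image is published to GHCR on every release). GPU runs still need the NVIDIA driver and the NVIDIA Container Toolkit on the host for `--gpus all`: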

```bash
docker pull ghcr.io/makazhanalpamys/soup:latest
docker run --gpus all -v "$(pwd)":/workspace ghcr.io/makazhanalpamys/soup train --config soup.yaml
docker compose up # or build locally
```

## Requirements

- Python 3.10+
- GPU with CUDA (recommended), Apple Silicon (MPS), or CPU (experimental — very slow)
- 8 GB+ VRAM for 7B models with QLoRA

All training tasks run on CPU for testing (quantization auto-disabled). Optional extras
(`train`, `all`, `fast`, `vision`, `qat`, `serve`, `serve-fast`, `ui`, `eval`, `deepspeed`,
`liger`, `mlx`, `onnx`, `tensorrt`, …) are listed in
[`docs/models.md`](docs/models.md#optional-extras).
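On NVIDIA machines, the 8 GB guideline can be checked before a run with `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits`, which prints each GPU's total memory in MiB. The sketch below wraps that query; the helper names and the 7900 MiB cutoff (chosen a little under 8192 MiB to tolerate cards that report slightly less than their nominal size) are assumptions, and `soup doctor` remains the supported check.

```bash
# Hypothetical helpers, not part of Soup. The cutoff is an assumption.
MIN_MIB=${MIN_MIB:-7900}

# Succeeds when a total-memory value in MiB meets the cutoff.
vram_ok() {
  [ "$1" -ge "$MIN_MIB" ]
}

# Prints one line per NVIDIA GPU. Returns 2 when nvidia-smi is missing
# (e.g. Apple Silicon or CPU-only hosts), 1 when any GPU is below the cutoff.
check_gpus() {
  command -v nvidia-smi >/dev/null 2>&1 || { echo "nvidia-smi not found" >&2; return 2; }
  totals=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits) || return 2
  status=0
  for mib in $totals; do
    if vram_ok "$mib"; then
      echo "GPU with $mib MiB: meets the 8 GB guideline"
    else
      echo "GPU with $mib MiB: below $MIN_MIB MiB"
      status=1
    fi
  done
  return "$status"
}
```

Usage: `check_gpus && soup train --config soup.yaml`.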

## Troubleshooting

```bash
soup doctor # GPU, system resources, dependencies, and version in one place
```

- **`ImportError: DLL load failed while importing _C` (Windows)** — reinstall PyTorch for your
CUDA version: `pip install torch --index-url https://download.pytorch.org/whl/cu121`.
- **`soup version` ≠ `pip show soup-cli`** — multiple Python installs; use a virtualenv.
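To see which installs are competing, a small shell sketch can list every `soup`, `python3`, or `pip` executable on your `PATH` in lookup order; the first one printed is the one a bare command runs. `list_on_path` is a hypothetical helper. Running pip as `python3 -m pip show soup-cli` also ties pip to the same interpreter that `python3` resolves to.

```bash
# Hypothetical helper, not part of Soup: print every executable named $1
# found on $PATH, in the order the shell searches them.
list_on_path() {
  name=$1
  old_ifs=$IFS
  IFS=:
  for dir in $PATH; do
    [ -n "$dir" ] || dir=.   # an empty PATH entry means the current directory
    if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
    fi
  done
  IFS=$old_ifs
}

for cmd in soup python3 pip; do
  echo "== $cmd"
  list_on_path "$cmd"
done
```

More than one hit for `soup` or `python3` usually explains the version mismatch; activating a single virtualenv puts its `bin/` first on `PATH`.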

## Development

```bash
git clone https://github.com/MakazhanAlpamys/Soup.git
cd Soup
pip install -e ".[dev]"

ruff check src/soup_cli/ tests/ # lint
pytest tests/ -v # unit tests (fast, no GPU)
pytest tests/ -m smoke -v # smoke tests (downloads a tiny model, trains)

pre-commit install # optional: ruff lint+format on commit
```

See [CONTRIBUTING.md](CONTRIBUTING.md) for the full workflow and [SECURITY.md](SECURITY.md) to
report a vulnerability.

## License

[Apache-2.0](LICENSE). Copyright © the Soup contributors.