{"id":51030687,"url":"https://github.com/chrisgleissner/llm-tools","last_synced_at":"2026-07-02T02:03:48.811Z","repository":{"id":362080178,"uuid":"1257164727","full_name":"chrisgleissner/llm-tools","owner":"chrisgleissner","description":"Linux tools for local LLM use across Codex, Claude Code, and Copilot","archived":false,"fork":false,"pushed_at":"2026-06-12T09:17:24.000Z","size":387,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-12T09:25:02.184Z","etag":null,"topics":["ai","automation","bash","claude-code","cli","codex","codex-cli","copilot","copilot-cli","cost-tracking","developer-tools","linux","llm","llm-tools","ralph-loop","ralph-wiggum","scheduler","toolbox","usage-tracking"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chrisgleissner.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-06-02T12:27:28.000Z","updated_at":"2026-06-12T09:17:27.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/chrisgleissner/llm-tools","commit_stats":null,"previous_names":["chrisgleissner/llm-usage","chrisgleissner/llm-tools"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/chrisgleissner/llm-tools","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chrisgleissner%2Fllm-tools","tags_url":"ht
tps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chrisgleissner%2Fllm-tools/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chrisgleissner%2Fllm-tools/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chrisgleissner%2Fllm-tools/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chrisgleissner","download_url":"https://codeload.github.com/chrisgleissner/llm-tools/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chrisgleissner%2Fllm-tools/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34629658,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-21T02:00:05.568Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","automation","bash","claude-code","cli","codex","codex-cli","copilot","copilot-cli","cost-tracking","developer-tools","linux","llm","llm-tools","ralph-loop","ralph-wiggum","scheduler","toolbox","usage-tracking"],"created_at":"2026-06-22T00:01:46.085Z","updated_at":"2026-07-02T02:03:48.793Z","avatar_url":"https://github.com/chrisgleissner.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# llm-tools\n\n\u003cimg src=\"./docs/img/logo.png\" alt=\"LLM 
Tools\"/\u003e\n\n[![Build](https://github.com/chrisgleissner/llm-tools/actions/workflows/test.yml/badge.svg?branch=main)](https://github.com/chrisgleissner/llm-tools/actions/workflows/test.yml)\n[![codecov](https://codecov.io/gh/chrisgleissner/llm-tools/graph/badge.svg)](https://codecov.io/gh/chrisgleissner/llm-tools)\n[![License: Apache 2.0](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://www.apache.org/licenses/LICENSE-2.0)\n[![Platform](https://img.shields.io/badge/platform-Linux%20%7C%20macOS-blue)](https://github.com/chrisgleissner/llm-tools/releases)\n\n`llm-tools` is a small set of command-line tools for staying on top of local LLM provider capacity: session windows, weekly limits, quotas, credit balances, cost budgets, and provider availability.\n\nThe goal is to make LLM CLI work **more observable and less wasteful**. You can see which providers still have capacity, avoid burning weekly limits blindly, and dispatch tasks as soon as a provider becomes usable again instead of leaving open session windows idle.\n\nThe tools are intentionally **local- and CLI-first**. Instead of introducing another authentication layer, they use the provider CLIs you already have installed and authenticated. Credentials stay with those tools, and normal use remains zero-config.\n\nSupported providers include: **Codex, Claude Code, GitHub Copilot, Kilo Code, MiniMax, OpenCode, and z.AI**.\n\n## Tools at a Glance\n\n| Command         | Use it when you want to...                                                                       |\n| --------------- | ------------------------------------------------------------------------------------------------ |\n| `llm-usage`      | Check remaining LLM capacity before starting work.                                               |\n| `llm-scheduler`  | Run one prompt through one selected provider once that provider has usable capacity.             
|\n| `ralph-robin`    | Keep autonomous work moving by rotating across providers instead of stopping at the first limit. |\n| `llm-sleep-soak` | Prove suspend/resume is reliable on this machine before trusting unattended overnight runs.      |\n\n\u003cimg src=\"./docs/img/llm-usage5.png\" alt=\"LLM Usage\"/\u003e\n\n## Install\n\nInstall with [pipx](https://pipx.pypa.io):\n\n```bash\npipx install git+https://github.com/chrisgleissner/llm-tools.git\n```\n\nThis puts the commands on your `PATH` and keeps the package in its own virtual environment. It also works on externally managed Python installations such as Debian, Ubuntu, and Homebrew Python.\n\nIf you do not have `pipx` yet:\n\n```bash\npython3 -m pip install --user pipx\npython3 -m pipx ensurepath\n```\n\nOn macOS, you can also use Homebrew:\n\n```bash\nbrew install pipx\n```\n\nOpen a new shell, then verify the commands are available:\n\n```bash\ncommand -v llm-usage\ncommand -v llm-scheduler\ncommand -v ralph-robin\n```\n\n### Install from a Release Wheel\n\nEach [release](https://github.com/chrisgleissner/llm-tools/releases) ships a wheel ZIP archive. You can install the wheel directly:\n\n```bash\npipx install https://github.com/chrisgleissner/llm-tools/releases/download/0.3.2/llm_tools-0.3.2-py3-none-any.whl\n```\n\n### Install from a Local Checkout\n\nFrom a cloned repository:\n\n```bash\npipx install .\n```\n\nOr install into a virtual environment:\n\n```bash\npython3 -m venv .venv\n. 
.venv/bin/activate\npython -m pip install .\n```\n\nYou can also run the tools directly from a checkout:\n\n```bash\n./llm-usage\n./llm-scheduler\n./ralph-robin\n```\n\n## Quick Start\n\nCheck current capacity:\n\n```bash\nllm-usage\nllm-usage --watch 60\n```\n\nKeep a continuous low-overhead sampler running for instant client reports and\nburn-down history:\n\n```bash\nllm-usage --service-install\nllm-usage --service-status\nllm-usage --service-stop\n```\n\nRun a prompt once a specific provider is ready:\n\n```bash\nllm-scheduler --provider codex --prompt-file task.md\nllm-scheduler --provider kilo --prompt-file task.md\nllm-scheduler --provider minimax --prompt-file task.md\n```\n\nKeep work moving across providers:\n\n```bash\nralph-robin --prompt-file task.md\n```\n\nFollow the latest scheduler run:\n\n```bash\ntail -f ~/.cache/llm-tools/llm-scheduler/logs/latest/run.log\ntail -f ~/.cache/llm-tools/llm-scheduler/logs/latest/attempt-1.out\n```\n\n## Provider CLI Requirements\n\n`llm-tools` drives the official command-line clients for each provider. 
Install and authenticate the CLI for each provider you want to use.\n\n| Provider       | CLI binary | Install                                                                                                                   |\n| -------------- | ---------- | ------------------------------------------------------------------------------------------------------------------------- |\n| Claude Code    | `claude`   | [claude.com/product/claude-code](https://www.claude.com/product/claude-code) - `npm install -g @anthropic-ai/claude-code` |\n| Codex          | `codex`    | [github.com/openai/codex](https://github.com/openai/codex) - `npm install -g @openai/codex`                               |\n| GitHub Copilot | `copilot`  | [github.com/github/copilot-cli](https://github.com/github/copilot-cli) - `npm install -g @github/copilot`                 |\n| Kilo Code      | `kilo`     | [kilo.ai](https://kilo.ai) - `npm install -g @kilocode/cli`                                                               |\n| MiniMax        | `mmx`      | [platform.minimax.io](https://platform.minimax.io/) - `npm install -g mmx-cli`                                            |\n| OpenCode       | `opencode` | [opencode.ai](https://opencode.ai/) - `npm install -g opencode-ai`                                                        |\n| z.AI (capacity)| _none_     | Capacity-only: zero-config — the key is read from Kilo's/OpenCode's `auth.json`; launch via Kilo (`zai/\u003cmodel\u003e`) — see [z.AI](#zai-glm-via-kilo-or-opencode). |\n\nYou do not need every provider CLI installed.\n\n* `llm-usage` reports unavailable providers as `unavailable` and still shows the rest.\n* `llm-scheduler` only needs the provider selected with `--provider`.\n* `ralph-robin` skips unavailable providers and rotates across the usable ones. 
Its default rotation is `claude,codex,opencode`; use `--providers` to change it.\n\n## Capacity Scopes\n\n`llm-tools` calls every quota-like constraint a **scope**.\n\nA scope is one capacity measure exposed by one provider. For example, Codex and Claude can expose `5h` and `weekly` reset windows, while Kilo can expose a credit balance or monthly budget.\n\n| Kind           | Resets? | Examples                             | Providers                       |\n| -------------- | ------- | ------------------------------------ | ------------------------------- |\n| `reset_window` | yes     | `5h`, `weekly`, `monthly`            | Codex, Claude, Copilot, MiniMax, z.AI |\n| `balance`      | no      | Kilo credit balance, GBP/USD/credits | Kilo, OpenCode                  |\n| `budget`       | yes     | Monthly spend budget                 | Kilo, OpenCode                  |\n| `ungated`      | n/a     | BYOK, local, ungated mode            | Kilo, OpenCode                  |\n| `opaque`       | n/a     | Prepaid gateway subscription         | configured via routes           |\n\n`llm-usage` shows one row per scope. `llm-scheduler` and `ralph-robin` can gate on a specific scope with `--scope`.\n\nPer-provider scope allow-lists:\n\n| Provider       | Supported `--scope` values                     |\n| -------------- | ---------------------------------------------- |\n| Codex          | `auto`, `5h`, `weekly`                         |\n| Claude Code    | `auto`, `5h`, `weekly`                         |\n| MiniMax        | `auto`, `5h`, `weekly`                         |\n| z.AI           | `auto`, `5h`, `weekly`                         |\n| GitHub Copilot | `auto`, `monthly`                              |\n| Kilo Code      | `auto`, `balance`, `budget`, `byok`, `ungated` |\n| OpenCode       | `auto`, `balance`, `budget`, `byok`, `ungated` |\n\n`opaque` is for capacity that exists but cannot be measured before launch — most commonly a prepaid subscription on a gateway. 
It is selected by an explicit route (`[routes.\u003cid\u003e]` with `capacity.policy = \"opaque\"`); see [Route Mode](#route-mode) below. The canonical scope name is `subscription` so the table reads naturally when the cost is a fixed periodic charge.\n\n## Route Mode\n\nThe default rotation is over **providers** (`[providers.*]`, `--providers`). When the same provider can serve several underlying models with different capacity and cost semantics, you also have access to a **route** rotation (`[ralph].routes`, `[routes.\u003croute_id\u003e]`). A route binds a launch provider, a model, a capacity policy, and a cost policy into a single schedulable unit.\n\nA route is shaped like this:\n\n```toml\n[ralph]\n# When this list is present, ralph-robin rotates over routes in this\n# order. In its absence the legacy provider rotation is used.\nroutes = [\"kilo-minimax-m3\"]\n\n[routes.kilo-minimax-m3]\nprovider     = \"kilo\"\nmodel        = \"minimax-m3\"\nallow_fallback = false\n\n[routes.kilo-minimax-m3.capacity]\n# Capacity policies:\n#   provider        - read the provider's own snapshot (default)\n#   provider_model  - same, but model-aware (claude / codex have model-specific buckets)\n#   delegate        - launch this route's provider, read capacity from another provider\n#   opaque          - capacity exists but cannot be measured before launch\n#   ungated         - usable when the launch CLI is present\n#   balance         - read the provider's balance scope\n#   budget          - read the provider's budget scope\npolicy = \"opaque\"\nscope  = \"subscription\"     # display name; defaults to \"subscription\"\nlabel  = \"MiniMax M3 via Kilo\"\n\n[routes.kilo-minimax-m3.cost]\n# Cost policies (display only; never affect readiness):\n#   included, fixed_subscription, metered_balance, metered_budget,\n#   free, external, unknown\npolicy   = \"fixed_subscription\"\namount   = 20\ncurrency = \"USD\"\nperiod   = \"monthly\"\n```\n\n`llm-usage` then 
renders:\n\n```\nProvider   Model       Ready   Scope          Remaining         Guidance              Resets in\nKilo       MiniMax M3  yes     subscription   prepaid USD20/mo   ✓ usable              -\n```\n\n`opaque` rows never display a percentage, balance, or reset time. `prepaid USD20/mo` is the cost text; routes without a `fixed_subscription` cost render as `not metered`. There is never a progress bar on these rows.\n\n`delegate` is the route-level successor to the legacy `providers.\u003cx\u003e.capacity_provider` setting. The provider-level setting still works (it is mapped to an implicit route with `capacity.policy = \"delegate\"`), so existing configs do not need to migrate.\n\nThe orchestrator's runtime context and prompt injection include the selected `route_id` and launch provider, so a handoff-style prompt does not stale-route to a different provider.\n\n### One launch CLI, several models in one rotation\n\nA route's `model` is the model ralph-robin pins on the launch command, so two routes that share the same launch provider select different underlying models. This is how one Kilo install serves both `minimax-m3` and `zai/glm-5.2` in a single even-burn rotation:\n\n```toml\n[routes.kilo-minimax-m3]\nprovider = \"kilo\"\nmodel    = \"minimax-m3\"\n[routes.kilo-minimax-m3.capacity]\npolicy   = \"delegate\"     # gate on MiniMax's real 5h/weekly windows\nprovider = \"minimax\"\n\n[routes.kilo-zai-glm-52]\nprovider = \"kilo\"\nmodel    = \"zai/glm-5.2\"\n[routes.kilo-zai-glm-52.capacity]\npolicy   = \"delegate\"     # gate on z.AI's real 5h/weekly windows\nprovider = \"zai\"\n```\n\nThe `--routes` list (and `[ralph].routes`) accepts a mix of declared route ids **and** bare provider names. 
A bare provider name becomes an implicit route on its own CLI (gated on its own capacity), so you can burn down codex and claude alongside the two Kilo routes without declaring `[routes.codex]` / `[routes.claude]`:\n\n```\nralph-robin --routes codex,claude,kilo-minimax-m3,kilo-zai-glm-52 --prompt-file task.md\n```\n\nThis rotates over four independent capacity pools — Codex's CLI, Claude's CLI, MiniMax M3 (via Kilo), and GLM 5.2 (via Kilo) — and even-burns across whichever are usable.\n\n## Configuration File\n\nFor shared settings across `llm-usage`, `llm-scheduler`, and `ralph-robin`, drop a TOML file at one of these locations (first match wins):\n\n1. `$LLM_TOOLS_CONFIG` (explicit path)\n2. `$XDG_CONFIG_HOME/llm-tools/config.toml`\n3. `~/.config/llm-tools/config.toml`\n\nA missing file is fine - the tools just use their built-in defaults. Parsed with the standard library's `tomllib` (Python 3.11+), so no extra dependency.\n\n**What wins when settings overlap:** built-in defaults \u003c config file \u003c CLI flags. A flag you pass on the command line always beats the same key in the file.\n\nThe main thing the file lets you set is which model each provider should run, and what to do when that model's limit is used up:\n\n```toml\n# ~/.config/llm-tools/config.toml\n\n[defaults]\n# Order ralph-robin tries providers when --providers isn't given.\nproviders     = [\"claude\", \"codex\", \"opencode\"]\n# Which capacity check to use. 
One of:\n# auto | 5h | weekly | monthly | balance | budget | byok | ungated\nscope         = \"auto\"\n# Minimum quota left before a provider is considered usable.\nmin_remaining = 1\n\n[providers.claude]\n# Run `claude --model sonnet`; only run while Sonnet has capacity.\nmodel          = \"sonnet\"\n# What to do when Sonnet's limit is used up:\n#   false -\u003e skip claude and switch to the next provider\n#   true  -\u003e keep claude and let it pick another model\nallow_fallback = false\n# Optional: override [defaults].scope / min_remaining for just this provider.\n#scope          = \"weekly\"\n#min_remaining  = 5\n\n[providers.codex]\nmodel          = \"spark\"\nallow_fallback = false\n\n[providers.opencode]\n# Tie opencode's availability/capacity to another provider's usage windows\n# instead of its own. Use it when the opencode CLI is configured to run another\n# provider's model (e.g. the MiniMax API), so opencode's own balance is\n# irrelevant. ralph-robin then only routes to opencode while minimax has\n# capacity, ranks even-burn on minimax's remaining, and suspends on minimax's\n# reset — while still launching the opencode CLI.\ncapacity_provider = \"minimax\"\n\n[ralph]                                # ralph-robin-only settings (override [defaults] above)\n# One example key — see config.example.toml for the full list:\nproviders      = [\"claude\", \"codex\", \"kilo\"]\n\n[scheduler]                            # llm-scheduler-only settings (override [defaults] above)\n# One example key — see config.example.toml for the full list:\nprovider       = \"claude\"\n```\n\nA complete template with every supported key (all commented out) is shipped at [config.example.toml](./config.example.toml) in this repository. Copy it to one of the locations above and uncomment the lines you want to set.\n\nWhen `model` is set, `llm-scheduler` and `ralph-robin` call the provider with `--model NAME` and only run while that model still has capacity. 
With `allow_fallback = false` (the default), the tool treats the provider as unavailable once that model's limit is used up and switches to the next provider. With `allow_fallback = true`, the tool keeps trying the provider but drops the model setting and lets the provider's CLI pick a different model. Pass `--model NAME` on the command line to override the file for a single run.\n\n`capacity_provider` ties one provider's gating to another provider's usage windows. It is generic: any provider may borrow any other provider's windows (a single hop — a provider cannot reference itself or a provider that itself delegates). The borrowing provider's own CLI is still the one launched; only the availability/capacity reading is delegated. The motivating case is a CLI configured to run a different provider's model — for example OpenCode pointed at the MiniMax API, where OpenCode's own balance says nothing about whether a run will succeed. With `capacity_provider = \"minimax\"`, `ralph-robin` only routes to OpenCode while MiniMax has capacity, ranks even-burn on MiniMax's remaining-per-day, and suspends until MiniMax's reset; the scope you gate on (e.g. 
`5h`, `weekly`) is then validated against MiniMax's windows rather than OpenCode's.\n\nUnknown sections or keys are rejected at load time so typos surface immediately.\n\n## `llm-usage`\n\nUse `llm-usage` before starting work, in status lines, or in scripts that need a compact view of local LLM capacity.\n\n```bash\nllm-usage\nllm-usage --json\nllm-usage --watch 60\nllm-usage --show-copilot-credits\nllm-usage --show-source\nllm-usage --statusline\n```\n\nDefault output shows all supported providers, with one row per capacity scope:\n\n```text\nLLM Usage · 13:03\n\nBars: quota rows █ available · ░ spent   ·   $ rows █ spent · ░ budget left\nGuidance: 5h rows forecast runout; weekly/monthly/budget rows compare remaining quota to time left.\n          $ rows show spend as a share of the overall monthly budget (green low · red at/over).\n          ✓ lasts until reset · ! empty before reset · × empty · ↑ headroom · = on pace · ↓ conserve\n\nProvider   Model     Ready   Scope     Remaining         Guidance              Resets in\n────────   ───────   ─────   ───────   ───────────────   ───────────────────   ──────────\nCodex                yes     5h        90% █████████░    ✓ lasts until reset   4h 34m\n                             weekly    34% ███░░░░░░░    ↓ conserve            5d 2h\n           Spark             5h       100% ██████████    ✓ lasts until reset   4h 34m\n                             weekly    96% █████████░    ↑ headroom            6d 1h\n\nClaude               no      5h         0% ░░░░░░░░░░    × empty               36m\n                             weekly    91% █████████░    ↑ headroom            5d\n           Sonnet            weekly   100% ██████████    ↑ headroom            5d\n\nCopilot              yes     monthly   36% ████░░░░░░    ↓ conserve            17d 11h\n                             spend     $0.0 ░░░░░░░░░░    0% of $50\n\nKilo                 yes     spend    $12.4 ██░░░░░░░░    24.8% of $50\n\nOpenCode             yes     spend  
   $4.3 █░░░░░░░░░    8.5% of $50\n\nBudget               yes     monthly  $16.7 ███░░░░░░░    33.5% of $50          14d 17h\n```\n\nCost is shown the same way as quota: the amount sits on the left, followed by a\nbar — never a right-aligned `spent $X`. A `spend` row (distinct from a funded\n`balance`) reports money already spent this cycle; with an overall monthly\nbudget configured the bar fills with how much of that budget the spend consumes\n(green when low, red at or over budget), and the trailing `Budget` row totals\nevery provider's spend against the cap (filling past it, in red, if you exceed\nit). Without a budget the rows show just the amount and the total appears as a\nplain `Total` row. Cells with nothing to report (a `spend`/`balance` scope has\nno reset; a full window has no runout forecast) are left blank rather than\npadded with placeholder dashes — only a genuine read failure shows `unavailable`.\n\nThe `Model` column only appears when a provider reports model-specific limits.\nThese sub-rows sit under their provider's section: Codex surfaces its `Spark`\nmodel, and Claude surfaces per-model weekly limits (e.g. `Sonnet`) alongside the\naggregate window. Model rows are informational - they are shown for visibility\nbut do not gate scheduling, which always uses the provider's aggregate scopes.\n\n### Copilot\n\nCopilot shows a second `spend` row with the additional (\"add-on\") usage spent\nthis billing cycle — the dollars billed beyond your included credit allowance.\nThe Copilot CLI does not expose this, so it is read from the GitHub billing API\nusing a GitHub token already on your machine (`COPILOT_GITHUB_TOKEN`, `GH_TOKEN`,\nor `GITHUB_TOKEN`, otherwise `gh auth token`); no Copilot-specific credential is\nrequired. With no token available the row is simply omitted. 
The row is\ninformational and never affects `Ready`.\n\n**Pay-as-you-go readiness.** Copilot keeps working past its included monthly\nallowance via pay-as-you-go premium requests / AI Credits, up to the spending\nlimit set in your GitHub billing settings. GitHub does not expose that limit\nthrough its API, so once the included allowance is exhausted (`monthly` at 0%)\n`Ready` would otherwise read `no` even though Copilot is still usable. To fix\nthis, llm-usage keeps Copilot `Ready` when overage is **funded**, established\ntwo ways:\n\n- **Declared limit.** Set `[copilot] monthly_spend_limit` (or the\n  `LLM_USAGE_COPILOT_SPEND_LIMIT` / `LLM_USAGE_COPILOT_SPEND_CURRENCY` env\n  overrides) to your GitHub premium-request spending limit. Copilot stays\n  `Ready` while this month's billed overage stays under it, and flips to `no`\n  once the limit is reached.\n- **Auto-detected overage.** With no declared limit, if GitHub billing already\n  shows premium-request / AI-Credit overage being charged this month (net spend\n  \u003e 0), pay-as-you-go is demonstrably enabled and Copilot stays `Ready`.\n\nWith neither signal the previous behaviour stands: an exhausted allowance gates\n`Ready` to `no`. When funded, the exhausted allowance row shows `pay-as-you-go`\n(or `pay-as-you-go $spent/$limit`) in the `Guidance` column instead of a\nmisleading runout/pace hint.\n\nThe Copilot CLI footer shape has changed across releases — pre-1.0.57 printed\n`Plan: N% used` / `Monthly: N% used` (the *used* percentage), 1.0.57+ render\n`Remaining reqs.: N%` (the *remaining* percentage), and 1.0.63+ also dropped\nthe colon on the `AI Credits` line. The dashboard recognises all of these\nshapes. \n\nThe new CLI (\u003e1.0.57) no longer surfaces a *monthly* figure at all in\nits footer, so the reader falls back to GitHub's\n`/users/{login}/settings/billing/premium_request/usage` endpoint, which always\nreports the current month's per-model premium request count. 
The reader\nsums that against your plan's monthly allowance (300 for Pro, 1500 for Pro+,\n1000 for Enterprise, etc.) to draw the quota bar — overriding the plan via\n`LLM_USAGE_COPILOT_PLAN` or pinning the allowance directly with\n`LLM_USAGE_COPILOT_MONTHLY_ALLOWANCE` is supported. Note: the\nyear+month-only query for the *current* month silently returns an empty\nresult; the reader therefore asks one day at a time so a day with usage\nrecorded shows up immediately rather than being hidden until month-end.\n\n### Table Columns\n\n| Column      | Meaning                                                                                                                                            |\n| ----------- | -------------------------------------------------------------------------------------------------------------------------------------------------- |\n| `Model`     | The specific model a row reports on (e.g. `Spark`, `Sonnet`). Blank for a provider's aggregate rows. Only shown when model-specific data exists.   |\n| `Ready`     | `yes` means every blocking scope for that provider has usable capacity now. `no` means at least one scope must reset or recover.                   |\n| `Scope`     | The capacity measure, such as `5h`, `weekly`, `monthly`, `balance`, `budget`, or `spend` (money spent this cycle).                                  |\n| `Remaining` | Remaining percentage, a `$` spend (with a bar when `[budget]` is set), a balance, or an unmetered state such as `byok`, `local`, or `ungated`.      |\n| `Guidance`  | For `5h`, whether current burn should last until reset. For weekly/monthly/budget scopes, pace vs. a linear target. For `$` rows, share of budget. |\n| `Resets in` | Relative reset time. Blank when the scope does not reset (`spend`/`balance`/`ungated`).                                                             
|\n\nEmpty cells are intentional (calm design): a cell is left blank when it has\nnothing to report (a non-resetting scope, a window with no runout forecast).\nOnly a real read failure is called out, as `unavailable`.\n\nSet an overall monthly spend budget in `[budget]` (see `config.example.toml`,\nor the `LLM_USAGE_MONTHLY_BUDGET` / `LLM_USAGE_BUDGET_CURRENCY` env overrides) to\nturn every `spend` figure into a coloured progress bar against that one cap, plus\na `Budget` row totalling all providers' spend (filling past the cap, in red, if\nyou exceed it). Without a budget, `spend` rows show just the amount and the total\nappears as a plain `Total` row.\n\n### `llm-usage` Options\n\n| Option                   | Purpose                                                                                                 |\n| ------------------------ | ------------------------------------------------------------------------------------------------------- |\n| `-j`, `--json`           | Print stable JSON with `generated_at`, `codex`, `claude`, `copilot`, `kilo`, `opencode`, and `minimax`. |\n| `-w`, `--watch SECONDS`  | Refresh continuously.                                                                                   |\n| `-C`, `--show-copilot-credits` | Include Copilot AI credits when parseable.                                                        |\n| `-S`, `--show-source`    | Show where each usage row came from.                                                                    |\n| `-s`, `--hide-source`    | Hide the source column. This is the default.                                                            |\n| `-R`, `--show-remaining-time` | Show burn-time estimates.                                                                          |\n| `-r`, `--hide-remaining-time` | Hide burn-time estimates. This is the default.                                                    |\n| `-D`, `--show-daily-budget` | Show the `Guidance` column. This is the default.     
                                                |\n| `-d`, `--hide-daily-budget` | Hide the `Guidance` column.                                                                          |\n| `-K`, `--show-codex-spark` | Show Codex Spark rows.                                                                                |\n| `-k`, `--hide-codex-spark` | Hide Codex Spark rows.                                                                                |\n| `-M`, `--copilot-monthly-reset-offset-days DAYS` | Day offset from month start for Copilot monthly reset.                         |\n| `-p`, `--provider-parallelism N` | Number of provider readers to run concurrently. Default: CPU cores; env: `LLM_USAGE_PROVIDER_PARALLELISM`. |\n| `-t`, `--statusline`     | Read Claude Code statusline JSON from stdin and cache it.                                               |\n| `-l`, `--log-only`       | Sample providers and append to the usage log without printing a table.                                  |\n| `-n`, `--no-header`      | Omit the table header.                                                                                  |\n| `--no-service`           | Read providers directly instead of using the local `llm-usage` service.                                 |\n| `--service-install`      | Install and start the continuous sampler with the native user service manager.                          |\n| `--service-uninstall`    | Stop and remove the installed continuous sampler.                                                       |\n| `--service-start`        | Start the installed continuous sampler.                                                                |\n| `--service-stop`         | Stop the installed continuous sampler.                                                                 |\n| `--service-status`       | Show whether the local sampler is running.                                                             
|\n| `--service-run`          | Run the sampler in the foreground, useful for supervisors or debugging.                                 |\n| `--service-interval SECONDS` | Continuous sampler refresh interval; default: `60`.                                                |\n| `-h`, `--help`           | Show help.                                                                                              |\n\n## `llm-scheduler`\n\nUse `llm-scheduler` when you want one specific provider to run one specific prompt, but only after that provider has usable capacity.\n\nIt is useful for:\n\n* delayed launches\n* rate-limit-aware retries\n* tmux launches\n* wake scheduling\n* suspend-until-ready workflows\n\n### Basic Usage\n\n```bash\nllm-scheduler --provider codex --prompt-file task.md\nllm-scheduler --provider claude --prompt \"Continue the work in this repo until CI is green\"\nllm-scheduler --provider copilot --prompt-file task.md --retry-delays 60,180,600\nllm-scheduler --provider kilo --prompt-file task.md\nllm-scheduler --provider minimax --prompt-file task.md\n```\n\nRequired form:\n\n```bash\nllm-scheduler --provider codex|claude|copilot|kilo|minimax (--prompt TEXT | --prompt-file FILE) [options]\n```\n\n### Common Examples\n\nRun after a specific local time:\n\n```bash\nllm-scheduler --provider codex --prompt-file task.md --at \"23:05\"\n```\n\nRun inside tmux:\n\n```bash\nllm-scheduler --provider codex --prompt-file task.md --tmux llm-work\n```\n\nSchedule a wake-up:\n\n```bash\nllm-scheduler --provider codex --prompt-file task.md --wake\n```\n\nSuspend until the selected provider is ready:\n\n```bash\nllm-scheduler --provider claude --prompt-file task.md --scope 5h --suspend-until-ready\n```\n\nShow the resolved plan without launching:\n\n```bash\nllm-scheduler --provider codex --prompt-file task.md --dry-run\n```\n\n### Runtime Behavior\n\n* In an interactive terminal, the provider is launched directly and output is written to `attempt-N.out`.\n* In headless 
or non-terminal mode, the provider runs through a captured PTY.\n* In tmux mode, the provider runs inside the requested tmux session or window.\n* The scheduler exits after success, terminal failure, or retry exhaustion.\n\n### `llm-scheduler` Options\n\n| Option                                                              | Purpose                                                                                                                                            |\n| ------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- |\n| `-P`, `--provider PROVIDER`                                         | Provider: `codex`, `claude`, `copilot`, `kilo`, `opencode`, or `minimax`.                                                                          |\n| `-p`, `--prompt TEXT`                                               | Prompt text.                                                                                                                                       |\n| `-f`, `--prompt-file FILE`                                          | Read prompt from `FILE`, preserving content.                                                                                                       |\n| `-a`, `--at TIME`                                                   | Delay launch until a `date -d` compatible local time.                                                                                              |\n| `-a`, `--not-before TIME`                                           | Do not launch before a `date -d` compatible local time.                                                                                            |\n| `-s`, `--scope auto\\|5h\\|weekly\\|monthly\\|balance\\|budget\\|byok\\|ungated` | Capacity scope to gate on. Default: `auto`.                                                               
                               |\n| `-W`, `--window SCOPE`                                              | Deprecated alias for `--scope`.                                                                                                                    |\n| `-m`, `--min-remaining PERCENT`                                     | Minimum remaining capacity required to launch. Default: `1`.                                                                                       |\n| `-i`, `--poll-interval SECONDS`                                     | Usage polling interval. Default: `60`.                                                                                                             |\n| `-u`, `--max-unavailable-wait SECONDS`                              | Maximum wait when usage cannot be measured. Default: `900`. Use `0` to wait forever. Known reset times still wait for reset.                       |\n| `-r`, `--retry-delays LIST`                                         | Retry delays. Default: `60,180,600`.                                                                                                               |\n| `-R`, `--no-retry`                                                  | Disable retries.                                                                                                                                   |\n| `-C`, `--cwd DIR`                                                   | Set provider working directory.                                                                                                                    |\n| `-F`, `--fresh`                                                     | Launch a fresh foreground provider process. This is the default.                                                                                   |\n| `-H`, `--headless`                                                  | Force non-interactive provider command and captured PTY.                                                              
                             |\n| `-T`, `--tmux SESSION[:WINDOW]`                                     | Run through tmux.                                                                                                                                  |\n| `-e`, `--command-template TEMPLATE`                                 | Override provider syntax. Supports `{provider}`, `{prompt}`, `{prompt_file}`, and `{cwd}`. Parsed with Python `shlex`, not a shell.                    |\n| `-y`, `--auto-confirm`                                              | Auto-confirm recognised safe trust prompts. This is the default.                                                                                   |\n| `-Y`, `--no-auto-confirm`                                           | Disable safe auto-confirmation.                                                                                                                    |\n| `-I`, `--headless-idle-timeout SECONDS`                             | Abort headless fresh mode after no output progress. Default: `600`. Use `0` to disable.                                                            |\n| `-Q`, `--headless-question-timeout SECONDS`                         | Abort headless fresh mode after question-like output stalls. Default: `30`. Use `0` to disable.                                                    |\n| `-L`, `--log-dir DIR`                                               | Set scheduler log root.                                                                                                                            |\n| `-O`, `--run-dir DIR`                                               | Write or resume a specific run directory.                                                                                                          |\n| `-d`, `--dry-run`                                                   | Resolve usage, timing, command plan, and logs without launching.                                                    
                               |\n| `-k`, `--wake`                                                      | Enable best-effort wake scheduling.                                                                                                                |\n| `-U`, `--suspend-until-ready`                                       | Schedule a resumed run, enable wake, suspend the machine, and continue after wake.                                                                 |\n| `-x`, `--wake-test`                                                 | Print wake diagnostics without scheduling work.                                                                                                    |\n| `-h`, `--help`                                                      | Show help.                                                                                                                                         |\n\n### Default Provider Commands\n\n| Provider       | Interactive                    | Headless                             |\n| -------------- | ------------------------------ | ------------------------------------ |\n| Codex          | `codex -C \u003ccwd\u003e \u003cprompt\u003e`      | `codex exec -C \u003ccwd\u003e \u003cprompt\u003e`       |\n| Claude Code    | `claude \u003cprompt\u003e`              | `claude --print \u003cprompt\u003e`            |\n| GitHub Copilot | `copilot -C \u003ccwd\u003e -i \u003cprompt\u003e` | `copilot -C \u003ccwd\u003e --prompt \u003cprompt\u003e` |\n| Kilo Code      | `kilo run \u003cprompt\u003e`            | `kilo run --dir \u003ccwd\u003e \u003cprompt\u003e`      |\n| OpenCode       | `opencode`                     | `opencode run --dir \u003ccwd\u003e \u003cprompt\u003e`  |\n| MiniMax        | `mmx`                          | `mmx run --auto -C \u003ccwd\u003e \u003cprompt\u003e`   |\n| z.AI           | _launch via a route_           | _launch via a route_                 |\n\nKilo Code and OpenCode accept `-m, --model 
\u003cprovider\u003e/\u003cmodel\u003e`; the scheduler and Ralph inject this flag when the per-provider policy or route pins a model (e.g. `-m zai/glm-4.7`). No permission-bypassing flag is injected; whether a headless run may act without prompting is governed by each tool's own permission config.\n\nInteractive Kilo Code, OpenCode, and MiniMax take no working-directory flag; they inherit it from the launching process (the scheduler sets the subprocess `cwd`).\n\nThe default Claude adapter uses your local Claude Code permission settings. To override Claude Code settings for one scheduler run:\n\n```bash\nllm-scheduler --provider claude --prompt-file task.md --command-template 'claude --permission-mode plan --print {prompt}'\n```\n\n## `ralph-robin`\n\nUse `ralph-robin` when the task matters more than which provider runs it.\n\nIt runs a [Ralph loop](https://venturebeat.com/technology/how-ralph-wiggum-went-from-the-simpsons-to-the-biggest-name-in-ai-right-now/): a persistent autonomous workflow that keeps going instead of stopping when one provider reaches a limit, stalls, or becomes temporarily unusable.\n\nThis makes it useful for long-running coding, repair, hardening, documentation, and investigation tasks where stopping at the first rate limit would waste time.\n\n`ralph-robin` wraps `llm-scheduler` and rotates across configured providers. It can either:\n\n* keep using the current provider until it is exhausted\n* distribute work more evenly so provider limits burn down at a similar rate\n\nEven burn-down is the default.\n\nWhen Ralph selects Claude Code through the built-in adapter, it uses Claude's `stream-json` print mode and renders that event stream as readable stdout. 
Assistant text, tool calls, and tool results appear while the run is active.\n\n### Basic Usage\n\n```bash\nralph-robin --prompt-file task.md\nralph-robin --prompt \"Continue until tests pass\"\nralph-robin --providers claude,codex,copilot,kilo,minimax --prompt-file task.md\nralph-robin --prompt-file task.md --tmux llm-work\nralph-robin --prompt-file task.md --dry-run\n```\n\nExample output:\n\n```text\nralph-robin --prompt-file /home/chris/dev/ralph.prompt.md\n[16:22:47] ◆ ralph-robin: · logs: /home/chris/.cache/llm-tools/ralph-robin/logs/20260613-162247-ralph-robin-r67_na63\n[16:22:47] ◆ ralph-robin: · usage claude: usable (5h 22% left, weekly 85% left) | codex: usable (5h 99% left, weekly 35% left)\n[16:22:47] ◆ ralph-robin: ✓ selected claude (even-burn)\n[16:22:52 claude] I'll begin the RALPH loop iteration following FAST-PATH STARTUP. Let me start by establishing current state.\n[16:22:54 claude] Tool call: Bash\n[16:22:54 claude] {\n[16:22:54 claude]   \"command\": \"git status \u0026\u0026 echo \\\"---LATEST COMMIT---\\\" \u0026\u0026 git log --oneline -5 \u0026\u0026 echo \\\"---BRANCH---\\\" \u0026\u0026 git branch --show-current\",\n[16:22:54 claude]   \"description\": \"Check git status, branch, and recent commits\"\n[16:22:54 claude] }\n```\n\nEach relayed provider line is prefixed with `[time provider]` by default. 
This makes a quiet increment distinguishable from a wedged one and keeps the active provider visible.\n\nCustomize the prefix with `--prefix`:\n\n```bash\nralph-robin --prompt-file task.md --prefix time,provider,usage\nralph-robin --prompt-file task.md --prefix none\n```\n\n### `ralph-robin` Options\n\n| Option                                                              | Purpose                                                                                                                                                       |\n| ------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| `-P`, `--providers LIST`                                            | Set comma-separated provider rotation. Values include `claude`, `codex`, `copilot`, `kilo`, `opencode`, and `minimax`.                                       |\n| `-p`, `--prompt TEXT`                                               | Prompt text passed to the selected provider.                                                                                                                  |\n| `-f`, `--prompt-file FILE`                                          | Prompt file passed to the selected provider.                                                                                                                  |\n| `-s`, `--scope auto\\|5h\\|weekly\\|monthly\\|balance\\|budget\\|byok\\|ungated` | Capacity scope to gate on. Default: `auto`.                                                                                                         |\n| `-W`, `--window SCOPE`                                              | Deprecated alias for `--scope`.                                                                                                                              
|\n| `-m`, `--min-remaining PERCENT`                                     | Minimum remaining capacity required to launch. Default: `1`.                                                                                                  |\n| `-i`, `--poll-interval SECONDS`                                     | Poll interval passed to `llm-scheduler`. Default: `60`.                                                                                                      |\n| `-u`, `--max-unavailable-wait SECONDS`                              | Bound inconclusive usage waits before optimistic launch. Default: `900`; `0` waits forever.                                                                  |\n| `-r`, `--retry-delays LIST`                                         | Retry delays. Default: `60,180,600`.                                                                                                                          |\n| `-R`, `--no-retry`                                                  | Disable retries.                                                                                                                                              |\n| `-e`, `--even-burn`                                                 | Spread work to burn provider quota down evenly. Enabled by default.                                                                                           |\n| `-E`, `--no-even-burn`                                              | Keep using the current provider until it is exhausted.                                                                                                        |\n| `-n`, `--max-iterations N`                                          | Stop after `N` successful increments. Default `0` means no iteration cap. Use `1` for single-shot.                                                            
|\n| `-D`, `--max-duration D`                                            | Stop once `D` of wall-clock time elapses, such as `24h`, `90m`, `30s`, or seconds. Default: `24h`. Use `0` to disable. Whichever limit is reached first wins. |\n| `-I`, `--min-iteration-seconds N`                                   | Minimum runtime floor for successive increments. Default: `5`; `0` disables.                                                                                  |\n| `-x`, `--prefix LIST`                                               | Comma-separated fields stamped on each relayed provider line. Fields: `time`, `provider`, `usage`. Default: `time,provider`. Use `none` or an empty value to disable. |\n| `-X`, `--prefix-usage-interval S`                                   | Refresh interval in seconds for the cached `usage` prefix field. Default: `15`. Use `0` to refresh every line.                                                |\n| `-C`, `--cwd DIR`                                                   | Set provider working directory.                                                                                                                               |\n| `-F`, `--fresh`                                                     | Launch a fresh provider process through `llm-scheduler`.                                                                                                      |\n| `-H`, `--headless`                                                  | Force non-interactive provider command and captured PTY.                                                                                                      |\n| `-T`, `--tmux SESSION[:WINDOW]`                                     | Execute through tmux via `llm-scheduler`.                                                                                                                     |\n| `-g`, `--command-template TEMPLATE`                                 | Override provider syntax. 
Supports `{provider}`, `{prompt}`, `{prompt_file}`, and `{cwd}`.                                                                        |\n| `-y`, `--auto-confirm`                                              | Auto-confirm recognised safe trust prompts. This is the default.                                                                                              |\n| `-Y`, `--no-auto-confirm`                                           | Disable safe auto-confirmation.                                                                                                                               |\n| `-q`, `--headless-idle-timeout SECONDS`                             | Abort headless fresh mode after no output progress. Default: `600`. Use `0` to disable.                                                                       |\n| `-Q`, `--headless-question-timeout SECONDS`                         | Abort headless fresh mode after question-like output stalls. Default: `30`. Use `0` to disable.                                                               |\n| `-S`, `--state-file FILE`                                           | Store current provider index. Default: `${XDG_CACHE_HOME:-$HOME/.cache}/llm-tools/ralph-robin/state.json`.                                                    |\n| `-L`, `--log-dir DIR`                                               | Set Ralph log directory. Default: `${XDG_CACHE_HOME:-$HOME/.cache}/llm-tools/ralph-robin/logs`.                                                               |\n| `-k`, `--wake`                                                      | Pass best-effort wake scheduling to `llm-scheduler`.                                                                                                          |\n| `-U`, `--suspend-until-ready`                                       | Suspend even for the selected provider's own wait gates.                                                                                                     
|\n| `--watchdog`                                                       | Arm a hardware watchdog across each machine suspend so a wedged resume reboots instead of hanging. See [Reliable sleep/wake](#reliable-sleepwake).            |\n| `-d`, `--dry-run`                                                   | Resolve rotation and usage state without submitting.                                                                                                          |\n| `-h`, `--help`                                                      | Show help.                                                                                                                                                    |\n\nThe following options are passed through to `llm-scheduler`:\n\n```text\n-s, --scope\n-m, --min-remaining\n-i, --poll-interval\n-u, --max-unavailable-wait\n-r, --retry-delays\n-R, --no-retry\n-C, --cwd\n-F, --fresh\n-H, --headless\n-T, --tmux\n-g, --command-template\n-y, --auto-confirm\n-Y, --no-auto-confirm\n-q, --headless-idle-timeout\n-Q, --headless-question-timeout\n```\n\n### `ralph-robin` Behavior\n\n`ralph-robin` owns the full rotation loop: provider selection, retries, waiting, suspend decisions, and handoff between providers.\n\n#### Launch Mode\n\n* Starts provider increments in autonomous headless mode by default, even from an interactive terminal.\n* Re-evaluates provider capacity before each increment.\n* Submits the same prompt again after each increment, so long-running work can continue across provider boundaries.\n\n#### Provider Selection\n\nBy default, `ralph-robin` uses **even burn-down**. 
When several providers are ready, it prefers the provider with the highest remaining pace-adjusted capacity.\n\nThe selector:\n\n* ranks long-period **plan** scopes such as `weekly`, `monthly`, and Kilo `budget`\n* scores each provider by its **binding (most-constrained) plan scope**, not its most generous one, so a provider whose weekly is draining ranks below a peer with weekly headroom and the rotation hands over instead of running one plan to the floor\n* excludes the short `5h` **session** window from the surplus ranking — it still gates usability, but it resets too fast to signal plan surplus, and folding it in let a momentarily-full 5h window mask a draining weekly (which kept the loop pinned to one provider, e.g. Codex, without ever handing over)\n* treats `balance` and `ungated` scopes as usable, but not pace-rankable\n* assumes an unknown or stale reset has a full week available, so the provider can still be ranked instead of being skipped\n\nProviders that expose no rankable plan scope — an opaque prepaid **subscription** such as MiniMax M3 via Kilo Code — are still rotated **fairly and evenly**: they take turns by least-completed count alongside the measurable providers (the surplus score only breaks ties between two measurable providers at the same count), so an opaque subscription is neither starved nor allowed to monopolise the loop.\n\nUse `--no-even-burn` to keep using the current provider until it is exhausted.\n\n#### Blocking and Recovery\n\n`ralph-robin` does not stop just because every provider is currently blocked.\n\nWhen no provider is usable, it waits or suspends until the rotation can recover. 
The loop ends only when one of these conditions is met:\n\n* a non-recoverable failure occurs\n* a degenerate instant-success streak is detected\n* `--max-duration` is reached\n* `--max-iterations` is reached\n\n#### Output Handling\n\nProvider output is streamed live.\n\nOn interactive terminals, `ralph-robin` highlights:\n\n* status lines\n* diffs\n* commands\n* warnings\n* errors\n\nColors are disabled automatically for non-TTY output, `TERM=dumb`, `NO_COLOR`, or `LLM_USAGE_NO_COLOR`.\n\n#### Failure Handling\n\nIf usage cannot be measured, `ralph-robin` tries that provider before suspending.\n\nIf a provider exits with a scheduler autonomy abort, `ralph-robin` skips that provider for the current invocation and tries the next usable provider.\n\n\n### Suspend and Wake Behavior\n\nWhen all providers are blocked, Ralph sets an RTC wake-up timer for the earliest known reset across the whole rotation, then resumes its own loop after wake.\n\nIf suspend infrastructure is unavailable, the lead time is too short, `LLM_SCHEDULER_NO_ACTUAL_SUSPEND=1` is set, or `--dry-run` is used, Ralph falls back to an in-process wait (the machine stays awake — always correct, it just forgoes power saving).\n\nThis is different from:\n\n```bash\nllm-scheduler --suspend-until-ready\n```\n\nThat scheduler mode wakes into one selected provider. Ralph wakes back into cross-provider rotation.\n\n### Reliable sleep/wake\n\nAn overnight Ralph run waits out provider reset windows by suspending the whole machine. That is only safe if the box reliably resumes, and only sensible if nothing *else* suspends it underneath the run. Ralph handles both, simply and portably.\n\n**It defers your OS auto-suspend while it works.** Most desktops auto-suspend after some idle period (for example KDE PowerDevil, GNOME, or `logind`'s `IdleAction`). 
That idle timer does not know a headless job is busy, so it can suspend the machine mid-iteration — with no wake armed, leaving it asleep until you touch it (and then possibly hanging on a flaky resume). For its whole run Ralph holds a logind **`idle` inhibitor** (`systemd-inhibit --what=idle`) so the OS will not auto-suspend out from under it. An `idle` inhibitor does **not** block an explicit suspend, so Ralph still controls its own deliberate, RTC-armed suspends. You do not have to change your auto-suspend settings.\n\n\u003e You can still keep your normal auto-suspend timeout. If you prefer the belt-and-braces approach, set it longer than a typical Ralph session — but that is brittle for long runs, which is exactly why Ralph inhibits idle instead of relying on it.\n\n**Its own suspends are verified and recorded:**\n\n* **A wake is always armed before suspending.** Ralph never suspends without first arming an RTC wake (via `rtcwake -m no` when it can, otherwise a `systemd-run --user` `WakeSystem=true` timer). If it cannot arm one, it does not suspend — it waits awake instead.\n* **Wakes are verified by behaviour.** After resume, Ralph checks how far the wall clock landed from the target. A wake that fires far from target (or not at all) is treated as unreliable, and Ralph stays awake for the rest of that run rather than risk repeating a bad cycle.\n* **Suspend churn is capped.** A minimum awake interval between suspends and an optional per-run cap stop flaky suspend/resume hardware from being cycled dozens of times unattended.\n* **A durable ledger survives a wedged resume.** Ralph writes an fsync'd marker before each suspend and another after resume. 
If a resume ever wedges the machine and it is hard-reset, the unfinished cycle is still on disk — Ralph (and the soak tool) warn about it on the next start.\n* **`--watchdog` (opt-in) recovers a wedged resume.** With `--watchdog`, Ralph arms a hardware watchdog across each suspend so a hung resume reboots the machine instead of hanging. This needs a usable `/dev/watchdog` whose timer keeps counting across S3 (a TCO/iTCO or IPMI watchdog, or `RuntimeWatchdogSec=` in `systemd-system.conf`); without one it is a logged no-op.\n\n**Portability.** The suspend backend is feature-detected, not distro-specific: it works across systemd-based Linux and degrades to an awake wait where the tools are missing. The design is modular so a future macOS backend (`caffeinate` / `pmset schedule wake`) can be added without changing how Ralph uses it.\n\nTuning knobs: `LLM_TOOLS_NO_INHIBIT`, `LLM_TOOLS_SUSPEND_DRIFT_TOLERANCE`, `LLM_RALPH_MIN_AWAKE_SECONDS`, `LLM_RALPH_MAX_SUSPENDS`, `LLM_SCHEDULER_SUSPEND_MIN_LEAD`, `LLM_TOOLS_WATCHDOG_DEVICE`. Run `llm-scheduler --wake-test` to see what the current host supports.\n\n### `llm-sleep-soak` — prove sleep/wake is reliable\n\nBefore trusting unattended overnight runs, soak-test the exact suspend/wake path on your hardware:\n\n```bash\nllm-sleep-soak --cycles 50 --period 90s        # 50 real suspend/wake cycles\nllm-sleep-soak --cycles 20 --period 2m --watchdog --json\n```\n\nEach cycle suspends the machine, wakes it via the same verified RTC path Ralph uses, measures the wake drift, scrapes the kernel log for resume errors, and records the cycle in the durable ledger. It prints a `PASS`/`FAIL` summary and exits non-zero if any cycle resumed late or logged a resume error — or if an earlier run left a cycle unfinished (the fingerprint of a past wedged resume).\n\nThis is a **real-hardware test**: it genuinely suspends the machine and therefore cannot run in CI. Run it when the machine is otherwise idle. 
`LLM_SCHEDULER_NO_ACTUAL_SUSPEND=1` runs the whole loop in simulation (no real sleep) if you just want to see the flow.\n\n## Provider Setup Details\n\nMost providers only need their CLI installed and authenticated once. Kilo and MiniMax also support environment-variable fallbacks, which are useful for CI and deterministic tests.\n\n### Kilo Code\n\nKilo is configured primarily through environment variables, so it can be driven from CI or Ralph without changing local state.\n\n| Variable                           | Purpose                                                                   |\n| ---------------------------------- | ------------------------------------------------------------------------- |\n| `LLM_USAGE_KILO_MODE`              | `gateway` (default), `budget`, `byok`, `local`, or `ungated`.             |\n| `LLM_USAGE_KILO_BALANCE`           | Remaining credit balance. Required for `balance` scope.                   |\n| `LLM_USAGE_KILO_CURRENCY`          | Currency or unit label, such as `GBP`, `USD`, or `credits`.               |\n| `LLM_USAGE_KILO_MIN_BALANCE`       | Minimum remaining balance required to consider Kilo usable. Default: `1`. |\n| `LLM_USAGE_KILO_MONTHLY_BUDGET`    | Total monthly budget. Enables `budget` scope.                             |\n| `LLM_USAGE_KILO_MONTHLY_SPENT`     | Amount already spent in this budget period.                               |\n| `LLM_USAGE_KILO_MONTHLY_RESET_DAY` | Day of month the budget resets. Default: `1`.                             |\n\nWhen `kilo` is on `PATH`, `llm-usage` and `llm-scheduler` try `kilo stats` first. JSON and text output are supported. If that fails or cannot be parsed, they fall back to the environment variables above.\n\nWith `--scope auto`, Kilo prefers:\n\n1. `budget`, when configured\n2. `balance`, when configured\n3. `ungated`\n\n#### Gateway-backed models via routes (Kilo + MiniMax M3)\n\nWhen Kilo sells a model from another provider (e.g. 
`minimax-m3` purchased through the Kilo gateway), the entitlement lives behind the Kilo gateway: the direct `mmx quota show` reads the user's *direct* MiniMax account, not the Kilo-purchased subscription, so it is the wrong truth source. Model the route as `opaque`:\n\n```toml\n[ralph]\nroutes = [\"kilo-minimax-m3\"]\n\n[routes.kilo-minimax-m3]\nprovider = \"kilo\"\nmodel    = \"minimax-m3\"\n[routes.kilo-minimax-m3.capacity]\npolicy = \"opaque\"\nscope  = \"subscription\"\nlabel  = \"MiniMax M3 via Kilo\"\n[routes.kilo-minimax-m3.cost]\npolicy   = \"fixed_subscription\"\namount   = 20\ncurrency = \"USD\"\nperiod   = \"monthly\"\n```\n\n`llm-usage` then renders:\n\n```\nProvider   Model       Ready   Scope          Remaining         Guidance   Resets in\nKilo       MiniMax M3  yes     subscription   prepaid USD20/mo   ✓ usable   -\n```\n\nThe route is usable whenever the Kilo CLI is on `PATH` and no local runtime block is recorded. If a Kilo run returns a real retryable error (e.g. `HTTP 429`, `quota exceeded`), the scheduler records a local block under `${XDG_CACHE_HOME:-$HOME/.cache}/llm-tools/routes/blocks/\u003cid\u003e.json` so Ralph stops selecting the route until the retry window passes. A successful run clears the block.\n\nThe legacy `providers.\u003cx\u003e.capacity_provider = \"\u003cy\u003e\"` setting is for the *truthful* delegation case (the configured CLI runs another provider's model and that provider's windows truthfully describe the capacity). Use `routes.\u003cid\u003e.capacity.policy = \"opaque\"` when no such truth source exists.\n\n#### Truthful delegation via routes (z.AI GLM via Kilo or OpenCode)\n\nz.AI exposes the GLM family (e.g. `GLM-4.7`, `GLM-5.2`) with a real 5h / weekly quota served by `https://api.z.ai/api/monitor/usage/quota/limit`. There is no z.AI CLI to launch — Kilo (or OpenCode) runs the model and the z.AI API is the truthful capacity source. 
Model the route with `capacity.policy = \"delegate\"` and `provider = \"zai\"`:\n\n```toml\n[ralph]\nroutes = [\"kilo-zai-glm-4-7\", \"kilo-zai-glm-5-2\"]\n\n[routes.kilo-zai-glm-4-7]\nprovider = \"kilo\"\nmodel    = \"zai/glm-4.7\"\n\n[routes.kilo-zai-glm-4-7.capacity]\npolicy   = \"delegate\"\nprovider = \"zai\"\n\n[routes.kilo-zai-glm-5-2]\nprovider = \"kilo\"\nmodel    = \"zai/glm-5.2\"\n\n[routes.kilo-zai-glm-5-2.capacity]\npolicy   = \"delegate\"\nprovider = \"zai\"\n```\n\n`llm-scheduler` and `ralph-robin` then:\n\n* call the launch CLI with `-m zai/glm-4.7` / `-m zai/glm-5.2` so the model pin reaches the provider,\n* gate, rank, and suspend on z.AI's real 5h / weekly windows (read directly from the API),\n* rotate between the two routes with even-burn so each GLM model's quota burns down in parallel rather than one provider eating the whole allowance.\n\nThe z.AI capacity reader needs a z.AI key — discovered automatically from Kilo's/OpenCode's `auth.json` (zero-config), or set explicitly via `ZAI_API_KEY` / `LLM_USAGE_ZAI_API_KEY` — and is gated on the usual `5h` / `weekly` / `auto` scopes. A bad or missing key surfaces as `not-authenticated` in the `Ready` / `Remaining` cells rather than a generic `unavailable`, so the user can distinguish \"wrong key\" from \"API down\".\n\n### MiniMax\n\nMiniMax quota is read from the local `mmx` CLI first. Environment variables are available as a fallback for tests and controlled environments.\n\n| Variable                               | Purpose                                                            |\n| -------------------------------------- | ------------------------------------------------------------------ |\n| `LLM_USAGE_MINIMAX_5H_PERCENT`         | Remaining percentage for the 5h session window, from `0` to `100`. |\n| `LLM_USAGE_MINIMAX_5H_RESET_EPOCH`     | Epoch seconds or milliseconds when the 5h window resets.           
|\n| `LLM_USAGE_MINIMAX_WEEKLY_PERCENT`     | Remaining percentage for the weekly window, from `0` to `100`.     |\n| `LLM_USAGE_MINIMAX_WEEKLY_RESET_EPOCH` | Epoch seconds or milliseconds when the weekly window resets.       |\n| `LLM_USAGE_MINIMAX_MODEL`              | `model_remains` row to read. Default: `general`.                   |\n| `LLM_USAGE_MINIMAX_TIMEOUT`            | Timeout for `mmx quota show`, in seconds. Default: `10`.           |\n\nWhen `mmx` is on `PATH`, `llm-usage` and `llm-scheduler` try:\n\n```bash\nmmx quota show --output json\n```\n\nIf the CLI is missing or the output cannot be parsed, they fall back to the environment variables above.\n\nThe MiniMax reader uses the `general` row from `model_remains` by default and exposes the same `5h` and `weekly` reset windows used by Claude Code and Codex. This lets the table render and gate MiniMax consistently with other reset-window providers.\n\nThe MiniMax row appears only when the `mmx` CLI is installed or MiniMax environment variables are set.\n\n### z.AI (GLM via Kilo or OpenCode)\n\nz.AI is a capacity-only provider: there is no `zai` CLI to launch, only the GLM family (`GLM-4.7`, `GLM-5.2`, …) served through Kilo (or any provider that exposes the `zai/\u003cmodel\u003e` id). `llm-usage` reads the user's z.AI quota directly from the official monitoring API; `llm-scheduler` / `ralph-robin` launch the configured provider with `-m zai/\u003cmodel\u003e` via a route with `capacity.policy = \"delegate\"` and `provider = \"zai\"`.\n\n**Zero-config key discovery.** You do not configure a z.AI key in `llm-tools`. When you authenticate z.AI in Kilo (or OpenCode), the agent stores the key in its own owner-only credential file — `$XDG_DATA_HOME/{kilo,opencode}/auth.json` (default `~/.local/share/...`, mode `0600`), shaped `{\"zai\": {\"type\": \"api\", \"key\": \"…\"}}`. 
The reader discovers it there automatically, the same way the Claude/Codex readers read their CLIs' auth files, so adding a z.AI account to Kilo lights up the dashboard row with no further setup. The environment variables below remain available as an explicit override / hermetic-test path, but are not required.\n\n| Variable                          | Purpose                                                            |\n| --------------------------------- | ------------------------------------------------------------------ |\n| `ZAI_API_KEY`                     | Bearer token override (takes precedence over the discovered key) used against `https://api.z.ai/api/monitor/usage/quota/limit`. |\n| `LLM_USAGE_ZAI_API_KEY`           | Same, but overrides `ZAI_API_KEY` (mainly for tests).              |\n| `LLM_USAGE_ZAI_MODEL`             | Display-only GLM pin (e.g. `zai/glm-4.7`); does not affect gating. |\n| `LLM_USAGE_ZAI_5H_PERCENT`        | Remaining percentage for the 5h window, `0..100`. Hermetic fallback. |\n| `LLM_USAGE_ZAI_5H_RESET_EPOCH`    | Epoch seconds or milliseconds when the 5h window resets.           |\n| `LLM_USAGE_ZAI_WEEKLY_PERCENT`    | Remaining percentage for the weekly window, `0..100`.              |\n| `LLM_USAGE_ZAI_WEEKLY_RESET_EPOCH`| Epoch seconds or milliseconds when the weekly window resets.       |\n| `LLM_USAGE_ZAI_TIMEOUT`           | HTTP timeout in seconds. Default `10`.                             |\n| `LLM_USAGE_ZAI_QUOTA_LIMIT_JSON`  | Inject a fully-formed `/api/monitor/usage/quota/limit` payload (or `{5h, weekly}` scopes). Overrides the live API for tests. 
|\n\nWith a discovered (or explicitly configured) key, `llm-usage` calls the international endpoint first:\n\n```bash\ncurl -fsS -H \"Authorization: Bearer $ZAI_API_KEY\" \\\n  https://api.z.ai/api/monitor/usage/quota/limit\n```\n\nThe response lists several `limits`, each with a window length of `number × unit` (z.AI's `unit` is a time-unit enum — `3` = hour, `6` = week, `5` = month). The reader classifies the rows it surfaces by that **window length**, not the `type` label: the shortest sub-day window is the `5h` row, a roughly-one-week window is the `weekly` row, and longer (monthly) windows — e.g. z.AI's separate tool/search quota — are surfaced by neither. `percentage` is the *used* percent, flipped to remaining for the bar. (When `unit`/`number` are absent — an older payload shape — it falls back to the `type` label, then to `nextResetTime` ordering: shortest reset → 5h, longest → weekly.)\n\n```json\n{\n  \"code\": 200,\n  \"data\": {\n    \"level\": \"lite\",\n    \"limits\": [\n      {\"type\": \"TIME_LIMIT\",   \"unit\": 5, \"number\": 1, \"percentage\": 0,  \"nextResetTime\": 1784878391978},\n      {\"type\": \"TOKENS_LIMIT\", \"unit\": 3, \"number\": 5, \"percentage\": 97, \"nextResetTime\": 1782304670000},\n      {\"type\": \"TOKENS_LIMIT\", \"unit\": 6, \"number\": 1, \"percentage\": 19, \"nextResetTime\": 1782891191000}\n    ]\n  }\n}\n```\n\nHere the `unit 3 × number 5` row is the 5-hour window (97 % used → 3 % remaining, resets in ~1 h), the `unit 6` row is the weekly window (19 % used → 81 % remaining), and the `unit 5` month-long row is *not* shown — the earlier label-only mapping wrongly captured it as the 5h row.\n\nIf the international endpoint is unreachable (network, DNS, TLS) the reader falls back to `https://open.bigmodel.cn/api/monitor/usage/quota/limit` (the China mirror). 
When *both* endpoints fail the live call, the reader surfaces the classified reason — `not-authenticated` (HTTP 401/403), `subscription-required`, `rate-limited`, `network-error`, `quota-error` — instead of the generic `inconclusive-usage`, so a wrong key reads differently from a network outage.\n\nWhen no key can be discovered from Kilo/OpenCode and none is configured via the environment, the z.AI row renders as `unavailable` and is excluded from provider-fan-out and route decisions.\n\nLaunching a z.AI model directly via `--provider zai` is intentionally rejected — z.AI has no CLI to run; always go through a route (typically `provider = \"kilo\"` with `model = \"zai/\u003cmodel\u003e\"`). The scheduler/ralph invocations below make this concrete:\n\n```bash\n# Single GLM model on Kilo, gated on z.AI's 5h quota.\nllm-scheduler --provider kilo --model zai/glm-4.7 --prompt-file task.md --scope 5h\n\n# Even-burn across two GLM models on Kilo, gated on z.AI's real windows.\nralph-robin --routes kilo-zai-glm-4-7,kilo-zai-glm-5-2 --prompt-file task.md\n```\n\n`llm-usage --json` then emits a `zai` top-level key next to `codex`, `claude`, `copilot`, `kilo`, `opencode`, and `minimax`, with the parsed `5h` and `weekly` scopes under `zai.scopes` and the GLM pin under `zai.selected_model`.\n\n## Logs and Cache\n\nRuntime data lives under:\n\n```text\n${XDG_CACHE_HOME:-$HOME/.cache}/llm-tools\n```\n\nDirectory layout:\n\n```text\nllm-tools/llm-usage/                 Usage caches and llm-usage.log\nllm-tools/llm-usage/service/         Local service latest snapshot, history JSONL, logs\nllm-tools/llm-scheduler/logs/        Per-run scheduler logs\nllm-tools/ralph-robin/               Ralph state and logs\n```\n\nEach scheduler run directory contains:\n\n```text\nrun.log\nevents.jsonl\nprompt.txt\nattempt-N.out\nattempt-N.status\n```\n\nThe scheduler logs:\n\n* arguments\n* prompt source\n* prompt SHA-256\n* prompt content\n* usage snapshots\n* wait decisions\n* command plan\n* 
output\n* exit code\n* retry delays\n* final status\n\nUseful symlinks:\n\n```text\n~/.cache/llm-tools/llm-scheduler/logs/latest\n~/.cache/llm-tools/llm-scheduler/logs/latest-claude\n~/.cache/llm-tools/llm-scheduler/logs/latest-codex\n```\n\nRalph logs live under:\n\n```text\n${XDG_CACHE_HOME:-$HOME/.cache}/llm-tools/ralph-robin/logs\n```\n\nChild scheduler logs are written under each Ralph run's `scheduler/` subdirectory.\n\n## Data Sources\n\n`llm-tools` reads local provider state where possible.\n\n| Provider       | Source                                                                      |\n| -------------- | --------------------------------------------------------------------------- |\n| Codex          | Live `codex app-server` rate limits, then cache, then local `~/.codex/sessions` JSONL. |\n| Claude Code    | OAuth usage API/cache with automatic OAuth token refresh, then statusline cache, then local project JSONL fallback. |\n| GitHub Copilot | Local Copilot CLI footer captured through a bounded PTY helper.             |\n| Kilo Code      | `kilo stats`, then Kilo environment variables.                              |\n| MiniMax        | `mmx quota show --output json`, then MiniMax environment variables.         |\n\n`llm-usage` reads providers concurrently. Configure fan-out with\n`--provider-parallelism` or `LLM_USAGE_PROVIDER_PARALLELISM`; the default is\nthe number of CPU cores.\n\n### How refreshing works\n\nEvery provider is **actively refreshed** on each run - `llm-usage` never just\nechoes a session log left behind the last time you used a CLI. 
Each reader asks\nthe provider for current numbers and only falls back if that fails:\n\n| Provider       | Active refresh → fallback                                                   |\n| -------------- | --------------------------------------------------------------------------- |\n| Codex          | `codex app-server` (live, turn-free) → last cached payload → local session JSONL |\n| Claude Code    | OAuth usage API (auto-refreshing the token) → API cache → statusline cache → project JSONL |\n| GitHub Copilot | Background PTY footer capture → cached capture                              |\n| Kilo / MiniMax / OpenCode | `kilo stats` / `mmx quota show` / `opencode stats` → environment variables |\n\nA provider only reports `stale-usage` if it cannot be refreshed for a **known\nauthentication or CLI-startup reason** - e.g. Codex shows `not-authenticated`\n(no `~/.codex/auth.json` credentials) or `missing-cli` (the `codex` binary is\nnot on `PATH`). When the CLI is installed and signed in, you always see live\ndata.\n\nThe local-snapshot fallbacks (Codex/Claude session files) are still treated as\nstale after `LLM_USAGE_LOCAL_SNAPSHOT_MAX_AGE` seconds (default `60`, capped at\n60) while they claim an active window, so a brief network blip degrades to\n\"unavailable\" rather than to a misleadingly old percentage.\n\n### Local Service\n\nBy default, a one-shot `llm-usage` client first asks the local Unix-socket\nservice for a snapshot. If no continuous service is running, it starts the same\nsampler ephemerally, reads one snapshot, asks it to shut down, and falls back to\ndirect provider reads if the service cannot start. 
This keeps the default user\nexperience as a single CLI while using the same protocol future clients can use.\n\nFor instant reports and continuous burn-down history, install the sampler as a\nnative user service:\n\n```bash\nllm-usage --service-install\nllm-usage --service-status\nllm-usage --service-uninstall\n```\n\nOn Linux this writes and enables a `systemd --user` unit. On macOS it writes and\nloads a launchd LaunchAgent. The service is local-only: it listens on a\nUnix-domain socket under `${XDG_RUNTIME_DIR:-/tmp/llm-tools-$UID}`, writes\n`latest.json` under the `llm-usage` cache directory, and appends samples to\n`history.jsonl`. It has no HTTP listener, database, telemetry, or extra runtime\ndependency. Use `--no-service` for an explicit direct read, or `--service-run`\nto run the foreground process under a custom supervisor.\n\nWhile readers are in flight, `llm-usage` shows a small spinner that erases\nitself once the table is ready. In `--watch` mode it docks to the right of the\nclock in the header (`LLM Usage · 14:29  ⠙ refreshing usage 3/6`) and the frame\nredraws in place - no full-screen wipe, so the dashboard updates without a\nflash. It uses only the most portable cursor sequences (`ESC[H`, `ESC[K`,\n`ESC[r;cH`, and `ESC 7`/`ESC 8` save-restore), so it renders correctly under\ntmux, GNU screen, and a raw telnet PTY. Outside `--watch`, the spinner sits on\n`stderr` and self-erases. It is shown only on an interactive terminal, so pipes,\n`--json` consumers, and batch scripts stay byte-clean. Disable it with\n`LLM_USAGE_NO_PROGRESS=1`.\n\n### GitHub Copilot Notes\n\nThe Copilot footer only shows plan/session usage when the `quota` and `ai-used` status-line items are enabled. 
These are off by default on a fresh install and are normally toggled via `/statusline`.\n\nBefore each capture, `llm-usage` enables those items in:\n\n```text\n${COPILOT_HOME:-~/.copilot}/settings.json\n```\n\nAll other settings are preserved.\n\nSet this variable to skip the settings write:\n\n```bash\nLLM_USAGE_COPILOT_NO_SETTINGS_WRITE=1\n```\n\nCopilot capture is cached with `LLM_USAGE_COPILOT_CACHE_TTL`. The PTY footer\ncapture is slow and occasionally flaky, so when a refresh cannot complete within\nits wait budget, `llm-usage` shows the most recent monthly figure (\"usage on\nstart\") and refreshes it in the background for the next run, rather than dropping\nto `unavailable`.\n\nDefault:\n\n```bash\nLLM_USAGE_COPILOT_CACHE_TTL=300\n```\n\nForce synchronous capture:\n\n```bash\nLLM_USAGE_COPILOT_CACHE_TTL=0\n```\n\n### Copilot monthly allowance override\n\nThe GitHub `premium_request/usage` fallback computes the monthly quota bar\nfrom the user's plan. Override the plan or pin the allowance explicitly:\n\n```bash\n# Resolve as Pro+ (1500 requests / month) instead of the default Pro\nLLM_USAGE_COPILOT_PLAN=pro_plus\n\n# Or pin a custom allowance for an enterprise / custom contract\nLLM_USAGE_COPILOT_MONTHLY_ALLOWANCE=5000\n```\n\nThe cached `premium_request/usage` figure is reused across runs (default\nTTL 600s):\n\n```bash\nLLM_USAGE_COPILOT_MONTHLY_TTL=600\n```\n\nDisable only the monthly GitHub fallback, without hiding the add-on spend row:\n\n```bash\nLLM_USAGE_DISABLE_COPILOT_MONTHLY=1\n```\n\nTests can pin a frozen \"current month\" so the reader takes the cheap\nyear+month path instead of fanning out 30 day-level requests:\n\n```bash\nLLM_USAGE_COPILOT_PREMIUM_REQUEST_MONTH_OVERRIDE=2026-05\n```\n\n## Wake Support\n\n`llm-scheduler --wake` is best effort.\n\nIt prefers a transient user `systemd-run` timer with `WakeSystem=true`. 
If that is unavailable, it logs an `rtcwake` fallback command.\n\n`llm-scheduler --suspend-until-ready` schedules a resumed scheduler invocation, then calls:\n\n```bash\nsystemctl suspend\n```\n\nafter the timer is accepted.\n\nConfigure the pre-suspend confirmation pause:\n\n```bash\nLLM_SCHEDULER_PRE_SUSPEND_CONFIRMATION_SECONDS=10\n```\n\nWake reliability depends on:\n\n* firmware and BIOS/UEFI settings\n* motherboard RTC support\n* kernel support\n* systemd user timers\n* power state\n\nThe tool does not modify BIOS/UEFI settings and does not silently require `sudo`.\n\nRun diagnostics with:\n\n```bash\nllm-scheduler --wake-test\n```\n\n## Appearance and Output Customization\n\nColor override example:\n\n```bash\nLLM_TOOLS_COLOR_ERROR='1;34'\n```\n\nSupported color roles:\n\n```text\nBRAND, INFO, OK, WARN, ERROR, DIM, DIFF_ADD, DIFF_REMOVE, DIFF_HUNK,\nCOMMAND, TOOL, STDERR, HEADING\n```\n\nSymbol override examples:\n\n```bash\nLLM_TOOLS_SYMBOL_ERROR=!\nLLM_TOOLS_NO_SYMBOLS=1\n```\n\nRalph-launched provider processes inherit:\n\n```text\nLLM_TOOLS_RALPH_ROBIN_ACTIVE=1\nLLM_TOOLS_RALPH_ROBIN_SELECTED_PROVIDER\nLLM_TOOLS_RALPH_ROBIN_PROVIDERS\n```\n\nIf a child process tries to run `llm-scheduler --suspend-until-ready` while Ralph is active, the scheduler exits with status `75` instead of suspending. 
Ralph remains the single rotation and suspend coordinator.\n\n## Requirements\n\n* Linux or macOS.\n* Python 3.11 or newer.\n* Optional: `tmux` for tmux mode.\n* Optional: `systemd-run` or `rtcwake` for wake support.\n\nWake and suspend features require Linux with systemd:\n\n```text\n--wake\n--suspend-until-ready\n```\n\nEverything else works on Linux and macOS.\n\n## Limitations\n\n* Uses local data and locally authenticated CLIs only.\n* Not an official billing dashboard.\n* Missing or inconclusive provider data is shown as `-`, `unknown`, or `unavailable`.\n* If usage remains unavailable beyond `--max-unavailable-wait`, the scheduler launches optimistically and lets provider rate-limit handling and retry behavior take over.\n* Provider local data formats and CLI syntax can change.\n* Copilot AI credits are parsed when requested, but scheduler gating currently uses monthly remaining usage.\n\n## Adding a Provider\n\n`llm-tools` uses a small provider-adapter contract. To add a new CLI, for example `acme-cli`:\n\n1. Add a provider module:\n\n   ```text\n   llm_tools/providers/acme.py\n   ```\n\n   It should expose:\n\n   ```text\n   read(env) -\u003e ProviderSnapshot\n   ```\n\n   The snapshot carries zero or more `CapacityScope` objects.\n\n2. Use one of the supported capacity scope kinds:\n\n   ```text\n   reset_window\n   balance\n   budget\n   ungated\n   unknown\n   ```\n\n3. Re-export the module from:\n\n   ```text\n   llm_tools/providers/__init__.py\n   ```\n\n4. Register supported scopes in:\n\n   ```text\n   llm_tools/capacity.PROVIDER_SCOPES\n   ```\n\n   Example:\n\n   ```python\n   PROVIDER_SCOPES[\"acme\"] = {SCOPE_AUTO, ...}\n   ```\n\n5. Add default launch commands under:\n\n   ```text\n   llm_tools.scheduler.provider_default_argv\n   ```\n\n   Add both attached and headless paths where applicable.\n\n6. Optionally add a highlighting pattern in:\n\n   ```text\n   scheduler.highlight_provider_text\n   ```\n\n7. 
Add `--provider` and `--providers` membership in the relevant validators:\n\n   ```text\n   scheduler.py\n   ralph_robin.py\n   ```\n\nAfter that, the generic decision logic in `llm_tools/capacity.decide` handles the provider's scopes. `llm-usage`, `llm-scheduler`, and `ralph-robin` can then use the new provider through the same model as the existing ones.\n\n## Tests\n\nInstall test dependencies:\n\n```bash\npython -m pip install -e . pytest coverage\n```\n\nRun tests with coverage:\n\n```bash\ncoverage run -m pytest\ncoverage combine\ncoverage report --fail-under=85\n```\n\nTests use fixtures and mock commands. They do not require:\n\n* real Codex, Claude, Copilot, Kilo, or MiniMax installations\n* provider credentials\n* network access\n* the user's real home directory\n\nFor manual end-to-end checks, run the examples above against installed and authenticated providers without the test fixture environment.\n\n## Contributing\n\nSmall, focused pull requests are welcome.\n\nBefore opening a PR, make sure total coverage is at or above `85%`:\n\n```bash\npython -m pytest -q\ncoverage run -m pytest\ncoverage combine\ncoverage report --fail-under=85\n```\n\n## License\n\nApache License 2.0.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchrisgleissner%2Fllm-tools","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchrisgleissner%2Fllm-tools","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchrisgleissner%2Fllm-tools/lists"}