{"id":50758982,"url":"https://github.com/asakur44/mars","last_synced_at":"2026-06-11T08:01:53.912Z","repository":{"id":350656516,"uuid":"1207583229","full_name":"asakur44/mars","owner":"asakur44","description":"MARS — Model Adapter \u0026 Routing System. MCP-based router for multi-model LLM dispatch with sessions and budgets. Part of the Fr4ym + MARS + BCKS stack.","archived":false,"fork":false,"pushed_at":"2026-05-23T05:43:52.000Z","size":179,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-23T07:28:17.586Z","etag":null,"topics":["claude-code","codex","deepseek","fr4ym","gemini","grok","llm","mars","mcp","router","subagent","xai","zai"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/asakur44.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-11T05:49:29.000Z","updated_at":"2026-05-23T05:42:51.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/asakur44/mars","commit_stats":null,"previous_names":["asakur44/modelmesh","asakur44/mars"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/asakur44/mars","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/asakur44%2Fmars","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/asakur44%2Fmars/tags","releases_url":"https://repos.ecosyste.ms
/api/v1/hosts/GitHub/repositories/asakur44%2Fmars/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/asakur44%2Fmars/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/asakur44","download_url":"https://codeload.github.com/asakur44/mars/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/asakur44%2Fmars/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34188272,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-11T02:00:06.485Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["claude-code","codex","deepseek","fr4ym","gemini","grok","llm","mars","mcp","router","subagent","xai","zai"],"created_at":"2026-06-11T08:01:50.917Z","updated_at":"2026-06-11T08:01:53.904Z","avatar_url":"https://github.com/asakur44.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# MARS\n\n**Model Adapter Routing System.** An MCP server that lets Claude Code (or any MCP client) delegate work to other LLMs as subagents — preserving conversation continuity across turns via stable session IDs. 
Part of the Fr4ym + MARS + BCKS stack.\n\n\u003e **Renamed from ModelMesh on 2026-05-04.** The legacy `modelmesh` console command, `MODELMESH_*` env vars, and `~/.modelmesh/` storage path continue to work with a `DeprecationWarning` through MARS v0.2.0. See [CHANGELOG.md](./CHANGELOG.md) for the full migration path.\n\nWraps eight backends:\n\n| Tool | Backend | Auth |\n|---|---|---|\n| `ask_codex` | Local [`codex`](https://developers.openai.com/codex/cli) CLI (full agent loop) | `codex login` |\n| `ask_gemini` | Local [`gemini`](https://geminicli.com) CLI (full agent loop) | `gemini auth` |\n| `ask_openrouter` | OpenRouter HTTPS API (any model) | `OPENROUTER_API_KEY` env |\n| `ask_deepseek` | DeepSeek HTTPS API | `DEEPSEEK_API_KEY` env |\n| `ask_grok` | xAI Grok HTTPS API | `XAI_API_KEY` env |\n| `ask_zai` | Z.AI (Zhipu) HTTPS API — JWT auth handled internally | `ZAI_API_KEY` env (legacy `id.secret` shape) |\n| `ask_mimo` | Xiaomi MiMo HTTPS API (Singapore plan) | `MIMO_API_KEY` env |\n| `ask_kimi` | Kimi / Moonshot AI HTTPS API (`api.moonshot.ai`; `kimi-for-coding` → Kimi Code subscription) | `KIMI_API_KEY` env (`KIMI_CODE_API_KEY` for the coding route) |\n\nPlus admin tools: `list_api_sessions`, `delete_api_session`.\n\n---\n\n## Why this exists\n\nClaude Code is great, but sometimes you want a second opinion from GPT-5.5, or to delegate a long task to Gemini's 1M-context model, or to run cheap reasoning on DeepSeek-V4-Flash, or to triangulate against Z.AI's GLM-5.1. Doing this by hand — switching tools, copy-pasting context, losing thread — is friction.\n\nMARS turns those LLMs into first-class tools Claude can call mid-conversation. Each call returns a `session_id` you can pass back on the next call to keep the conversation going. 
Codex/Gemini sessions persist in their own native stores; DeepSeek/OpenRouter/Grok/Z.AI conversations are kept in a local JSON store and replayed on each call.\n\n---\n\n## Install\n\n### Prerequisites\n\n- Python 3.10+ (PyJWT is now a dependency — `pip install` handles it; `uv run` reads it from PEP 723 metadata)\n- For `ask_codex`: install and authenticate [Codex CLI](https://developers.openai.com/codex/cli) (`npm install -g @openai/codex` then `codex login`)\n- For `ask_gemini`: install and authenticate [Gemini CLI](https://geminicli.com) (`npm install -g @google/gemini-cli` then `gemini auth`)\n- For `ask_openrouter`: an [OpenRouter API key](https://openrouter.ai/settings/keys)\n- For `ask_deepseek`: a [DeepSeek API key](https://platform.deepseek.com/api_keys)\n- For `ask_grok`: an [xAI API key](https://console.x.ai/)\n- For `ask_zai`: a [Z.AI API key](https://platform.kimi.ai/) (legacy `id.secret` format — the tool generates the required JWT internally; do NOT pre-sign)\n\n### Install the server\n\n```bash\ngit clone https://github.com/asakur44/mars.git\ncd mars\npip install .\n```\n\nThis puts a `mars` console command on your PATH. (The legacy `modelmesh` console command is also installed for backwards compatibility through MARS v0.2.0.)\n\n(Alternative: if you have [`uv`](https://docs.astral.sh/uv/) installed, you can skip `pip install` and use `uv run server.py` — the inline PEP 723 metadata handles deps including `pyjwt\u003e=2.0.0` for `ask_zai`.)\n\n### Register with Claude Code\n\n```bash\nclaude mcp add --scope user mars \\\n  --env OPENROUTER_API_KEY=sk-or-... \\\n  --env DEEPSEEK_API_KEY=sk-... \\\n  --env XAI_API_KEY=xai-... \\\n  --env ZAI_API_KEY=\u003ckey_id\u003e.\u003csecret\u003e \\\n  -- mars\n```\n\n`--scope user` makes it available across all your projects. 
Drop `--env` flags for any provider you don't have a key for; the corresponding tools will return a clean error when called instead of crashing the server.\n\nIn Claude Code, run `/mcp` to confirm `mars` is connected. Ten tools should now be callable (eight chat subagents + two admin tools).\n\n### Register with other MCP clients\n\nCursor, Windsurf, VS Code MCP, etc. all accept stdio MCP servers. Point them at the `mars` command with the same env vars.\n\n---\n\n## Usage\n\n### Basic call\n\n```python\nask_openrouter(prompt=\"explain memoization briefly\")\n# → {\"output\": \"...\", \"session_id\": \"a1b2c3d4-...\"}\n```\n\n### Multi-turn continuity\n\nThe whole point. Capture `session_id`, pass it back:\n\n```python\nr1 = ask_deepseek(prompt=\"propose a schema for a vehicle inspection finding\")\nsid = r1[\"session_id\"]\n\nr2 = ask_deepseek(prompt=\"now critique it\", session_id=sid)   # remembers turn 1\nr3 = ask_deepseek(prompt=\"rewrite for postgres\", session_id=sid)  # remembers 1 \u0026 2\n```\n\nFor Codex/Gemini (agentic CLIs):\n\n```python\nr1 = ask_codex(prompt=\"implement parser in src/parser.py\", cwd=\"/path/to/project\")\nsid = r1[\"session_id\"]   # codex thread UUID\n\n# resume later — even after other codex calls happened in between\nask_codex(prompt=\"now write tests for it\", session_id=sid)\n```\n\n### Choosing a model\n\nThe six chat subagents covered below have explicit string defaults — no \"depends on local CLI\" footnotes. Override per call when you want something else.\n\n`ask_openrouter` defaults to `moonshotai/kimi-k2.6` (Moonshot AI Kimi 2.6, 256K context, thinking-mode). 
Override:\n\n```python\nask_openrouter(prompt=\"...\", model=\"deepseek/deepseek-v4-pro\")    # 1M ctx, ~5× cheaper on output\nask_openrouter(prompt=\"...\", model=\"anthropic/claude-sonnet-4.6\")\nask_openrouter(prompt=\"...\", model=\"openai/gpt-5.5\")\nask_openrouter(prompt=\"...\", model=\"x-ai/grok-4.20-reasoning\")\n```\n\n`ask_deepseek` defaults to `deepseek-v4-pro` (V4 advanced reasoning, thinking-mode). Drop to V4-Flash for high-volume / cost-sensitive work:\n\n```python\nask_deepseek(prompt=\"...\", model=\"deepseek-v4-flash\")\n```\n\nLegacy aliases `deepseek-chat` / `deepseek-reasoner` route to V4-Flash non-thinking / thinking-mode respectively; both deprecated 2026-07-24.\n\n`ask_grok` defaults to `grok-4.20-reasoning` (xAI flagship reasoning). Other choices:\n\n```python\nask_grok(prompt=\"...\", model=\"grok-4-1-fast-reasoning\")  # 10× cheaper, 2M ctx, high-volume\nask_grok(prompt=\"...\", model=\"grok-code-fast-1\")          # agentic coding (256K)\n```\n\n`ask_gemini` defaults to `gemini-3.1-pro-preview` (preview tier — Google rotates `-preview` ids on roughly quarterly cadence; revisit when it gets promoted). Override to a stable tier when you don't want preview-rotation risk:\n\n```python\nask_gemini(prompt=\"...\", model=\"gemini-2.5-pro\")\n```\n\n`ask_zai` defaults to `glm-5.1` (Zhipu AI flagship; thinking-mode with separate `reasoning_content` stream like `deepseek-reasoner`). Other GLM variants:\n\n```python\nask_zai(prompt=\"...\", model=\"glm-5-turbo\")    # faster, lower latency\nask_zai(prompt=\"...\", model=\"glm-4.7\")         # prior generation\n```\n\n`ask_codex` defaults to `gpt-5.5` (OpenAI flagship). Override:\n\n```python\nask_codex(prompt=\"...\", model=\"gpt-5\")\nask_codex(prompt=\"...\", model=\"o3\")\n```\n\n---\n\n## Telling Claude to use session IDs\n\nClaude Code won't automatically capture session IDs unless you teach it to. 
Add this to your global `~/.claude/CLAUDE.md` (create the file if it is missing):\n\n```markdown\n## mars — session_id continuity\n\nThe MCP server `mars` exposes `ask_codex`, `ask_gemini`,\n`ask_deepseek`, `ask_openrouter`, `ask_grok`, `ask_zai`. Each returns\n`{\"output\": str, \"session_id\": str | None}`.\n\n**Rule:** Treat the returned `session_id` as load-bearing.\n- Capture it from every subagent call.\n- On any follow-up call continuing the same task, pass it back as\n  the `session_id` argument.\n- Only omit it (start fresh) when the work is genuinely unrelated.\n- Surface the active `session_id` in your visible response the first\n  time you receive it, so it remains addressable later.\n```\n\nThis loads in every Claude Code session in every project.\n\n### (optional) Orchestration discipline for sub-agent loops\n\nIf you use Claude (or any other agent) to spawn sub-agents that themselves call MARS — parallel fan-out, structured-data extraction, multi-vendor critique passes — add this second snippet alongside the session-id one. It encodes the failure modes that MARS can't engineer around (provider-side gateway limits, parent-agent dispatch hygiene, late-write recovery):\n\n```markdown\n## mars — orchestration discipline\n\nWhen spawning sub-agents that call MARS, apply these to avoid\nsilent failures:\n\n**Pick the model for the shape of work:**\n- Bulk fan-out (single call \u003e10K output): `ask_grok` —\n  `grok-4.20-reasoning` empirically holds long single-call output;\n  xAI is the only gateway that reliably delivers it.\n- Per-section structured outputs (small per-table calls):\n  `ask_zai` / `ask_deepseek` / `ask_gemini` are fine. 
For GLM-5.1\n  specifically, fragment large work into 5–10 per-table calls —\n  Z.AI gateway truncates or 504s on bulk requests \u003e~16K output.\n- Heavy agentic work in a repo: `ask_codex` (MARS handles the\n  disk-brief workaround for long structured prompts automatically).\n\n**Dispatch hygiene:**\n- **Never describe planned tool calls in prose.** Emit `tool_use`\n  blocks directly. If you find yourself writing \"dispatching X\" or\n  \"calling Y\", stop and emit the actual calls.\n- After spawning a child agent that should make N calls, count\n  `tool_use` blocks in its transcript. If \u003cN, the agent silently\n  no-op'd — re-prompt explicitly with the count assertion.\n- For parallel fan-out, 1–2 MARS calls per child agent.\n  Multiple sequential calls within one child compound watchdog risk\n  (heartbeats every 30s help, but the cumulative window adds up).\n\n**When a child appears killed mid-call:**\n- Check the expected output file 5–10 minutes after the kill before\n  discarding the work. Thinking-mode calls (V4-Pro, GLM-5.1,\n  Grok-4.20-reasoning) often complete and write to disk *after* the\n  parent watchdog timestamp. Re-read the file and recover the\n  artifact rather than retrying.\n```\n\n---\n\n## Configuration\n\nAll optional. 
Set in the `--env` flags when registering, or in your shell environment.\n\n| Env var | Purpose | Default |\n|---|---|---|\n| `DEEPSEEK_API_KEY` | DeepSeek API key | required for `ask_deepseek` |\n| `OPENROUTER_API_KEY` | OpenRouter API key | required for `ask_openrouter` |\n| `XAI_API_KEY` | xAI Grok API key | required for `ask_grok` |\n| `ZAI_API_KEY` | Z.AI (Zhipu) API key in legacy `id.secret` format — tool generates JWT internally | required for `ask_zai` |\n| `MIMO_API_KEY` | Xiaomi MiMo API key (Singapore plan) | required for `ask_mimo` |\n| `KIMI_API_KEY` | Kimi / Moonshot Open Platform API key (`api.moonshot.ai`) | required for `ask_kimi` |\n| `KIMI_CODE_API_KEY` | Kimi Code Console key — only for `ask_kimi(model=\"kimi-for-coding\")` | optional |\n| `MARS_DIR` | Where to store API session files | `~/.mars/` |\n| `MARS_HEARTBEAT_INTERVAL_SEC` | Progress-heartbeat interval (seconds) | `30` |\n| `OPENROUTER_REFERER` | `HTTP-Referer` header sent to OpenRouter (analytics attribution) | omitted |\n| `OPENROUTER_TITLE` | `X-Title` header sent to OpenRouter | omitted |\n\n**Legacy env vars** (deprecated, removed in MARS v0.2.0): `MODELMESH_DIR` and `MODELMESH_HEARTBEAT_INTERVAL_SEC` are still read if the new names are unset, with a `DeprecationWarning`. Existing storage at `~/.modelmesh/` is also auto-detected and used as a fallback if `~/.mars/` doesn't exist yet — same warning. Migrate with `mv ~/.modelmesh ~/.mars` to silence.\n\n---\n\n## Tool reference\n\n### `ask_codex(prompt, model?, cwd?, sandbox?, timeout_sec?, session_id?)`\n\nRuns the Codex CLI's full agent loop (read files, edit, run shell commands) inside a sandbox.\n\n- `model`: default `\"gpt-5.5\"`. 
Override to `\"gpt-5\"`, `\"o3\"`, or any model id your Codex CLI auth has access to.\n- `sandbox`: `\"read-only\"` | `\"workspace-write\"` (default) | `\"danger-full-access\"`\n- `cwd`: working directory for Codex (default: server's CWD)\n- `session_id`: pass `None` for fresh, `\"last\"` for most recent, or any UUID/thread name to resume that exact session\n\n### `ask_gemini(prompt, model?, cwd?, approval_mode?, timeout_sec?, session_id?)`\n\nRuns the Gemini CLI as an agent.\n\n- `model`: default `\"gemini-3.1-pro-preview\"`. Preview tier — Google rotates `-preview` ids quarterly; revisit when it promotes. Override to `\"gemini-2.5-pro\"` for stable.\n- `approval_mode`: `\"yolo\"` (default) | `\"auto_edit\"` | `\"plan\"` (read-only) — must NOT be `\"default\"` (would block on prompts)\n- `session_id`: pass `None` for fresh, `\"last\"` for most recent, or a hex id previously returned by this tool\n\nNote: Gemini's CLI resumes by mtime-ordered index, not by stable id. The server resolves your hex id to the current index by scanning `~/.gemini/tmp/\u003cuser\u003e/chats/`. IDs are stable as long as the chat file isn't deleted.\n\n### `ask_openrouter(prompt, model?, system?, max_tokens?, session_id?)`\n\nStateless API calls + local session replay.\n\n- `model`: default `\"moonshotai/kimi-k2.6\"` (Moonshot AI Kimi 2.6; 256K context; thinking-mode). Pass any [OpenRouter model id](https://openrouter.ai/models) to override.\n- `max_tokens`: default `100000` — effectively no-cap; OpenRouter clamps to per-model `max_completion_tokens` server-side. 
The cap is a ceiling, not a charge.\n- `system`: optional system prompt (used only on fresh sessions)\n- `session_id`: pass `None` for fresh, or a UUID from a previous call\n\nHistory is replayed each call; oldest pairs are trimmed when context approaches the model's window.\n\n### `ask_deepseek(prompt, model?, system?, max_tokens?, session_id?)`\n\n- `model`: default `\"deepseek-v4-pro\"` (V4 advanced reasoning, thinking-mode; $0.435/$0.87 per M tokens with 75% discount through 2026-05-05, then ~$1.74/$3.48). Drop to `\"deepseek-v4-flash\"` ($0.14/$0.28 per M, ~3× cheaper) for high-volume work. Legacy `\"deepseek-chat\"` / `\"deepseek-reasoner\"` route to V4-Flash non-thinking / thinking-mode; deprecated 2026-07-24.\n- `max_tokens`: default `100000`. V4-Pro is thinking-mode and consumes tokens on internal reasoning before producing visible output; budget generously.\n- For thinking-mode (V4-Pro and `deepseek-reasoner`): `reasoning_content` (CoT) is intentionally NOT stored, per DeepSeek's guidance — only the final assistant message goes into history.\n\n### `ask_grok(prompt, model?, system?, max_tokens?, session_id?)`\n\n- `model`: default `\"grok-4.20-reasoning\"` (xAI flagship reasoning; $2/$6 per M tokens). Drop to `\"grok-4-1-fast-reasoning\"` ($0.20/$0.50 per M, 2M ctx) for high-volume work. Other ids: `\"grok-code-fast-1\"` (256K coding-tuned), `\"grok-4\"` / `\"grok-4-0709\"` (older 256K).\n- `max_tokens`: default `100000`.\n- `session_id`: same shape as the other API tools — None for fresh, a UUID from a previous call to resume.\n\n### `ask_zai(prompt, model?, system?, max_tokens?, session_id?)`\n\nZ.AI (Zhipu AI) GLM API. Default model `glm-5.1` (flagship; thinking-mode with separate `reasoning_content` stream like `deepseek-reasoner`).\n\n- `model`: default `\"glm-5.1\"`. 
Other ids: `\"glm-5\"`, `\"glm-5-turbo\"` (faster), `\"glm-5v-turbo\"` (multimodal vision), `\"glm-4.7\"` / `\"glm-4.7-flash\"`, `\"glm-4.6\"`, `\"glm-4.5\"`.\n- `max_tokens`: default `100000`. GLM-5.1 is thinking-mode; budget generously (the model consumes ~70+ reasoning tokens even on trivial prompts).\n- **Auth note**: `ZAI_API_KEY` must be the legacy Zhipu `id.secret` format (32-char hex + `.` + alphanum secret). The tool generates an HS256-signed JWT per call internally before sending; raw-Bearer auth with the unsigned key fails on `paas/v4` with `\"token expired or incorrect\"` despite some docs implying it works. The official `z-ai-sdk-python` does this signing transparently; we do it explicitly.\n\n### `list_api_sessions(provider?)`\n\nList stored DeepSeek / OpenRouter / Grok / Z.AI sessions, newest first. Filter by `\"deepseek\"`, `\"openrouter\"`, `\"grok\"`, or `\"zai\"`.\n\n### `delete_api_session(session_id)`\n\nDrop a stored session.\n\n---\n\n## How session continuity works\n\n| Provider | Session storage | ID stability |\n|---|---|---|\n| Codex | Codex's own SQLite + rollout JSONL in `~/.codex/` | Stable UUIDs. Pass back to resume. |\n| Gemini | Gemini's chat files in `~/.gemini/tmp/\u003cuser\u003e/chats/` | Hex suffix from filename. Stable as long as file exists. |\n| DeepSeek | `$MARS_DIR/api-sessions/\u003cuuid\u003e.json` | UUID we generate. Atomic JSON writes. |\n| OpenRouter | Same as DeepSeek | Same. |\n| Grok | Same as DeepSeek | Same. |\n| Z.AI | Same as DeepSeek | Same. JWT is regenerated per call (1-hour exp); session_id only tracks message history. |\n\nFor DeepSeek/OpenRouter/Grok/Z.AI, full message history is replayed on every call (the API itself is stateless). 
When estimated tokens approach the model's context window, oldest user/assistant pairs are dropped (system messages preserved).\n\n---\n\n## Calling from a sub-agent loop\n\nSpawning a child agent (Claude sub-agent, CI runner, automated orchestrator) that itself calls MARS introduces a supervisor-vs-thinking-model timing trap: parent agents kill children that go silent for ~600s, but thinking-mode reasoning models (DeepSeek V4-Pro, Grok 4.20-reasoning, Kimi K2.6, GLM-5.1, Gemini 3.1 Pro Preview) routinely take 5–15 minutes per call. The parent sees silence, kills the child, and the underlying call eventually completes anyway — to no one.\n\nMARS handles this with three engineered fixes shipped in v0.1.2 (commits [`e1301f0`](https://github.com/asakur44/mars/commit/e1301f0) and [`794ca01`](https://github.com/asakur44/mars/commit/794ca01)):\n\n### Progress heartbeats (automatic when called via MCP)\n\nEvery chat tool now accepts `ctx: Optional[Context] = None`. FastMCP injects `Context` automatically when the tool is invoked over MCP; while the slow API/CLI call awaits, a background task emits MCP progress notifications every 30s with messages like `\"deepseek/deepseek-v4-pro: thinking... (60s elapsed)\"`. Parent watchdogs that count progress notifications as liveness see ~20 pings over a 10-minute call instead of one silent block.\n\nTunable via `MARS_HEARTBEAT_INTERVAL_SEC` env var (default `30`). No-op when called from non-MCP code paths (no `ctx` available).\n\n### Auto disk-brief for Codex (closes ~75% rejection on long structured prompts)\n\nCodex CLI fresh sessions empirically reject ~5KB structured prompts ~75% of the time — Codex returns \"send the skeleton you want filled in\", returns `[]`, or hallucinates a different schema. The pattern that empirically unblocks Codex is: write the brief to a file, send Codex a one-liner \"read FILE and execute\". 
`ask_codex` now does this automatically:\n\n- Triggers when `len(prompt) \u003e CODEX_BRIEF_THRESHOLD` (default `3000`; env-tunable) AND `session_id is None`\n- Writes the prompt to `\u003ctempdir\u003e/mars-codex-briefs/brief-\u003cuuid\u003e.md`\n- Replaces the prompt with: *\"Read PATH and execute. Emit outputs INLINE (do not write files; caller will persist).\"*\n- Cleans up the brief file after Codex returns (best-effort)\n\nThe inline-emit instruction also closes a related Codex sandbox quirk: even at `sandbox=danger-full-access`, in-Codex Write calls aren't reliably permitted by the runtime. Inline-emit avoids the issue entirely; the caller writes the output to disk after `ask_codex` returns.\n\n### Inner timeout bumped from 180s → 900s\n\nHeartbeats keep the parent watchdog alive but don't help if the inner httpx call itself times out at 3 minutes before the model finishes. `_openai_compatible_chat`'s default `timeout_sec` is now `900` (15 min) — fits the typical 5–15 min thinking-mode envelope.\n\n### Per-model output budget\n\nBeyond watchdog and timeout, each provider has its own *output ceiling* — a practical limit on visible tokens per single call beyond which the gateway truncates, returns 504, or silently drops content. These ceilings shift with provider load and aren't always stable, so MARS doesn't enforce them; instead they live as discoverable hints in `_MODEL_PRACTICAL_OUTPUT_CEILING` (in `server.py`).\n\nEmpirically observed:\n\n| Model | Practical output ceiling | Bulk fan-out (single call \u003e10K output) |\n|---|---|---|\n| `grok-4.20-reasoning` (and the 4.20 family) | ~60K | **✓ Holds.** xAI gateway is the only one that reliably delivers long single-call output. |\n| `grok-4-1-fast-reasoning` | ~60K | ✓ Holds. |\n| `deepseek-v4-flash` | ~64K | ✓ Generally holds (non-thinking). |\n| `deepseek-v4-pro` | ~32K | △ Thinking-mode reserves significant budget for internal reasoning; bulk above ~32K can degrade. 
|\n| `moonshotai/kimi-k2.6` | ~32K | △ Treat conservatively — bulk-fanout limit unverified beyond small calls. |\n| `gemini-3.1-pro-preview` | ~32K | △ Treat conservatively pending evidence. |\n| `glm-5.1` (and the GLM family) | **~16K** | **✗ Fails.** Z.AI gateway truncates / 504s on bulk-output requests. Empirical pattern that works on GLM: per-table fragmentation (5–10 per-section calls of ~4K output each), assembled client-side. |\n\n**The verdict that follows from this:** for bulk fan-out work — single call producing \u003e10K output — route through `ask_grok` with `grok-4.20-reasoning`. For per-section work where each call produces ~4K output and the caller assembles the result, `ask_zai` / `ask_deepseek` / `ask_gemini` are all fine. Don't try to drop GLM-5.1 in as a Grok substitute on bulk-fanout requests; the gateway-side ceiling will silently fail you.\n\n### Patterns that still require caller discipline\n\nEven with the engineered fixes, four orchestration patterns remain caller-side:\n\n- **Smaller agents.** For parallel fan-out, prefer 1–2 MARS calls per child agent. Multiple sequential calls within one child compound the watchdog risk.\n- **Inner timeout \u003c parent watchdog.** If the parent's watchdog is 600s, set `ask_*(timeout_sec=400)` so inner failures are cleanly recoverable.\n- **Main-thread fallback.** Anticipating a stall? Fire `ask_*` directly from the parent rather than through a child sub-agent. Costs context budget, eliminates supervisor-kill.\n- **Late-write recovery.** If a child gets killed mid-call, check the expected output file 5–10 minutes later before discarding the work as failed — the underlying call may have completed and written the file after the kill.\n\n---\n\n## Limitations / known issues\n\n- **No streaming.** Tools return final text only. 
Codex/Gemini agent loops can take minutes; you won't see partial output.\n- **No image input** for `ask_codex` / `ask_gemini` (CLIs support `-i`; not exposed yet).\n- **DeepSeek/OpenRouter/Grok/Z.AI token cost grows linearly per turn** — full history is resent each call. DeepSeek's context caching offsets repeat-prefix cost; OpenRouter pass-through depends on the underlying provider; xAI's caching policy varies by model; Z.AI doesn't currently surface a caching parameter on `paas/v4`.\n- **Thinking-mode models can hit `max_tokens` invisibly** if you set the cap too low — the model spends 2–6K tokens on internal reasoning before producing visible output, so `max_tokens=4096` will silently truncate real work. The `100000` default avoids this; budget at least 16K if you override.\n- **MCP progress notifications are best-effort.** Heartbeats emit via the standard MCP `notifications/progress` channel; clients that don't understand them silently ignore. There's an open issue ([modelcontextprotocol/python-sdk#953](https://github.com/modelcontextprotocol/python-sdk/issues/953)) on streamable-HTTP transport — if your MCP client uses streamable-HTTP, verify heartbeats reach the watchdog before relying on them. Stdio transport (Claude Code's default) is unaffected.\n- **Gemini IDs are not natively stable.** The CLI resumes by mtime-ordered index; we rebuild stability by scanning the chat dir. If the user clears history, IDs become invalid.\n- **Z.AI key format quirk:** keys are `id.secret` and require client-side JWT signing. The tool handles this internally; do NOT pre-sign or pass a JWT as `ZAI_API_KEY`.\n- **Windows + npm shim quirk:** the server resolves CLI paths via `shutil.which()` and explicitly closes stdin to `DEVNULL` to avoid documented hangs in `codex exec` and `gemini -p` when run as a subprocess.\n\n---\n\n## License\n\nMIT. See [LICENSE](./LICENSE).\n\n---\n\n## Contributing\n\nPRs welcome. 
The whole server is one file (`server.py`) plus `pyproject.toml` — keep it that way unless there's a strong reason to split.\n\nAreas where help would be useful:\n- Streaming output (would require an MCP shape change)\n- Image input for Codex/Gemini\n- More providers (Anthropic direct, local Ollama, Mistral, etc.)\n- Tests\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fasakur44%2Fmars","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fasakur44%2Fmars","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fasakur44%2Fmars/lists"}