{"id":49311288,"url":"https://github.com/cyberofficial/nimbus","last_synced_at":"2026-06-13T05:00:36.904Z","repository":{"id":351693860,"uuid":"1210948649","full_name":"cyberofficial/NIMbus","owner":"cyberofficial","description":"A lightweight FastAPI proxy that translates Anthropic API requests to NVIDIA NIM's OpenAI-compatible endpoint.","archived":false,"fork":false,"pushed_at":"2026-06-12T06:30:59.000Z","size":270,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-06-12T07:09:42.401Z","etag":null,"topics":["anthropic","claude","claude-code","coding","fastapi","nim","nvidia","nvidia-nim","openapi","proxy"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cyberofficial.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":"cyberofficial","patreon":null,"open_collective":null,"ko_fi":"cyberofficial","tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"lfx_crowdfunding":null,"polar":null,"buy_me_a_coffee":null,"thanks_dev":null,"custom":null}},"created_at":"2026-04-14T23:25:13.000Z","updated_at":"2026-06-12T06:19:59.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/cyberofficial/NIMbus","commit_stats":null,"previous_names":["cyberofficial/nimbus"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg
:github/cyberofficial/NIMbus","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cyberofficial%2FNIMbus","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cyberofficial%2FNIMbus/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cyberofficial%2FNIMbus/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cyberofficial%2FNIMbus/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cyberofficial","download_url":"https://codeload.github.com/cyberofficial/NIMbus/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cyberofficial%2FNIMbus/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34272603,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-13T02:00:06.617Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anthropic","claude","claude-code","coding","fastapi","nim","nvidia","nvidia-nim","openapi","proxy"],"created_at":"2026-04-26T13:01:47.516Z","updated_at":"2026-06-13T05:00:36.898Z","avatar_url":"https://github.com/cyberofficial.png","language":"Python","funding_links":["https://github.com/sponsors/cyberofficial","https://ko-fi.com/cyberofficial"],"categories":[],"sub_categories":[],"readme":"# NIMbus\n\nA lightweight 
FastAPI proxy that routes Claude Code through NVIDIA NIM. Free, no Anthropic API key required.\n\n## Why NIMbus?\n\nThe Claude Code CLI and VSCode extension normally require an Anthropic API key. NIMbus acts as a translation layer instead:\n\n- **Free tier**: NVIDIA NIM's free tier allows 40 requests per minute\n- **No Anthropic key needed**: Use Claude Code with NVIDIA's free API\n- **Streaming support**: Full SSE streaming for real-time responses\n- **Thinking models**: Converts reasoning content to Claude format\n- **Lightweight**: Minimal dependencies, fast startup\n\n## Quick Start\n\n### Option 1: Standalone .exe (Windows, recommended)\n\nNo Python required. Download `nimbus.exe` from the [latest release](https://github.com/cyberofficial/NIMbus/releases).\n\n```cmd\n# 1. Run the exe — it auto-creates .env on first run\nnimbus.exe --init\n\n# 2. Follow the interactive wizard:\n#    - Enter your NVIDIA API key (tested live)\n#    - Choose your models and context window\n#    - Auto-configures Claude Code settings\n\n# 3. Start the proxy server\nnimbus.exe\n\n# 4. 
In another terminal, use Claude Code normally\nclaude\n```\n\nThe `--init` wizard handles everything:\n- Validates your NVIDIA API key against the live API\n- Auto-generates a proxy API key\n- Lets you pick models per Claude tier (Sonnet/Opus/Haiku) with context window selection\n- Backs up and updates `%USERPROFILE%\\.claude\\settings.json` automatically\n- Writes `.env` with all settings\n\nTo restore a backed-up settings.json: `nimbus.exe --init restore`\n\n### Option 2: Python (any OS)\n\n**Prerequisites:** NVIDIA NIM API key, Python 3.14.2+, [Claude Code](https://github.com/anthropics/claude-code)\n\n```bash\ngit clone https://github.com/cyberofficial/NIMbus.git\ncd NIMbus\ncp .env.example .env\n```\n\nEdit `.env`:\n\n```dotenv\nNVIDIA_NIM_API_KEY=\"nvapi-your-key-here\"\nMODEL=\"deepseek-ai/deepseek-v4-flash\"\n```\n\n### Running the Server\n\n**Using uv (recommended):**\n```bash\nuv run uvicorn server:app --host 0.0.0.0 --port 8082\n```\n\n**Using venv:**\n```bash\npython -m venv venv\nvenv\\Scripts\\activate     # Windows only\nsource venv/bin/activate  # macOS/Linux only\npip install -r requirements.txt\nuvicorn server:app --host 0.0.0.0 --port 8082\n```\n\n**In a second terminal, run Claude Code:**\n\n```bash\nANTHROPIC_AUTH_TOKEN=\"\u003creplaceme\u003e\" ANTHROPIC_BASE_URL=\"http://localhost:8082\" claude\n```\n\n## VSCode Extension\n\n1. Start the proxy server.\n2. Open VSCode Settings (`Ctrl + ,`), search for `claude-code.environmentVariables`.\n3. Click **Edit in settings.json** and add:\n\n```json\n\"claude-code.environmentVariables\": [\n  { \"name\": \"ANTHROPIC_BASE_URL\", \"value\": \"http://localhost:8082\" },\n  { \"name\": \"ANTHROPIC_AUTH_TOKEN\", \"value\": \"\u003creplaceme\u003e\" }\n]\n```\n\n4. 
Reload extensions.\n\n## Architecture\n\n```\n+------------------+      +----------------------+      +---------------+\n| Claude Code      | ---\u003e | NIMbus               | ---\u003e | NVIDIA NIM    |\n| CLI / VSCode     | \u003c--- | Proxy (:8082)        | \u003c--- | API           |\n+------------------+      +----------------------+      +---------------+\n   Anthropic format        Translation layer         OpenAI-compatible\n   (SSE stream)                                      format (SSE stream)\n```\n\n**How it works:**\n\n1. Claude Code sends Anthropic-format API requests to the proxy\n2. Trivial requests (quota probes, title generation) are intercepted and answered locally\n3. Real requests are translated to OpenAI format and sent to NVIDIA NIM\n4. Responses are streamed back, converting thinking tags to Claude format\n\n## Available Models\n\nBrowse all: [build.nvidia.com/explore/discover](https://build.nvidia.com/explore/discover)\n\n## Configuration\n\n| Variable | Description | Default |\n| --- | --- | --- |\n| `MODEL` | Model identifier (`owner/model-name`, comma-separated for multi-model) | `deepseek-ai/deepseek-v4-flash` |\n| `NVIDIA_NIM_API_KEY` | NVIDIA API key | **required** |\n| `SERVER_TYPE` | Server mode: `stream` or `buffer` | `stream` |\n| `NIM_MAX_TOKENS` | Max tokens for responses | `202000` |\n| `NIM_THINKING` | Enable thinking/reasoning content | `true` |\n| `NIM_REASONING_EFFORT` | Reasoning effort: `low`, `medium`, or `high` | `high` |\n| `PROVIDER_RATE_LIMIT` | Requests per window | `40` |\n| `PROVIDER_RATE_WINDOW` | Rate window in seconds | `60` |\n| `PROVIDER_MAX_CONCURRENCY` | Max concurrent streams | `5` |\n| `PROVIDER_RETRY_ON_TRUNCATION` | Buffer mode retry count | `3` |\n| `PROVIDER_RETRY_DELAY` | Buffer mode retry base delay (s) | `1.0` |\n| `PROVIDER_MAX_WAIT_TIME` | Buffer mode max wait (s) | `30` |\n| `HTTP_READ_TIMEOUT` | Read timeout in seconds | `300` |\n| `HTTP_WRITE_TIMEOUT` | Write timeout in seconds | `10` |\n| 
`HTTP_CONNECT_TIMEOUT` | Connect timeout in seconds | `2` |\n| `PORT` | Server port | `8082` |\n| `PROXY_API_KEY` | Optional proxy authentication (auto-generated if empty) | (random) |\n\n### Stream vs Buffer Modes\n\nNIMbus has two server modes controlled by `SERVER_TYPE`. Both produce Anthropic-format responses compatible with Claude Code, but they trade off latency for reliability differently.\n\n#### Stream Mode (`SERVER_TYPE=stream` — default)\n\nTokens are relayed to Claude Code as NVIDIA generates them, just like a direct connection.\n\n- **Lowest latency** — Claude Code sees tokens immediately\n- **What happens during backend cutout**: The proxy sends a partial response with `stop_reason=\"max_tokens\"` and logs a warning. Claude Code receives whatever was generated before the interruption.\n- **No retry** — streaming cannot replay already-sent tokens, so a dropped connection means a partial response.\n- **Best for** interactive use where you want to see output as it's produced.\n\n```\nClaude Code ──── SSE stream ──── NIMbus ──── SSE stream ──── NVIDIA NIM\n              (live tokens)               (live tokens)\n```\n\nIf NVIDIA's backend cuts out mid-stream, the `SSEBuilder.truncated` flag is set and the final `message_delta` event carries `stop_reason: \"max_tokens\"`.\n\n#### Buffer Mode (`SERVER_TYPE=buffer`)\n\nThe proxy waits for NVIDIA to finish generating the **complete** response before sending anything to Claude Code. 
If the backend drops the connection, the proxy automatically retries.\n\n- **Higher latency** — Claude Code waits until the full response is ready\n- **Automatic retry with exponential backoff** on connection loss (`APIConnectionError`) and timeouts (`APITimeoutError`)\n- **Configurable retry behavior**:\n  | Setting | Default | What it does |\n  |---|---|---|\n  | `PROVIDER_RETRY_ON_TRUNCATION` | `3` | Number of retry attempts before giving up |\n  | `PROVIDER_RETRY_DELAY` | `1.0` | Base delay between retries (seconds) — multiplies by attempt number |\n  | `PROVIDER_MAX_WAIT_TIME` | `30` | Seconds to wait for NVIDIA before timing out and retrying |\n- **Retries count against the rate limit** to prevent exceeding your quota when the backend is unstable\n- If all retries are exhausted, raises `StreamTruncatedError` (mapped to an HTTP 500 error)\n- **Best for** long-generation tasks where losing the response is worse than waiting\n\n```\nClaude Code ──── JSON response ──── NIMbus ──── (wait + retry if needed) ──── NVIDIA NIM\n              (all at once)                   (accumulate complete response)\n```\n\n**Which should I choose?**\n\n| Scenario | Recommendation |\n|---|---|\n| Interactive coding / quick questions | `stream` (default) |\n| Batch processing / generating large files | `buffer` |\n| Spotty network or unstable backend | `buffer` |\n| Lowest latency matters most | `stream` |\n\n\u003e **Note:** NVIDIA's free tier occasionally drops connections mid-response. 
Stream mode will produce a partial answer; buffer mode will retry up to `PROVIDER_RETRY_ON_TRUNCATION` times to get a complete response.\n\n### Optimization Settings\n\nThese settings speed up Claude Code by mocking/skipping unnecessary requests:\n\n| Variable | Description | Default |\n| --- | --- | --- |\n| `FAST_PREFIX_DETECTION` | Fast command prefix detection | `true` |\n| `ENABLE_NETWORK_PROBE_MOCK` | Mock quota probe requests | `true` |\n| `ENABLE_TITLE_GENERATION_SKIP` | Skip title generation requests | `true` |\n| `ENABLE_SUGGESTION_MODE_SKIP` | Skip suggestion mode requests | `true` |\n| `ENABLE_FILEPATH_EXTRACTION_MOCK` | Mock filepath extraction | `true` |\n| `ENABLE_RECAP_SKIP` | Block recap requests (stepped away/return) | `true` |\n\nSee [`.env.example`](.env.example) for all options.\n\n## API Endpoints\n\n| Endpoint | Description |\n| --- | --- |\n| `GET /` | Root — returns provider info, model, and model list |\n| `POST /v1/messages` | Create a message (streaming) |\n| `POST /v1/messages/buffered` | Create a message (buffered, with retry) |\n| `POST /v1/messages/count_tokens` | Count tokens for a request |\n| `GET /health` | Health check |\n| `GET /status` | Server status |\n| `POST /stop` | Stop all CLI sessions and pending tasks |\n\n## Troubleshooting\n\n### Common Issues\n\n**Connection refused**\n- Ensure the proxy is running on the correct port\n- Check firewall settings\n\n**Rate limit exceeded**\n- NVIDIA NIM free tier: 40 requests/minute\n- Wait and retry, or reduce concurrent requests\n\n**Model not found**\n- Verify MODEL format: `owner/model-name`\n- Check available models at [build.nvidia.com](https://build.nvidia.com/explore/discover)\n\n### Logs\n\nLogs are written to the console. For verbose output, check the terminal where the proxy is running.\n\n## Contributing\n\nContributions are welcome! 
Please feel free to submit a Pull Request.\n\n## Discord Bot (Optional)\n\nA Discord bot integration is included for multi-user access through Discord channels.\n\n### Setup\n\n1. Create a Discord application at https://discord.com/developers/applications\n2. Enable \"Message Content Intent\" in the Bot section\n3. Invite the bot to your server with these permissions:\n   - Send Messages\n   - Read Messages/View Channels\n   - Manage Channels\n   - Read Message History\n4. Configure in `.env`:\n\n```dotenv\nDISCORD_BOT_TOKEN=\"your-bot-token-here\"\nDISCORD_GUILD_ID=\"123456789\"               # Your server ID (comma-separated for multiple)\nDISCORD_CONTROL_CHANNEL_ID=\"123456789\"     # Admin channel for status (comma-separated)\nDISCORD_CONVERSATION_CATEGORY_ID=\"123456789\"  # Category for AI channels (comma-separated)\nDISCORD_CONVERSATION_CHANNEL_ID=\"\"         # Specific channel IDs (alternative to categories)\nDISCORD_OWNER_ID=\"123456789\"               # Your Discord user ID\nDISCORD_OWNER_ONLY=true                    # true = owner only, false = anyone in server\nDISCORD_AUTO_COMPACT=true                  # true = summarize/restart, false = drop oldest messages\n```\n\n**Channel Configuration:**\n- **Categories**: Bot responds in any channel under `DISCORD_CONVERSATION_CATEGORY_ID`\n- **Specific Channels**: Bot only responds in `DISCORD_CONVERSATION_CHANNEL_ID` channels\n- **Both**: Can combine (bot responds in specified channels OR channels in categories)\n\n### Bot Commands\n\n| Command | Description |\n|---------|-------------|\n| `/ask [question]` | Ask NIM a question with conversation history |\n| `/compact` | Summarize conversation and restart (with backup option) |\n| `/new` | Clear conversation history without summary |\n| `/download` | Download conversation history as markdown |\n| `/status` | Show bot and rate limit status |\n| `/block [user]` | Block a user from using the bot (owner only) |\n| `/unblock [user]` | Unblock a user (owner only) 
|\n| `/blocked` | List blocked users (owner only) |\n| `/newchannel [name]` | Create a new AI conversation channel |\n\n### Features\n\n- **Multi-server support**: Configure multiple guilds/servers with comma-separated IDs\n- **Rate limiting**: Per-user cooldown and server-wide limits\n- **Conversation modes**:\n  - `DISCORD_AUTO_COMPACT=true` (default): Summarizes and restarts conversation when token limit reached\n  - `DISCORD_AUTO_COMPACT=false`: Silently drops oldest messages to make room for new ones\n- **Message splitting**: Automatically splits long responses for Discord's 2000 char limit\n- **Command toggles**: Disable individual slash commands via `DISCORD_CMD_*` settings\n\n## MCP Server Mode (Web Search Tools)\n\nNIMbus can also run as an MCP (Model Context Protocol) server, exposing web search and page fetch tools directly to Claude Code. This allows Claude to search the web and fetch page content without going through the NVIDIA NIM proxy.\n\n### Quick Start\n\n```bash\n# Add to Claude Code (using exe)\nclaude mcp add websearch -- nimbus.exe --mcp\n\n# Or using Python (venv)\nclaude mcp add websearch -- /path/to/NIMbus/.venv/bin/python /path/to/NIMbus/start_server.py --mcp\n```\n\n### MCP Tools\n\n| Tool | Description | Parameters |\n|------|-------------|------------|\n| `web_search` | Search the web using DuckDuckGo HTML | `query` (string) |\n| `fetch_page` | Fetch and extract text from a webpage with chunked reading (supports search within page) | `url` (string), `offset` (int, default: 0), `limit` (int, default: 10000), `refresh` (bool, default: false), `search` (string, optional) |\n| `search_cache` | Search all cached pages for a keyword/phrase | `query` (string), `case_sensitive` (bool, default: false), `max_results` (int, default: 50) |\n| `search_cache_snippet` | Search cached pages with surrounding code snippets and smart line boundary detection | `query` (string), `before_chars` (int, default: 400), `after_chars` (int, default: 500), 
`case_sensitive` (bool, default: false), `max_results` (int, default: 20) |\n\n### Running MCP Server Manually\n\n```bash\n# Development mode\npython start_server.py --mcp\n\n# Standalone exe (Windows)\nnimbus.exe --mcp\n```\n\n### MCP Environment Configuration\n\nThe MCP server inherits settings from `.env`. Configure web search behavior via:\n\n```dotenv\n# MCP Server settings\nNVIDIA_NIM_API_KEY=\"nvapi-your-key-here\"  # Not required for MCP mode but kept for proxy mode\n\n# Web Search Configuration\nWEB_SEARCH_FETCH_TIMEOUT=10.0     # HTTP timeout for fetch_page in seconds (default: 10.0)\n\n# Cache Configuration\nMCP_CACHE_TTL=600                 # Cache TTL in seconds (default: 600 = 10 minutes, max 3600, 0 = disabled)\n                                  # Cache directory is hardcoded to ./NIMBUS_FETCH_CACHE next to mcp_server.py\n```\n\n### Using with Claude Code\n\nOnce added via `claude mcp add websearch ...`, Claude will have access to `web_search`, `fetch_page`, `search_cache`, and `search_cache_snippet` tools. 
Example usage in Claude:\n\n```\n\u003e Can you search for \"latest Rust async patterns\" and fetch the first result?\n```\n\nClaude will automatically call the MCP tools and return the results.\n\n#### Chunked Reading Example\n\nFor long pages (e.g., documentation), use `offset` and `limit` to read in chunks:\n\n```\n\u003e Fetch page at offset 10000 with limit 10000\n# Returns chunk 10000-20000 with metadata: total_length, cache status, etc.\n\n\u003e Fetch page with refresh=true\n# Forces fresh fetch, bypassing cache\n```\n\nThe `fetch_page` tool returns JSON with:\n- `content`: The requested text chunk\n- `total_length`: Full page length in characters\n- `offset`: Starting position of returned chunk\n- `limit`: Requested chunk size\n- `cached`: Whether served from cache\n- `cache_expires_at`: ISO timestamp when cache expires\n\n**Cache Control:**\n- Set `MCP_CACHE_TTL=0` to disable caching entirely (always fresh)\n- Use `refresh=true` parameter to force fresh fetch on demand\n- Default TTL: 10 minutes (600s), maximum: 1 hour (3600s)\n\n#### Search Within Cache\n\nSearch across all cached pages with `search_cache` (returns matching lines) or `search_cache_snippet` (returns surrounding context):\n\n```\n\u003e Search cached docs for \"_ENV_TEMPLATE\"\n# Returns all matching lines with line numbers and character positions\n\n\u003e Search cached docs for \".env was deleted\" with 400 before, 500 after\n# Returns code snippets with smart line boundary detection\n\n\u003e Fetch Python docs and search for \"async def\"\n# Returns matches within that specific page with context\n```\n\nYou can also search within a specific fetched page using the `search` parameter on `fetch_page`:\n\n```\n\u003e Fetch page with search=\".env was deleted\"\n# Returns matches with line numbers, character positions, and surrounding context\n```\n\n---\n\n## Changelog\n\n### v2.0.2 (June 2026)\n\n**MCP Server mode** with web search and cache search tools:\n- Added `search_cache` — search 
all cached pages for keywords\n- Added `search_cache_snippet` — search with surrounding context snippets\n- Enhanced `fetch_page` with `search` parameter to find keywords within a page\n- Fixed model mapping when `MODEL=windows:settings.json` — NIM model names are now correctly matched\n\n### v2.0.1 (June 2026)\n\n- Added recap skip optimization\n- Interactive setup wizard with section selection\n\n### v2.0.0 (June 2026)\n\n**Standalone .exe:** NIMbus is now a single portable executable on Windows — no Python, no pip, no venv needed.\n- `nimbus.exe --init`: Interactive setup wizard with live API key validation, model selection, Claude Code auto-config\n- `nimbus.exe --init restore`: Restores backed-up settings.json\n- Auto-creates `.env` from embedded template on first run\n- Single `--onefile` PyInstaller build (~25 MB)\n\n**Dynamic model resolution:** `MODEL=windows:settings.json` reads models from Claude Code's settings.json — no duplication. Model names are resolved dynamically against NVIDIA's catalog.\n\n**Error recovery:**\n- Auto-detects models that reject `system` role and retries with system→user conversion\n- Detailed error logging with full causal chain\n- Tiktoken special token handling (`\u003c|endoftext|\u003e`, `\u003c|fim_prefix|\u003e`, etc.)\n- Fixed HTTP transport request attribution (OpenAI SDK retry compatibility)\n\n**Per-tier model config:** Sonnet/Opus/Haiku each get their own model, mapped from Claude Code settings.json\n\n## License\n\nAGPL-3.0 - See [LICENSE](LICENSE) for details.\n\n## Acknowledgments\n\n- [NVIDIA NIM](https://build.nvidia.com/) for providing free API access\n- [Claude Code](https://github.com/anthropics/claude-code) by Anthropic\n- [FastAPI](https://fastapi.tiangolo.com/) for the web 
framework\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcyberofficial%2Fnimbus","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcyberofficial%2Fnimbus","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcyberofficial%2Fnimbus/lists"}