{"id":31118709,"url":"https://github.com/chand1012/claude-code-mlx-proxy","last_synced_at":"2025-10-12T14:11:20.871Z","repository":{"id":307516127,"uuid":"1029794747","full_name":"chand1012/claude-code-mlx-proxy","owner":"chand1012","description":"Run Claude Code with Local MLX powered models","archived":false,"fork":false,"pushed_at":"2025-07-31T16:12:59.000Z","size":58,"stargazers_count":5,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-10-02T09:45:59.851Z","etag":null,"topics":["ai","claude","claude-code","llm"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chand1012.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-31T15:27:34.000Z","updated_at":"2025-09-22T10:36:08.000Z","dependencies_parsed_at":"2025-07-31T19:09:37.595Z","dependency_job_id":"5faa12c9-c4aa-4658-a6d1-859119d3722e","html_url":"https://github.com/chand1012/claude-code-mlx-proxy","commit_stats":null,"previous_names":["chand1012/claude-code-mlx-proxy"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/chand1012/claude-code-mlx-proxy","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chand1012%2Fclaude-code-mlx-proxy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chand1012%2Fclaude-code-mlx-proxy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chand1012%2Fclaude-code-mlx-proxy/releases","manifests_url":"https://repos.e
cosyste.ms/api/v1/hosts/GitHub/repositories/chand1012%2Fclaude-code-mlx-proxy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chand1012","download_url":"https://codeload.github.com/chand1012/claude-code-mlx-proxy/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chand1012%2Fclaude-code-mlx-proxy/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279011544,"owners_count":26084963,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-12T02:00:06.719Z","response_time":53,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","claude","claude-code","llm"],"created_at":"2025-09-17T13:01:38.241Z","updated_at":"2025-10-12T14:11:20.840Z","avatar_url":"https://github.com/chand1012.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Local MLX Backend for Claude Code\n\nThis project provides a local server that acts as a backend for the **Claude Code** command line coding assistant. It allows you to use open-source models running on your local machine via Apple's MLX framework. 
Instead of sending your code to Anthropic's servers, you can use powerful models like Llama 3, GLM-4.5-Air, DeepSeek, and more, all running on your Apple Silicon Mac.\n\nThis server implements the Claude Messages API format that Claude Code communicates with, redirecting all requests to a local model of your choice.\n\n## Why Use a Local Backend with Claude Code?\n\n- **Total Privacy**: Your code, prompts, and conversations never leave your local machine.\n- **Use Any Model**: Experiment with thousands of open-source models from the [MLX Community on Hugging Face](https://huggingface.co/mlx-community).\n- **Work Offline**: Get code completions and chat with your local model without an internet connection.\n- **No API Keys or Costs**: Run powerful models without needing to manage API keys or pay for usage.\n- **Full Customization**: You have complete control over model parameters and generation settings.\n\n## How to Set It Up\n\nThere are two parts: running the local server, and configuring Claude Code to use it.\n\n### Part 1: Run the Local Server\n\nFirst, get the proxy server running on your machine.\n\n1. **Clone the repository:**\n\n    ```bash\n    git clone https://github.com/chand1012/claude-code-mlx-proxy.git\n    cd claude-code-mlx-proxy\n    ```\n\n2. **Set up the environment:**\n    Copy the example `.env` file:\n\n    ```bash\n    cp .env.example .env\n    ```\n\n    You can edit the `.env` file to customize the model, port, and other settings (see Configuration section below).\n\n3. **Install dependencies:**\n    This project uses `uv` for fast package management.\n\n    ```bash\n    uv sync\n    ```\n\n4. **Start the server:**\n\n    ```bash\n    python main.py\n    ```\n\n    The server will start on `http://localhost:8888` (or as configured in your `.env`) and begin downloading and loading the specified MLX model. 
This may take some time on the first run.\n\n### Part 2: Configure Claude Code\n\nNext, tell Claude Code to send requests to your local server instead of the official Anthropic API.\n\nAs described in the [official Claude Code documentation](https://docs.anthropic.com/en/docs/claude-code/llm-gateway), you do this by setting the `ANTHROPIC_BASE_URL` environment variable.\n\nThe most reliable way to do this is to **launch Claude Code from a terminal** where the variable has been set:\n\n```bash\n# Set the environment variable to point to your local server\nexport ANTHROPIC_BASE_URL=http://localhost:8888\n\n# Now, launch Claude Code from this same terminal window\nclaude\n```\n\nOnce Claude Code is running, it will automatically use your local MLX backend. You can now chat with it or use its code completion features, and all requests will be handled by your local model.\n\n### Testing the Server\n\nBefore configuring Claude Code, you can verify the server is working correctly by sending it a `curl` request from your terminal:\n\n#### Testing the Messages Endpoint\n\n```bash\ncurl -X POST http://localhost:8888/v1/messages \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"model\": \"claude-4-sonnet-20250514\",\n    \"max_tokens\": 100,\n    \"messages\": [\n      {\"role\": \"user\", \"content\": \"Explain what MLX is in one sentence.\"}\n    ]\n  }'\n```\n\nThis will return a Claude-style response:\n\n```json\n{\n  \"id\": \"msg_12345678\",\n  \"type\": \"message\",\n  \"role\": \"assistant\",\n  \"content\": [\n    {\n      \"type\": \"text\",\n      \"text\": \"MLX is Apple's machine learning framework optimized for efficient training and inference on Apple Silicon chips.\"\n    }\n  ],\n  \"model\": \"claude-4-sonnet-20250514\",\n  \"stop_reason\": \"end_turn\",\n  \"stop_sequence\": null,\n  \"usage\": {\n    \"input_tokens\": 12,\n    \"output_tokens\": 18\n  }\n}\n```\n\n#### Testing Token Counting\n\nYou can also test the token 
counting endpoint:\n\n```bash\ncurl -X POST http://localhost:8888/v1/messages/count_tokens \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"model\": \"claude-4-sonnet-20250514\",\n    \"messages\": [\n      {\"role\": \"user\", \"content\": \"Explain what MLX is in one sentence.\"}\n    ]\n  }'\n```\n\nThis returns the token count:\n\n```json\n{\n  \"input_tokens\": 12\n}\n```\n\n#### Streaming Support\n\nThe server also supports streaming responses using Server-Sent Events (SSE), just like the real Claude API:\n\n```bash\ncurl -X POST http://localhost:8888/v1/messages \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"model\": \"claude-4-sonnet-20250514\",\n    \"max_tokens\": 100,\n    \"messages\": [\n      {\"role\": \"user\", \"content\": \"Explain what MLX is in one sentence.\"}\n    ],\n    \"stream\": true\n  }'\n```\n\nThis will return a stream of events following the Claude streaming format.\n\n## API Endpoints\n\nThe server implements the following Claude-compatible endpoints:\n\n- `POST /v1/messages` - Create a message (supports both streaming and non-streaming)\n- `POST /v1/messages/count_tokens` - Count tokens in a message\n- `GET /` - Root endpoint with server status\n- `GET /health` - Health check endpoint\n\n## Configuration (`.env`)\n\nAll server settings are managed through the `.env` file.\n\n| Variable              | Default                                       | Description                                                                                             |\n| --------------------- | --------------------------------------------- | ------------------------------------------------------------------------------------------------------- |\n| `HOST`                | `0.0.0.0`                                     | The host address for the server.                                                                        
|\n| `PORT`                | `8888`                                        | The port for the server.                                                                                |\n| `MODEL_NAME`          | `mlx-community/GLM-4.5-Air-3bit`              | The MLX model to load from Hugging Face. Find more at the [MLX Community](https://huggingface.co/mlx-community). |\n| `API_MODEL_NAME`      | `claude-4-sonnet-20250514`                    | The model name that the API will report. Set this to a known Claude model to ensure client compatibility. |\n| `TRUST_REMOTE_CODE`   | `false`                                       | Set to `true` if the model tokenizer requires trusting remote code.                                     |\n| `EOS_TOKEN`           | `None`                                        | The End-of-Sequence token, required for some models like Qwen.               |\n| `DEFAULT_MAX_TOKENS`  | `4096`                                        | The default maximum number of tokens to generate in a response.                                         |\n| `DEFAULT_TEMPERATURE` | `1.0`                                         | The default temperature for generation (creativity).                                                    |\n| `DEFAULT_TOP_P`       | `1.0`                                         | The default top-p for generation.                                                                       |\n| `VERBOSE`             | `false`                                       | Set to `true` to enable verbose logging from the MLX generate function.                                 |\n\n## License\n\nThis project is licensed under the MIT License.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchand1012%2Fclaude-code-mlx-proxy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchand1012%2Fclaude-code-mlx-proxy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchand1012%2Fclaude-code-mlx-proxy/lists"}