{"id":26360146,"url":"https://github.com/toogle/mlx-dev-server","last_synced_at":"2026-02-18T06:31:02.907Z","repository":{"id":279067462,"uuid":"937624424","full_name":"toogle/mlx-dev-server","owner":"toogle","description":"A server to run MLX models locally, optimized for code completion","archived":false,"fork":false,"pushed_at":"2025-10-31T20:41:03.000Z","size":74,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-10-31T20:47:43.067Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/toogle.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-23T14:21:53.000Z","updated_at":"2025-10-31T20:41:07.000Z","dependencies_parsed_at":null,"dependency_job_id":"831d8508-3deb-4864-af27-1a4b7cbdc3f0","html_url":"https://github.com/toogle/mlx-dev-server","commit_stats":null,"previous_names":["toogle/mlx-dev-server"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/toogle/mlx-dev-server","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/toogle%2Fmlx-dev-server","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/toogle%2Fmlx-dev-server/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/toogle%2Fmlx-dev-server/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/toogle%2Fmlx-dev-server/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/Git
Hub/owners/toogle","download_url":"https://codeload.github.com/toogle/mlx-dev-server/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/toogle%2Fmlx-dev-server/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29570326,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-18T06:19:27.422Z","status":"ssl_error","status_checked_at":"2026-02-18T06:18:44.348Z","response_time":162,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-03-16T16:36:32.382Z","updated_at":"2026-02-18T06:31:02.902Z","avatar_url":"https://github.com/toogle.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# MLX Dev Server\n\n[Installation](#installation) | [Usage](#usage) | [Examples](#examples)\n\nA simple solution to run LLMs locally on Macs with Apple Silicon. 
Optimized for code completion tasks with DeepSeek, Qwen and other models.\n\n\u003cimg width=\"1046\" alt=\"Screenshot\" src=\"https://github.com/user-attachments/assets/0c2bdec7-1bfe-4a3b-9cfb-067cb34c036f\" /\u003e\n\n## Features\n\n- 🚀 **Fast**: uses [Apple MLX](https://github.com/ml-explore/mlx) to run models on the GPU using unified memory\n- 💪 **Efficient**: cancels generation when the client disconnects (see [Motivation](#motivation) for why this matters for code completion)\n- 🧩 **Compatible**: provides an OpenAI-like API that integrates easily with existing applications (see [Examples](#examples))\n- 💾 **Memory Efficient**: unloads models when they are not in use\n- 🔗 **Reliable**: test coverage is 97%\n\n## Motivation\n\nWhile [Ollama](https://github.com/ollama/ollama) is effective for many tasks, it can be less responsive for code completion due to its handling of prompt processing.\n\nCode completion requires quick processing of large inputs (1k+ tokens) and generation of short outputs (typically \u003c100 tokens). Most completions are also cancelled, because developers often pause for a moment and then continue typing, discarding the completion. 
Ollama finishes processing the entire prompt before a cancellation takes effect, which can cause delays.\n\nMLX Dev Server addresses this by cancelling both prompt processing and generation when the client disconnects, ensuring consistent and responsive code completion.\n\n## Installation\n\n```bash\npip install mlx-dev-server\n```\n\n## Usage\n\nSimply run `mlx_dev_server`.\n\nAvailable command-line arguments:\n\n- `-p, --port`: Port to listen on (default is `8080`)\n- `-k, --keep-alive`: Time in seconds to keep models loaded in memory (default is `300`)\n- `-m, --max-loaded-models`: Maximum number of models to keep loaded (default is `2`)\n- `--host`: Host to listen on (default is `localhost`)\n- `--max-tokens`: Maximum tokens to generate if not specified (default is `4096`)\n- `--max-kv-size`: Maximum size of the key-value cache (default is `4096`)\n- `--prefill-step-size`: Step size for prompt processing (default is `128`)\n\n## Examples\n\n### VSCode\n\nInstall the [llm-vscode](https://marketplace.visualstudio.com/items?itemName=HuggingFace.huggingface-vscode) extension. 
Then add the following to `settings.json`:\n\n```json\n{\n    \"llm.backend\": \"openai\",\n    \"llm.url\": \"http://localhost:8080\",\n    \"llm.modelId\": \"mlx-community/DeepSeek-Coder-V2-Lite-Instruct-4bit-mlx\",\n    \"llm.configTemplate\": \"Custom\",\n    \"llm.requestBody\": {\n        \"parameters\": {\n            \"temperature\": 0.2,\n            \"top_p\": 0.95,\n            \"max_tokens\": 60\n        }\n    },\n    \"llm.fillInTheMiddle.prefix\": \"\u003c｜fim▁begin｜\u003e\",\n    \"llm.fillInTheMiddle.middle\": \"\u003c｜fim▁end｜\u003e\",\n    \"llm.fillInTheMiddle.suffix\": \"\u003c｜fim▁hole｜\u003e\",\n    \"llm.tokenizer\": {\n        \"repository\": \"mlx-community/DeepSeek-Coder-V2-Lite-Instruct-4bit-mlx\"\n    },\n    \"llm.contextWindow\": 1024\n}\n```\n\n\u003e [!NOTE]\n\u003e This configuration limits the number of generated tokens to 60.\n\u003e This keeps responses fast when the model decides to generate a multi-line code snippet.\n\n### Neovim\n\nAdd the following spec to your [lazy.nvim](https://github.com/folke/lazy.nvim) configuration to enable the [llm.nvim](https://github.com/huggingface/llm.nvim) plugin:\n\n```lua\n{\n  'huggingface/llm.nvim',\n  opts = {\n    backend = 'openai',\n    url = 'http://localhost:8080',\n    model = 'mlx-community/DeepSeek-Coder-V2-Lite-Instruct-4bit-mlx',\n    request_body = {\n      temperature = 0.2,\n      top_p = 0.95,\n      max_tokens = 60\n    },\n    fim = {\n      prefix = '\u003c｜fim▁begin｜\u003e',\n      middle = '\u003c｜fim▁end｜\u003e',\n      suffix = '\u003c｜fim▁hole｜\u003e'\n    },\n    tokenizer = {\n      repository = 'mlx-community/DeepSeek-Coder-V2-Lite-Instruct-4bit-mlx'\n    },\n    context_window = 1024\n  }\n}\n```\n\n### OpenAI Python API library\n\n```python\nfrom openai import OpenAI\n\nclient = OpenAI(\n    base_url='http://localhost:8080/v1',\n    api_key='mlx-dev-server',  # the client requires a value, though the server does not need one\n)\n\nresponse = client.chat.completions.create(\n    
model='mlx-community/Mistral-Nemo-Instruct-2407-8bit',\n    messages=[{\n        'role': 'user',\n        'content': 'say hello',\n    }],\n)\nprint(response.choices[0].message.content)\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftoogle%2Fmlx-dev-server","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftoogle%2Fmlx-dev-server","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftoogle%2Fmlx-dev-server/lists"}