{"id":30357664,"url":"https://github.com/donus3/function-call-mlx-server","last_synced_at":"2026-05-18T03:35:01.804Z","repository":{"id":309468083,"uuid":"1031361496","full_name":"donus3/function-call-mlx-server","owner":"donus3","description":"Modification of mlx_lm.server but support qwen3-coder, gpt_oss with openai function call","archived":false,"fork":false,"pushed_at":"2025-09-02T14:12:36.000Z","size":108,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-02T16:11:50.637Z","etag":null,"topics":["mlx","opencode","qwen"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/donus3.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2025-08-03T15:24:15.000Z","updated_at":"2025-09-02T14:12:39.000Z","dependencies_parsed_at":"2025-08-12T04:28:28.509Z","dependency_job_id":"cafaf381-3ff4-455b-a480-9c3c48b36bfa","html_url":"https://github.com/donus3/function-call-mlx-server","commit_stats":null,"previous_names":["donus3/function-call-mlx-server"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/donus3/function-call-mlx-server","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/donus3%2Ffunction-call-mlx-server","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/donus3%2Ffunction-call-mlx-server/tags","releases_url":"https://repos.
ecosyste.ms/api/v1/hosts/GitHub/repositories/donus3%2Ffunction-call-mlx-server/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/donus3%2Ffunction-call-mlx-server/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/donus3","download_url":"https://codeload.github.com/donus3/function-call-mlx-server/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/donus3%2Ffunction-call-mlx-server/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33163774,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-17T22:39:12.733Z","status":"online","status_checked_at":"2026-05-18T02:00:06.436Z","response_time":71,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["mlx","opencode","qwen"],"created_at":"2025-08-19T08:18:43.456Z","updated_at":"2026-05-18T03:35:01.789Z","avatar_url":"https://github.com/donus3.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Function Calling MLX Server\n\nA lightweight HTTP server for running Qwen and GPT_OSS language models with MLX (Apple's machine-learning framework for Apple silicon) acceleration. 
This server provides an OpenAI-compatible API for serving Qwen and GPT_OSS models locally.\n\n## Features\n\n- **OpenAI-compatible API**: Supports `/v1/chat/completions` and `/v1/completions` endpoints\n- **MLX acceleration**: Uses MLX for fast inference on Apple Silicon\n- **Speculative decoding**: Supports draft models for faster generation\n- **Prompt caching**: Efficiently reuses common prompt prefixes\n- **Tool calling support**: Native support for function calling with custom formats\n- **Streaming responses**: Real-time token streaming support\n- **Model adapters**: Support for fine-tuned model adapters\n\n## Installation\n\n```bash\n# Install the package\nuv sync\n```\n\n## Usage\n\nStart the server with a Qwen model:\n\n```bash\n# Basic usage\nuv run main.py --type qwen --model \u003cpath-to-qwen-model\u003e\n\n# With custom host and port\nuv run main.py --type qwen --host 0.0.0.0 --port 8080 --model \u003cpath-to-qwen-model\u003e\n\n# With draft model for speculative decoding\nuv run main.py --type qwen --model \u003cpath-to-qwen-model\u003e --draft-model \u003cpath-to-draft-model\u003e\n```\n\n## API Endpoints\n\n### Chat Completions\n```bash\nPOST /v1/chat/completions\n```\n\nExample request:\n```json\n{\n  \"model\": \"mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit-DWQ\",\n  \"messages\": [\n    {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\n    {\"role\": \"user\", \"content\": \"Hello!\"}\n  ],\n  \"stream\": false\n}\n```\n\n### Text Completions\n```bash\nPOST /v1/completions\n```\n\nExample request:\n```json\n{\n  \"model\": \"mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit-DWQ\",\n  \"prompt\": \"Hello, my name is\",\n  \"stream\": false\n}\n```\n\n### Health Check\n```bash\nGET /health\n```\n\n### Model List\n```bash\nGET /v1/models\n```\n\n## Configuration Options\n\n- `--model`: Path to the MLX model weights, tokenizer, and config\n- `--adapter-path`: Optional path for trained adapter weights 
and config\n- `--host`: Host for the HTTP server (default: 127.0.0.1)\n- `--port`: Port for the HTTP server (default: 8080)\n- `--draft-model`: Model to be used for speculative decoding\n- `--num-draft-tokens`: Number of tokens to draft when using speculative decoding\n- `--trust-remote-code`: Enable trusting remote code for tokenizer\n- `--log-level`: Set the logging level (default: INFO)\n- `--chat-template`: Specify a chat template for the tokenizer\n- `--use-default-chat-template`: Use the default chat template\n- `--temp`: Default sampling temperature (default: 0.0)\n- `--top-p`: Default nucleus sampling top-p (default: 1.0)\n- `--top-k`: Default top-k sampling (default: 0, disables top-k)\n- `--min-p`: Default min-p sampling (default: 0.0, disables min-p)\n- `--max-tokens`: Default maximum number of tokens to generate (default: 512)\n\n## Example Usage\n\n### Using curl to test the server:\n\n```bash\n# Chat completion\ncurl http://localhost:8080/v1/chat/completions \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"model\": \"Qwen/Qwen2-7B-Instruct\",\n    \"messages\": [\n      {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\n      {\"role\": \"user\", \"content\": \"What is the capital of France?\"}\n    ],\n    \"stream\": false\n  }'\n```\n\n### Streaming response:\n```bash\n# Streaming chat completion\ncurl http://localhost:8080/v1/chat/completions \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"model\": \"mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit-DWQ\",\n    \"messages\": [\n      {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\n      {\"role\": \"user\", \"content\": \"Write a short story about a robot learning to paint.\"}\n    ],\n    \"stream\": true\n  }'\n```\n\n### Using with sst/opencode\n\nExample configuration\n```json\n{\n  \"$schema\": \"https://opencode.ai/config.json\",\n  \"share\": \"disabled\",\n  \"provider\": {\n    \"mlx-lm\": {\n      \"npm\": 
\"@ai-sdk/openai-compatible\",\n      \"name\": \"mlx-lm (local)\",\n      \"options\": {\n        \"baseURL\": \"http://127.0.0.1:28100/v1\"\n      },\n      \"models\": {\n        \"mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit-DWQ\": {\n          \"name\": \"mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit-DWQ\",\n          \"options\": {\n            \"max_tokens\": 128000\n          },\n          \"tools\": true\n        }\n      }\n    }\n  }\n}\n```\n\n## Development\n\n### Running the Development Server\n\n```bash\n# Run the server in development mode\nuv run main.py --model \u003cpath-to-model\u003e\n```\n\n### Contributing\n\n1. Fork the repository\n2. Create a feature branch\n3. Make your changes\n4. Submit a pull request\n\n## License\n\nThis project is licensed under the MIT License - see the LICENSE file for details.\n\n## Acknowledgements\n\nThis project is built on top of:\n- [MLX](https://github.com/ml-explore/mlx) - Apple's array framework for machine learning on Apple silicon\n- [mlx-lm](https://github.com/ml-explore/mlx-lm) - MLX Language Model Inference\n- [Hugging Face Transformers](https://github.com/huggingface/transformers)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdonus3%2Ffunction-call-mlx-server","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdonus3%2Ffunction-call-mlx-server","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdonus3%2Ffunction-call-mlx-server/lists"}