{"id":30727583,"url":"https://github.com/dnakov/qwen-mlx-server","last_synced_at":"2025-09-03T14:07:37.098Z","repository":{"id":307565270,"uuid":"1029966367","full_name":"dnakov/qwen-mlx-server","owner":"dnakov","description":null,"archived":false,"fork":false,"pushed_at":"2025-08-01T04:36:05.000Z","size":19,"stargazers_count":24,"open_issues_count":0,"forks_count":4,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-30T05:52:00.485Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dnakov.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-31T21:41:17.000Z","updated_at":"2025-08-16T04:18:58.000Z","dependencies_parsed_at":"2025-08-01T00:07:36.003Z","dependency_job_id":"a77171c9-bf75-4fd5-9552-1e658535d79e","html_url":"https://github.com/dnakov/qwen-mlx-server","commit_stats":null,"previous_names":["dnakov/qwen-mlx-server"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/dnakov/qwen-mlx-server","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dnakov%2Fqwen-mlx-server","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dnakov%2Fqwen-mlx-server/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dnakov%2Fqwen-mlx-server/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dnakov%2Fqwen-mlx-server/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dnakov","do
wnload_url":"https://codeload.github.com/dnakov/qwen-mlx-server/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dnakov%2Fqwen-mlx-server/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273453710,"owners_count":25108473,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-03T02:00:09.631Z","response_time":76,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-09-03T14:07:35.523Z","updated_at":"2025-09-03T14:07:37.088Z","avatar_url":"https://github.com/dnakov.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Qwen MLX Server with Tool Support\n\nAn enhanced MLX server that provides OpenAI-compatible tool calling for Qwen3 models by parsing their native XML format and converting it to OpenAI JSON format.\n\n## Features\n\n- ✅ **XML to JSON conversion**: Automatically converts Qwen3's `\u003ctool_call\u003e` XML format to OpenAI JSON\n- ✅ **OpenAI compatibility**: Drop-in replacement for OpenAI's chat completions API\n- ✅ **Streaming support**: Proper streaming with XML filtering to prevent raw XML in output\n- ✅ **Robust parsing**: Handles incomplete and malformed XML gracefully\n- ✅ **vLLM compliance**: Based on official vLLM Qwen3XMLToolParser implementation\n\n## Installation\n\n```bash\ngit clone 
https://github.com/dnakov/qwen-mlx-server.git\ncd qwen-mlx-server\npip install -r requirements.txt\n```\n\n## Quick Start\n\n```bash\n# Basic usage with default template\npython qwen_server_with_tools.py --model mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit\n\n# With custom chat template for better tool calling\npython qwen_server_with_tools.py \\\n  --model mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit \\\n  --chat-template \"$(cat qwen3_coder_chat_template.jinja)\"\n\n# With a different log level (WARNING for production, DEBUG for development)\npython qwen_server_with_tools.py \\\n  --model mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit \\\n  --log-level WARNING\n\n# With an existing LM Studio download\npython qwen_server_with_tools.py \\\n  --model ~/.cache/lm-studio/models/mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit \\\n  --chat-template \"$(cat qwen3_coder_chat_template.jinja)\"\n# Note: when entering API details into a tool such as Qwen Code, set the model name to \"default_model\"\n# to avoid re-downloading the model.\n```\n\n## Usage Example\n\n### Tool Calling Request\n\n```bash\ncurl -X POST http://127.0.0.1:8080/v1/chat/completions \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"model\": \"mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit\",\n    \"messages\": [\n      {\n        \"role\": \"user\",\n        \"content\": \"Calculate 15 * 7\"\n      }\n    ],\n    \"tools\": [\n      {\n        \"type\": \"function\",\n        \"function\": {\n          \"name\": \"calculate\",\n          \"description\": \"Perform mathematical calculations\",\n          \"parameters\": {\n            \"type\": \"object\",\n            \"properties\": {\n              \"expression\": {\n                \"type\": \"string\",\n                \"description\": \"Mathematical expression to evaluate\"\n              }\n            },\n            \"required\": [\"expression\"]\n          }\n        }\n      }\n    ],\n    \"stream\": false\n  
}'\n```\n\n### How It Works\n\nThe server automatically converts Qwen3's native XML output:\n```xml\n\u003ctool_call\u003e\n\u003cfunction=calculate\u003e\n\u003cparameter=expression\u003e\n15 * 7\n\u003c/parameter\u003e\n\u003c/function\u003e\n\u003c/tool_call\u003e\n```\n\ninto OpenAI-compatible JSON:\n```json\n{\n  \"choices\": [{\n    \"message\": {\n      \"role\": \"assistant\",\n      \"content\": \"\",\n      \"tool_calls\": [{\n        \"type\": \"function\",\n        \"id\": \"call_12345\",\n        \"function\": {\n          \"name\": \"calculate\",\n          \"arguments\": \"{\\\"expression\\\": \\\"15 * 7\\\"}\"\n        }\n      }]\n    },\n    \"finish_reason\": \"tool_calls\"\n  }]\n}\n```\n\n## Command Line Options\n\n| Option | Description | Default |\n|--------|-------------|---------|\n| `--model` | Hugging Face model ID or local model path | `mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit` |\n| `--host` | Server host | `127.0.0.1` |\n| `--port` | Server port | `8080` |\n| `--chat-template` | Custom chat template, passed as a string (e.g. the contents of a `.jinja` file) | `\"\"` (uses model default) |\n| `--use-default-chat-template` | Force use of model's default template | `False` |\n| `--log-level` | Logging verbosity | `INFO` |\n| `--max-tokens` | Default max tokens to generate | `512` |\n\n## Logging Levels\n\n- `DEBUG`: Shows detailed XML parsing and conversion steps (useful for development)\n- `INFO`: Standard operational messages (default)\n- `WARNING`: Only warnings and errors (recommended for production)\n- `ERROR`: Only errors\n\n## Files\n\n- `qwen_server_with_tools.py` - Main server with XML→JSON tool parsing\n- `qwen3_coder_chat_template.jinja` - Optimized Qwen3-Coder chat template\n- `requirements.txt` - Python dependencies\n- `README.md` - This documentation\n\n## Supported Models\n\nDesigned for Qwen3-Coder models but should work with any Qwen model that outputs XML tool calls:\n\n- `mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit`\n- `mlx-community/Qwen3-Coder-7B-A3B-Instruct-4bit`\n- Other Qwen3 
variants\n\n## Development\n\nTo see detailed XML parsing logs:\n```bash\npython qwen_server_with_tools.py \\\n  --model mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit \\\n  --log-level DEBUG\n```\n\n## Implementation Notes\n\n- Based on vLLM's `Qwen3XMLToolParser` for maximum compatibility\n- Handles both streaming and non-streaming requests correctly  \n- Gracefully handles incomplete XML during token-by-token generation\n- Maintains full OpenAI Chat Completions API compatibility\n- Supports parameter type conversion and validation\n- Filters XML from streaming output to prevent malformed responses\n\n## License\n\nMIT License - feel free to use this in your projects!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdnakov%2Fqwen-mlx-server","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdnakov%2Fqwen-mlx-server","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdnakov%2Fqwen-mlx-server/lists"}