{"id":15540043,"url":"https://github.com/blaizzy/mlx-vlm","last_synced_at":"2026-04-03T00:40:58.837Z","repository":{"id":233573420,"uuid":"787462297","full_name":"Blaizzy/mlx-vlm","owner":"Blaizzy","description":"MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.","archived":false,"fork":false,"pushed_at":"2025-05-06T21:23:36.000Z","size":35425,"stargazers_count":1232,"open_issues_count":88,"forks_count":117,"subscribers_count":15,"default_branch":"main","last_synced_at":"2025-05-06T21:39:03.021Z","etag":null,"topics":["apple-silicon","florence2","idefics","llava","llm","local-ai","mlx","molmo","paligemma","pixtral","vision-framework","vision-language-model","vision-transformer"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Blaizzy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null},"funding":{"github":"Blaizzy","patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"lfx_crowdfunding":null,"polar":null,"buy_me_a_coffee":null,"thanks_dev":null,"custom":null}},"created_at":"2024-04-16T15:10:12.000Z","updated_at":"2025-05-06T21:24:50.000Z","dependencies_parsed_at":"2024-04-16T20:06:33.026Z","dependency_job_id":"1cd81abc-d804-4f1d-b5be-c1ad911c59a8","html_url":"https://github.com/Blaizzy/mlx-vlm","commit_stats":{"total_commits":156,"total_committers":12,"mean_commits":13.0,"dds":"0.10256410256410253","last_synced_commit":"2a97
875e3283fd13358763fe085b52551d6ff9ad"},"previous_names":["blaizzy/mlx-vlm"],"tags_count":41,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Blaizzy%2Fmlx-vlm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Blaizzy%2Fmlx-vlm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Blaizzy%2Fmlx-vlm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Blaizzy%2Fmlx-vlm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Blaizzy","download_url":"https://codeload.github.com/Blaizzy/mlx-vlm/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254040930,"owners_count":22004631,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apple-silicon","florence2","idefics","llava","llm","local-ai","mlx","molmo","paligemma","pixtral","vision-framework","vision-language-model","vision-transformer"],"created_at":"2024-10-02T12:12:17.510Z","updated_at":"2026-04-03T00:40:58.824Z","avatar_url":"https://github.com/Blaizzy.png","language":"Python","funding_links":["https://github.com/sponsors/Blaizzy"],"categories":[],"sub_categories":[],"readme":"[![Upload Python Package](https://github.com/Blaizzy/mlx-vlm/actions/workflows/python-publish.yml/badge.svg)](https://github.com/Blaizzy/mlx-vlm/actions/workflows/python-publish.yml)\n# MLX-VLM\n\nMLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) and Omni Models (VLMs with audio and video 
support) on your Mac using MLX.\n\n## Table of Contents\n- [Installation](#installation)\n- [Usage](#usage)\n  - [Command Line Interface (CLI)](#command-line-interface-cli)\n    - [Thinking Budget](#thinking-budget)\n  - [Chat UI with Gradio](#chat-ui-with-gradio)\n  - [Python Script](#python-script)\n- [Activation Quantization (CUDA)](#activation-quantization-cuda)\n- [Multi-Image Chat Support](#multi-image-chat-support)\n  - [Supported Models](#supported-models)\n  - [Usage Examples](#usage-examples)\n- [Model-Specific Documentation](#model-specific-documentation)\n- [TurboQuant KV Cache](#turboquant-kv-cache)\n- [Fine-tuning](#fine-tuning)\n\n## Model-Specific Documentation\n\nSome models have detailed documentation with prompt formats, examples, and best practices:\n\n| Model | Documentation |\n|-------|---------------|\n| DeepSeek-OCR | [Docs](https://github.com/Blaizzy/mlx-vlm/blob/main/mlx_vlm/models/deepseekocr/README.md) |\n| DeepSeek-OCR-2 | [Docs](https://github.com/Blaizzy/mlx-vlm/blob/main/mlx_vlm/models/deepseekocr_2/README.md) |\n| DOTS-OCR | [Docs](https://github.com/Blaizzy/mlx-vlm/blob/main/mlx_vlm/models/dots_ocr/README.md) |\n| DOTS-MOCR | [Docs](https://github.com/Blaizzy/mlx-vlm/blob/main/mlx_vlm/models/dots_ocr/README.md) |\n| GLM-OCR | [Docs](https://github.com/Blaizzy/mlx-vlm/blob/main/mlx_vlm/models/glm_ocr/README.md) |\n| Phi-4 Reasoning Vision | [Docs](https://github.com/Blaizzy/mlx-vlm/blob/main/mlx_vlm/models/phi4_siglip/README.md) |\n| MiniCPM-o | [Docs](https://github.com/Blaizzy/mlx-vlm/blob/main/mlx_vlm/models/minicpmo/README.md) |\n| Phi-4 Multimodal | [Docs](https://github.com/Blaizzy/mlx-vlm/blob/main/mlx_vlm/models/phi4mm/README.md) |\n| MolmoPoint | [Docs](https://github.com/Blaizzy/mlx-vlm/blob/main/mlx_vlm/models/molmo_point/README.md) |\n| Moondream3 | [Docs](https://github.com/Blaizzy/mlx-vlm/blob/main/mlx_vlm/models/moondream3/README.md) |\n| Gemma 4 | 
[Docs](https://github.com/Blaizzy/mlx-vlm/blob/main/mlx_vlm/models/gemma4/README.md) |\n| Falcon-OCR | [Docs](https://github.com/Blaizzy/mlx-vlm/blob/main/mlx_vlm/models/falcon_ocr/README.md) |\n| Granite Vision 3.2 | [Docs](https://github.com/Blaizzy/mlx-vlm/blob/main/mlx_vlm/models/granite_vision/README.md) |\n| Granite 4.0 Vision | [Docs](https://github.com/Blaizzy/mlx-vlm/blob/main/mlx_vlm/models/granite4_vision/README.md) |\n\n## Installation\n\nThe easiest way to get started is to install the `mlx-vlm` package using pip:\n\n```sh\npip install -U mlx-vlm\n```\n\n## Usage\n\n### Command Line Interface (CLI)\n\nGenerate output from a model using the CLI:\n\n```sh\n# Text generation\nmlx_vlm.generate --model mlx-community/Qwen2-VL-2B-Instruct-4bit --max-tokens 100 --prompt \"Hello, how are you?\"\n\n# Image generation\nmlx_vlm.generate --model mlx-community/Qwen2-VL-2B-Instruct-4bit --max-tokens 100 --temperature 0.0 --image http://images.cocodataset.org/val2017/000000039769.jpg\n\n# Audio generation (New)\nmlx_vlm.generate --model mlx-community/gemma-3n-E2B-it-4bit --max-tokens 100 --prompt \"Describe what you hear\" --audio /path/to/audio.wav\n\n# Multi-modal generation (Image + Audio)\nmlx_vlm.generate --model mlx-community/gemma-3n-E2B-it-4bit --max-tokens 100 --prompt \"Describe what you see and hear\" --image /path/to/image.jpg --audio /path/to/audio.wav\n```\n\n#### Thinking Budget\n\nFor thinking models (e.g., Qwen3.5), you can limit the number of tokens spent in the thinking block:\n\n```sh\nmlx_vlm.generate --model mlx-community/Qwen3.5-2B-4bit \\\n  --thinking-budget 50 \\\n  --thinking-start-token \"\u003cthink\u003e\" \\\n  --thinking-end-token \"\u003c/think\u003e\" \\\n  --enable-thinking \\\n  --prompt \"Solve 2+2\"\n```\n\n| Flag | Description |\n|------|-------------|\n| `--enable-thinking` | Activate thinking mode in the chat template |\n| `--thinking-budget` | Max tokens allowed inside the thinking block |\n| `--thinking-start-token` | Token 
that opens a thinking block (default: `\u003cthink\u003e`) |\n| `--thinking-end-token` | Token that closes a thinking block (default: `\u003c/think\u003e`) |\n\nWhen the budget is exceeded, the model is forced to emit `\\n\u003c/think\u003e` and transition to the answer. If `--enable-thinking` is passed but the model's chat template does not support it, the budget is applied only if the model generates the start token on its own.\n\n### Chat UI with Gradio\n\nLaunch a chat interface using Gradio:\n\n```sh\nmlx_vlm.chat_ui --model mlx-community/Qwen2-VL-2B-Instruct-4bit\n```\n\n### Python Script\n\nHere's an example of how to use MLX-VLM in a Python script:\n\n```python\nimport mlx.core as mx\nfrom mlx_vlm import load, generate\nfrom mlx_vlm.prompt_utils import apply_chat_template\nfrom mlx_vlm.utils import load_config\n\n# Load the model\nmodel_path = \"mlx-community/Qwen2-VL-2B-Instruct-4bit\"\nmodel, processor = load(model_path)\nconfig = load_config(model_path)\n\n# Prepare input\nimage = [\"http://images.cocodataset.org/val2017/000000039769.jpg\"]\n# image = [Image.open(\"...\")] can also be used with PIL.Image.Image objects\nprompt = \"Describe this image.\"\n\n# Apply chat template\nformatted_prompt = apply_chat_template(\n    processor, config, prompt, num_images=len(image)\n)\n\n# Generate output\noutput = generate(model, processor, formatted_prompt, image, verbose=False)\nprint(output)\n```\n\n#### Audio Example\n\n```python\nfrom mlx_vlm import load, generate\nfrom mlx_vlm.prompt_utils import apply_chat_template\nfrom mlx_vlm.utils import load_config\n\n# Load model with audio support\nmodel_path = \"mlx-community/gemma-3n-E2B-it-4bit\"\nmodel, processor = load(model_path)\nconfig = model.config\n\n# Prepare audio input\naudio = [\"/path/to/audio1.wav\", \"/path/to/audio2.mp3\"]\nprompt = \"Describe what you hear in these audio files.\"\n\n# Apply chat template with audio\nformatted_prompt = apply_chat_template(\n    processor, config, prompt, 
num_audios=len(audio)\n)\n\n# Generate output with audio\noutput = generate(model, processor, formatted_prompt, audio=audio, verbose=False)\nprint(output)\n```\n\n#### Multi-Modal Example (Image + Audio)\n\n```python\nfrom mlx_vlm import load, generate\nfrom mlx_vlm.prompt_utils import apply_chat_template\nfrom mlx_vlm.utils import load_config\n\n# Load multi-modal model\nmodel_path = \"mlx-community/gemma-3n-E2B-it-4bit\"\nmodel, processor = load(model_path)\nconfig = model.config\n\n# Prepare inputs\nimage = [\"/path/to/image.jpg\"]\naudio = [\"/path/to/audio.wav\"]\nprompt = \"\"\n\n# Apply chat template\nformatted_prompt = apply_chat_template(\n    processor, config, prompt,\n    num_images=len(image),\n    num_audios=len(audio)\n)\n\n# Generate output\noutput = generate(model, processor, formatted_prompt, image, audio=audio, verbose=False)\nprint(output)\n```\n\n### Server (FastAPI)\n\nStart the server:\n```sh\nmlx_vlm.server --port 8080\n\n# Preload a model at startup (Hugging Face repo or local path)\nmlx_vlm.server --model \u003chf_repo_or_local_path\u003e\n\n# Preload a model with adapter\nmlx_vlm.server --model \u003chf_repo_or_local_path\u003e --adapter-path \u003cadapter_path\u003e\n\n# With trust remote code enabled (required for some models)\nmlx_vlm.server --trust-remote-code\n```\n\n#### Server Options\n\n- `--model`: Preload a model at server startup, accepts a Hugging Face repo ID or local path (optional, loads lazily on first request if omitted)\n- `--adapter-path`: Path for adapter weights to use with the preloaded model\n- `--host`: Host address (default: `0.0.0.0`)\n- `--port`: Port number (default: `8080`)\n- `--trust-remote-code`: Trust remote code when loading models from Hugging Face Hub\n\nYou can also set trust remote code via environment variable:\n```sh\nMLX_TRUST_REMOTE_CODE=true mlx_vlm.server\n```\n\nThe server provides multiple endpoints for different use cases and supports dynamic model loading/unloading with caching (one model at 
a time).\n\n#### Available Endpoints\n\n- `/models` and `/v1/models` - List models available locally\n- `/chat/completions` and `/v1/chat/completions` - OpenAI-compatible chat-style interaction endpoint with support for images, audio, and text\n- `/responses` and `/v1/responses` - OpenAI-compatible responses endpoint\n- `/health` - Check server status\n- `/unload` - Unload current model from memory\n\n#### Usage Examples\n\n##### List available models\n\n```sh\ncurl \"http://localhost:8080/models\"\n```\n\n##### Text Input\n\n```sh\ncurl -X POST \"http://localhost:8080/chat/completions\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"model\": \"mlx-community/Qwen2-VL-2B-Instruct-4bit\",\n    \"messages\": [\n      {\n        \"role\": \"user\",\n        \"content\": \"Hello, how are you\"\n      }\n    ],\n    \"stream\": true,\n    \"max_tokens\": 100\n  }'\n```\n\n##### Image Input\n\n```sh\ncurl -X POST \"http://localhost:8080/chat/completions\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"model\": \"mlx-community/Qwen2.5-VL-32B-Instruct-8bit\",\n    \"messages\":\n    [\n      {\n        \"role\": \"system\",\n        \"content\": \"You are a helpful assistant.\"\n      },\n      {\n        \"role\": \"user\",\n        \"content\": [\n          {\n            \"type\": \"text\",\n            \"text\": \"This chart shows energy demand in California today. 
Can you provide an analysis of the chart and comment on the implications for renewable energy in California?\"\n          },\n          {\n            \"type\": \"input_image\",\n            \"image_url\": \"/path/to/repo/examples/images/renewables_california.png\"\n          }\n        ]\n      }\n    ],\n    \"stream\": true,\n    \"max_tokens\": 1000\n  }'\n```\n\n##### Audio Support (New)\n```sh\ncurl -X POST \"http://localhost:8080/chat/completions\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"model\": \"mlx-community/gemma-3n-E2B-it-4bit\",\n    \"messages\": [\n      {\n        \"role\": \"user\",\n        \"content\": [\n          { \"type\": \"text\", \"text\": \"Describe what you hear in these audio files\" },\n          { \"type\": \"input_audio\", \"input_audio\": \"/path/to/audio1.wav\" },\n          { \"type\": \"input_audio\", \"input_audio\": \"https://example.com/audio2.mp3\" }\n        ]\n      }\n    ],\n    \"stream\": true,\n    \"max_tokens\": 500\n  }'\n```\n\n##### Multi-Modal (Image + Audio)\n```sh\ncurl -X POST \"http://localhost:8080/chat/completions\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"model\": \"mlx-community/gemma-3n-E2B-it-4bit\",\n    \"messages\": [\n      {\n        \"role\": \"user\",\n        \"content\": [\n          {\"type\": \"input_image\", \"image_url\": \"/path/to/image.jpg\"},\n          {\"type\": \"input_audio\", \"input_audio\": \"/path/to/audio.wav\"}\n        ]\n      }\n    ],\n    \"max_tokens\": 100\n  }'\n```\n\n##### Responses Endpoint\n```sh\ncurl -X POST \"http://localhost:8080/responses\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"model\": \"mlx-community/Qwen2-VL-2B-Instruct-4bit\",\n    \"messages\": [\n      {\n        \"role\": \"user\",\n        \"content\": [\n          {\"type\": \"input_text\", \"text\": \"What is in this image?\"},\n          {\"type\": \"input_image\", \"image_url\": \"/path/to/image.jpg\"}\n        ]\n      }\n    ],\n    
\"max_tokens\": 100\n  }'\n```\n\n#### Request Parameters\n\n- `model`: Model identifier (required)\n- `messages`: Chat messages for chat/OpenAI endpoints\n- `max_tokens`: Maximum tokens to generate\n- `temperature`: Sampling temperature\n- `top_p`: Top-p sampling parameter\n- `top_k`: Top-k sampling cutoff\n- `min_p`: Min-p sampling threshold\n- `repetition_penalty`: Penalty applied to repeated tokens\n- `stream`: Enable streaming responses\n\n\n## Activation Quantization (CUDA)\n\nWhen running on NVIDIA GPUs with MLX CUDA, models quantized with `mxfp8` or `nvfp4` modes require activation quantization to work properly. This converts `QuantizedLinear` layers to `QQLinear` layers which quantize both weights and activations.\n\n### Command Line\n\nUse the `-qa` or `--quantize-activations` flag:\n\n```sh\nmlx_vlm.generate --model /path/to/mxfp8-model --prompt \"Describe this image\" --image /path/to/image.jpg -qa\n```\n\n### Python API\n\nPass `quantize_activations=True` to the `load` function:\n\n```python\nfrom mlx_vlm import load, generate\n\n# Load with activation quantization enabled\nmodel, processor = load(\n    \"path/to/mxfp8-quantized-model\",\n    quantize_activations=True\n)\n\n# Generate as usual\noutput = generate(model, processor, \"Describe this image\", image=[\"image.jpg\"])\n```\n\n### Supported Quantization Modes\n\n- `mxfp8` - 8-bit MX floating point\n- `nvfp4` - 4-bit NVIDIA floating point\n\n\u003e **Note**: This feature is required for mxfp/nvfp quantized models on CUDA. On Apple Silicon (Metal), these models work without the flag.\n\n## Multi-Image Chat Support\n\nMLX-VLM supports analyzing multiple images simultaneously with select models. 
This feature enables more complex visual reasoning tasks and comprehensive analysis across multiple images in a single conversation.\n\n\n### Usage Examples\n\n#### Python Script\n\n```python\nfrom mlx_vlm import load, generate\nfrom mlx_vlm.prompt_utils import apply_chat_template\nfrom mlx_vlm.utils import load_config\n\nmodel_path = \"mlx-community/Qwen2-VL-2B-Instruct-4bit\"\nmodel, processor = load(model_path)\nconfig = model.config\n\nimages = [\"path/to/image1.jpg\", \"path/to/image2.jpg\"]\nprompt = \"Compare these two images.\"\n\nformatted_prompt = apply_chat_template(\n    processor, config, prompt, num_images=len(images)\n)\n\noutput = generate(model, processor, formatted_prompt, images, verbose=False)\nprint(output)\n```\n\n#### Command Line\n\n```sh\nmlx_vlm.generate --model mlx-community/Qwen2-VL-2B-Instruct-4bit --max-tokens 100 --prompt \"Compare these images\" --image path/to/image1.jpg path/to/image2.jpg\n```\n\n## Video Understanding\n\nMLX-VLM also supports video analysis such as captioning, summarization, and more, with select models.\n\n### Supported Models\n\nThe following models support video chat:\n\n1. Qwen2-VL\n2. Qwen2.5-VL\n3. Idefics3\n4. 
LLaVA\n\nMore models are coming soon.\n\n### Usage Examples\n\n#### Command Line\n```sh\nmlx_vlm.video_generate --model mlx-community/Qwen2-VL-2B-Instruct-4bit --max-tokens 100 --prompt \"Describe this video\" --video path/to/video.mp4 --max-pixels 224 224 --fps 1.0\n```\n\n\nThe examples above demonstrate how to use multiple images and videos with MLX-VLM for more complex visual reasoning tasks.\n\n## TurboQuant KV Cache\n\nTurboQuant compresses the KV cache during generation, enabling longer context lengths with less memory while maintaining quality.\n\n### Quick Start\n\n```sh\n# 3.5-bit KV cache quantization (3-bit keys + 4-bit values)\nmlx_vlm generate \\\n  --model mlx-community/Qwen3.5-4B-4bit \\\n  --kv-bits 3.5 \\\n  --kv-quant-scheme turboquant \\\n  --prompt \"Your long prompt here...\"\n```\n\n```python\nfrom mlx_vlm import load, generate\n\n# Load the same model as in the CLI example above\nmodel, processor = load(\"mlx-community/Qwen3.5-4B-4bit\")\nprompt = \"Your long prompt here...\"\n\n# 3.5-bit KV cache quantization (3-bit keys + 4-bit values)\nresult = generate(\n    model, processor, prompt,\n    kv_bits=3.5,\n    kv_quant_scheme=\"turboquant\",\n    max_tokens=256,\n)\n```\n\n### How It Works\n\nTurboQuant uses random rotation + codebook quantization ([arXiv:2504.19874](https://arxiv.org/abs/2504.19874)) to compress KV cache entries from 16-bit to 2-4 bits per dimension:\n\n- **Keys**: ProdCodec (MSE codebook + QJL sign residual) for accurate attention scoring\n- **Values**: MSE codebook for reconstruction quality\n- **Fractional bits** (e.g. 
3.5): uses lower bits for keys, higher for values (3-bit K + 4-bit V)\n\nCustom Metal kernels fuse score computation and value aggregation directly on packed quantized data, avoiding full dequantization during decode.\n\n### Performance\n\nTested on Qwen3.5-4B-4bit at 128k context:\n\n| Metric | Baseline | TurboQuant 3.5-bit |\n|--------|----------|-------------------|\n| KV Memory | 4.1 GB | 0.97 GB (**76% reduction**) |\n| Peak Memory | 18.3 GB | 17.3 GB (**-1.0 GB**) |\n\nAt 512k+ contexts, TurboQuant's per-layer attention is **faster than FP16 SDPA** due to reduced memory bandwidth requirements.\n\nTested on gemma-4-31b-it at 128k context:\n\n| Metric | Baseline | TurboQuant 3.5-bit |\n|--------|----------|-------------------|\n| KV Memory | 13.3 GB | 4.9 GB (**63% reduction**) |\n| Peak Memory | 75.2 GB | 65.8 GB (**-9.4 GB**) |\n\n### Supported Bit Widths\n\n| Bits | Compression | Best For |\n|------|------------|----------|\n| 2 | ~8x | Maximum compression, some quality loss |\n| 3 | ~5x | Good balance of quality and compression |\n| 3.5 | ~4.5x | Recommended default (3-bit keys + 4-bit values) |\n| 4 | ~4x | Best quality, moderate compression |\n\n### Compatibility\n\nTurboQuant automatically quantizes `KVCache` layers (global attention). Models with `RotatingKVCache` (sliding window) or `ArraysCache` (MLA/absorbed keys) keep their native cache format for those layers since they are already memory-efficient.\n\n## Fine-tuning\n\nMLX-VLM supports fine-tuning models with LoRA and QLoRA.\n\n### LoRA \u0026 QLoRA\n\nTo learn more about LoRA, please refer to the [LoRA.md](./mlx_vlm/LORA.MD) file.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fblaizzy%2Fmlx-vlm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fblaizzy%2Fmlx-vlm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fblaizzy%2Fmlx-vlm/lists"}