{"id":50436924,"url":"https://github.com/vinbyte/modal-collections","last_synced_at":"2026-05-31T17:31:00.018Z","repository":{"id":357797793,"uuid":"1238492710","full_name":"vinbyte/modal-collections","owner":"vinbyte","description":"A collection of python scripts for modal.com","archived":false,"fork":false,"pushed_at":"2026-05-27T09:47:14.000Z","size":2030,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-27T10:23:06.794Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vinbyte.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-14T07:02:55.000Z","updated_at":"2026-05-27T09:47:18.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/vinbyte/modal-collections","commit_stats":null,"previous_names":["vinbyte/modal-collections"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/vinbyte/modal-collections","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vinbyte%2Fmodal-collections","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vinbyte%2Fmodal-collections/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vinbyte%2Fmodal-collections/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vinbyte%2Fmodal-collections/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vinbyte","download_url":"https://codeload.github.com/vinbyte/modal-collections/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vinbyte%2Fmodal-collections/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33742185,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-31T02:00:06.040Z","response_time":95,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-05-31T17:30:59.364Z","updated_at":"2026-05-31T17:31:00.002Z","avatar_url":"https://github.com/vinbyte.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Modal Collections\n\nA monorepo of GPU workloads deployed on [Modal](https://modal.com). Each subfolder is a self-contained app you can deploy independently.\n\n## Structure\n\n```\nmodal-collections/\n├── wan2gp/          # Wan2GP video generation (Gradio)\n│   ├── wan2gp.py\n│   └── .venv/       # (optional, for local IDE support)\n├── vllm/            # vLLM LLM inference (OpenAI-compatible API)\n│   ├── serve.py\n│   └── README.md\n├── llamacpp/        # llama.cpp LLM inference (OpenAI-compatible API)\n│   └── serve.py\n├── comfyui/         # ComfyUI with modular plugins + video API\n│   ├── comfyui.py\n│   ├── models_example.py\n│   ├── plugins_example.py\n│   ├── workflows.py\n│   └── workflows/\n└── README.md\n```\n\nEvery app folder contains its own `*.py` entrypoint and optionally a `.venv` for local development. There is no shared state between apps.\n\n## Prerequisites\n\n1. **Python 3.12+**\n2. **Modal CLI** — install and authenticate:\n   ```bash\n   pip install modal\n   modal setup\n   ```\n3. **Modal account** with GPU quota (A100 or better recommended for Wan2GP)\n\n\u003e **No local dependency install needed.** All dependencies (PyTorch, xformers, etc.) are defined in `modal.Image` and installed in the remote container at build time. You don't need to `pip install` or `uv sync` anything before deploying.\n\n## Local Development (Optional)\n\nIf you want IDE autocomplete, type checking, or linting locally, you can create a `.venv` per app folder. This is **not required** for deploy — it's purely for editor support.\n\n```bash\ncd wan2gp\npython -m venv .venv\nsource .venv/bin/activate\npip install modal         # at minimum, for the modal SDK\n# Add any other packages you want autocomplete for, e.g.:\n# pip install torch torchvision gradio\n```\n\nEach app folder keeps its own `.venv` so they stay isolated. `.venv` directories are gitignored.\n\n## Running an App\n\n\u003e **First time?** Run `modal setup` to authenticate before any `modal` command. You only need to do this once.\n\n### Development (ephemeral)\n\n```bash\nmodal serve wan2gp/wan2gp.py\n```\n\nThis spins up the app temporarily. Logs stream to your terminal. Press `Ctrl+C` to stop.\n\n### Production (persistent)\n\n```bash\nmodal deploy wan2gp/wan2gp.py\n```\n\nThe app stays running and auto-scales. You get a stable URL for any web endpoints.\n\n### Common commands\n\n| Command | Purpose |\n|---|---|\n| `modal serve \u003cpath\u003e` | Run with hot-reload, logs in terminal |\n| `modal deploy \u003cpath\u003e` | Deploy persistently with autoscaling |\n| `modal app list` | List running apps |\n| `modal app stop \u003cname\u003e` | Stop a deployed app |\n| `modal volume ls wan2gp-data` | Inspect persistent volume contents |\n\n## App: ComfyUI\n\nComfyUI with modular plugin/model management, GPU snapshots, and an OpenAI-compatible video generation API. Pre-configured for the VideoFlow LTX 2.3 All-in-One v3.0 workflow.\n\n- **Image**: Debian slim + comfy-cli + starlette proxy\n- **GPU**: L40S (48 GB VRAM)\n- **Model**: LTX 2.3 fp8 checkpoint (~29 GB) + text encoders + VAEs\n- **Storage**: `hf-hub-cache` Modal Volume (models persist across restarts)\n- **Endpoint**: ComfyUI UI proxied on port 8000, video API at `/v1/video/generations`\n\n### Quick start\n\n```bash\n# Copy configs\ncp comfyui/models_example.py comfyui/models.py\ncp comfyui/plugins_example.py comfyui/plugins.py\n\n# Deploy\nmodal deploy comfyui/comfyui.py\n\n# Test\nmodal run comfyui/comfyui.py --prompt \"A cinematic sunset time-lapse\"\n```\n\nSee [`comfyui/README.md`](comfyui/README.md) for full configuration details.\n\n## App: Wan2GP\n\nWan2GP video generation with a Gradio UI, running on an A100 80GB GPU.\n\n- **Image**: Debian slim + PyTorch 2.8.0 (CUDA 12.8) + xformers\n- **GPU**: A100 (profile 1)\n- **Storage**: `wan2gp-data` Modal Volume (checkpoints, LoRAs, outputs, cache persist across restarts)\n- **Endpoint**: Gradio web server on port 7860\n\n### Volume layout\n\n```\nwan2gp-data/\n├── ckpts/          # model checkpoints\n├── loras/          # LoRA weights\n│   ├── ltx2/\n│   └── ltx2_22B/\n├── outputs/        # generated videos\n└── cache/          # HF hub, transformers, torch caches\n```\n\n### Deploy\n\n```bash\nmodal deploy wan2gp/wan2gp.py\n```\n\nModal will print the Gradio URL after the container starts.\n\n## App: vLLM\n\nOpenAI-compatible LLM inference server using vLLM, currently running Qwen3.6-27B AWQ INT4 (4-bit quantized, ~21 GB VRAM) with vision + reasoning support on an A100 80GB GPU at full 262K context length.\n\n- **Image**: NVIDIA CUDA 12.9 + vLLM 0.19.0 + transformers 5.5.0\n- **GPU**: A100-80GB (1x)\n- **Model**: cyankiwi/Qwen3.6-27B-AWQ-INT4 (AWQ 4-bit, ~21 GB VRAM, full 262K context)\n- **Storage**: `huggingface-cache` + `vllm-cache` Modal Volumes (model weights and JIT cache persist across restarts)\n- **Endpoint**: OpenAI-compatible API on port 8000 (`/v1/chat/completions`, `/v1/completions`, `/v1/models`)\n\n### Quick test\n\n```bash\nmodal run vllm/serve.py\nmodal run vllm/serve.py --content \"What is the meaning of life?\"\n```\n\n### Deploy\n\n```bash\nmodal deploy vllm/serve.py\n```\n\nSee [`vllm/README.md`](vllm/README.md) for full API usage examples and configuration details.\n\n## App: llama.cpp\n\nOpenAI-compatible LLM inference server using llama.cpp, running Qwen3.5-9B Q4_K_M (4-bit quantized, ~5.6 GB) with flash attention, continuous batching, and KV cache quantization on an L4 GPU.\n\n- **Image**: ghcr.io/ggml-org/llama.cpp:server-cuda\n- **GPU**: L4 (16 GB VRAM)\n- **Model**: Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF (Q4_K_M, ~5.6 GB)\n- **Storage**: `huggingface-cache` Modal Volume (model weights persist across restarts)\n- **Endpoint**: OpenAI-compatible API on port 8080 (`/v1/chat/completions`, `/v1/completions`, `/v1/models`)\n\n### Quick test\n\n```bash\nmodal run llamacpp/serve.py\nmodal run llamacpp/serve.py --content \"What is the meaning of life?\"\n```\n\n### Deploy\n\n```bash\nmodal deploy llamacpp/serve.py\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvinbyte%2Fmodal-collections","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvinbyte%2Fmodal-collections","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvinbyte%2Fmodal-collections/lists"}