{"id":30799198,"url":"https://github.com/alexgenovese/docker-pruna","last_synced_at":"2026-04-06T06:04:16.897Z","repository":{"id":309692254,"uuid":"1027512194","full_name":"alexgenovese/docker-pruna","owner":"alexgenovese","description":"Download and Compile Any Diffusion Models in your Endpoint","archived":false,"fork":false,"pushed_at":"2025-08-21T11:04:52.000Z","size":199,"stargazers_count":7,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-05T19:25:40.830Z","etag":null,"topics":["compile","diffusers","docker","dockerfile","endpoint","pruna"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/alexgenovese.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-07-28T06:02:46.000Z","updated_at":"2025-08-22T08:34:39.000Z","dependencies_parsed_at":"2025-09-06T16:00:27.250Z","dependency_job_id":null,"html_url":"https://github.com/alexgenovese/docker-pruna","commit_stats":null,"previous_names":["alexgenovese/docker-pruna"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/alexgenovese/docker-pruna","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alexgenovese%2Fdocker-pruna","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alexgenovese%2Fdocker-pruna/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alexgenovese%2Fdocker-p
runa/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alexgenovese%2Fdocker-pruna/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/alexgenovese","download_url":"https://codeload.github.com/alexgenovese/docker-pruna/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alexgenovese%2Fdocker-pruna/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31461534,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-05T21:22:52.476Z","status":"online","status_checked_at":"2026-04-06T02:00:07.287Z","response_time":112,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["compile","diffusers","docker","dockerfile","endpoint","pruna"],"created_at":"2025-09-05T19:09:04.985Z","updated_at":"2026-04-06T06:04:16.884Z","avatar_url":"https://github.com/alexgenovese.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003c!-- Title and badges --\u003e\n# Docker Pruna — Download \u0026 Compile Diffusers Models\n\n![CI](https://img.shields.io/badge/CI-pending-lightgrey) ![Python](https://img.shields.io/badge/python-3.8%2B-blue) ![Docker](https://img.shields.io/badge/docker-ready-blue) ![License](https://img.shields.io/badge/license-Unspecified-lightgrey)\n\n## Overview\nDocker Pruna is a Docker-ready toolkit and lightweight Flask API to download, compile, and serve 
diffusion models (e.g., Stable Diffusion, FLUX) optimized with Pruna for faster inference. It includes:\n- Configurable download and compilation pipelines\n- Smart device-aware configuration (CUDA, CPU, MPS)\n- Multiple compilation modes (fast, moderate, normal)\n- Diagnostics and memory-aware helpers\n\n**It includes an intelligent configurator that manages device compatibility (CUDA, CPU, Apple MPS), automatic fallbacks, and memory-aware compilation modes.**\n\n## TODO\n- [x] Async Download Opt\n- [x] Download Compiled Models from HF\n- [ ] Push to Hub (compiled model)\n- [ ] Qwen\n- [ ] WAN 2.2\n\n## Table of Contents\n1. [Key Features](#key-features)\n2. [Prerequisites](#prerequisites)\n3. [Installation](#installation)\n4. [Configuration](#configuration)\n5. [Quick Start](#quick-start)\n6. [API Endpoints](#api-endpoints)\n7. [CLI Examples](#cli-examples)\n8. [Compilation Modes](#compilation-modes)\n9. [Diagnostics \u0026 Helper Scripts](#diagnostics--helper-scripts)\n10. [File Layout](#file-layout)\n11. [Docker Usage](#docker-usage)\n12. [Troubleshooting](#troubleshooting)\n13. [System Requirements](#system-requirements)\n14. 
[Credits](#credits)\n\n## Key Features\n- Download models from Hugging Face into `./models/`\n- Compile models with Pruna and store artifacts in `./compiled_models/`\n- Lightweight Flask API for download, compile, generate, delete operations\n- Smart `PrunaModelConfigurator` with device-aware fallback\n- Compilation modes: `fast`, `moderate`, `normal`\n- Helpers for CUDA/MPS/CPU diagnostics and memory-aware compilation\n\n### New features\n- `PrunaModelConfigurator` smart class\n- Auto-detection for several model families\n- Device-specific configuration recommendations\n- Tests and diagnostic scripts\n\n\n## Installation\n```bash\ngit clone https://github.com/alexgenovese/docker-pruna.git\ncd docker-pruna\npip install -r requirements.txt\n```\n\n## Configuration\n### Environment Variables\n- `MODEL_DIFF` — default model ID (default: `CompVis/stable-diffusion-v1-4`)\n- `DOWNLOAD_DIR` — local models directory (default: `./models`)\n- `PRUNA_COMPILED_DIR` — compiled models directory (default: `./compiled_models`)\n\n### CLI Arguments\nRun `python3 download_model_and_compile.py --help` to view options:\n```bash\n--model-id MODEL_ID        Hugging Face model ID\n--download-dir DIR         Download directory\n--compiled-dir DIR         Compiled models directory\n--skip-download            Skip download step\n--skip-compile             Skip compilation step\n--torch-dtype TYPE         Torch dtype (float16/float32)\n--compilation-mode MODE    fast|moderate|normal\n--device DEVICE            cuda|cpu|mps\n--force-cpu                Force CPU compilation\n```\n\n## Quick Start\nDownload and compile a model:\n```bash\npython3 download_model_and_compile.py \\\n  --model-id runwayml/stable-diffusion-v1-5 \\\n  --compilation-mode moderate\n```\nStart the API server:\n```bash\npython3 server.py --host 127.0.0.1 --port 8000 --debug \u0026\n```\n\nAsynchronous downloads\n----------------------\nThe API now runs potentially long-running downloads in a background task to avoid HTTP 
timeouts (e.g., 524). When you POST to `/download`, the server responds immediately with 202 Accepted, a `task_id`, and a `status_url` you can poll for progress and the result.\n\nExample (enqueue download):\n\n```bash\ncurl -X POST http://127.0.0.1:8000/download \\\n  -H 'Content-Type: application/json' \\\n  -d '{\"model_id\":\"runwayml/stable-diffusion-v1-5\"}'\n```\n\nSample response:\n\n```json\n{ \"status\": \"accepted\", \"task_id\": \"...\", \"status_url\": \"http://.../tasks/\u003ctask_id\u003e\" }\n```\n\nPoll the task status:\n\n```bash\ncurl http://127.0.0.1:8000/tasks/\u003ctask_id\u003e\n```\n\nThe task JSON will include `status` (queued|running|finished|error) and, when finished, a `result` field with the downloaded model path or an `error` message.\n\n\n## API Endpoints\nAll endpoints accept and return JSON.\n| Method | Endpoint               | Description                          |\n|--------|------------------------|--------------------------------------|\n| POST   | `/download`            | Enqueue a model download (async)     |\n| GET    | `/tasks/\u003ctask_id\u003e`     | Get status/result for an async task  |\n| POST   | `/compile`             | Compile a downloaded model           |\n| POST   | `/generate`            | Generate images from a prompt        |\n| POST   | `/delete-model`        | Delete downloaded/compiled model     |\n| GET    | `/ping`                | Liveness check                       |\n| GET    | `/health`              | Server health and configuration      |\n\nExample — generate:\n```bash\ncurl -X POST http://127.0.0.1:8000/generate \\\n  -H 'Content-Type: application/json' \\\n  -d '{\"model_id\":\"runwayml/stable-diffusion-v1-5\",\"prompt\":\"A sunset\"}'\n```\n\n## CLI Examples\n- Download only: `python3 download_model_and_compile.py --model-id runwayml/stable-diffusion-v1-5`\n- Download + compile (fast):\n  ```bash\n  python3 download_model_and_compile.py \\\n    --model-id runwayml/stable-diffusion-v1-5 \\\n    
--compilation-mode fast\n  ```\n\n## Compilation Modes\n- **fast**: Quick development compile (DeepCache, half precision). Good for rapid iterations.\n- **moderate**: Balanced speed and quality (TorchCompile + 8-bit HQQ).\n- **normal**: Full optimizations for production (FORA, factorizer, autotune). Longer compile time.\n\nUse `--compilation-mode` on the CLI, or the `compilation_mode` JSON field on the `/compile` endpoint, to pick the mode.\n\n## Diagnostics \u0026 Helper Scripts\n- `test_pruna_cuda.py` — CUDA and Pruna diagnostics\n- `check_pruna_setup.py` — environment checks\n- `compile_with_memory_mgmt.py` — memory-aware compilation\n- `restart_clean_compile.sh` — clean GPU memory before compile\n\n## File Layout\n```text\nlib/\n├ pruna_config.py    Smart configurator\n├ const.py           Constants\n└ utils.py           Utilities\n\ndownload_model_and_compile.py  Main download/compile CLI\nserver.py                     Flask API server\n*test_*.py                    Test and diagnostic scripts\n*_compile.py                  Compilation helpers\n```\n\n## Docker Usage\n\nBuild the image (simple):\n```bash\ndocker build -t docker-pruna .\n```\n\nBuild with a precompiled model baked-in at build time\n(this will run the repository's `download_model_and_compile.py` during the build):\n\nNote: downloading/compiling at build-time requires network access and\nthe heavy Python dependencies (it increases build-time and image size).\nAlso avoid passing secrets via plain `--build-arg` for public/CI builds — use\nBuildKit secrets instead (recommended) so the token doesn't end up in image\nlayers.\n\nInsecure (quick) example — pass HF token as a build arg (NOT recommended for public images):\n```bash\ndocker build -t docker-pruna:with-model \\\n  --build-arg PRUNA_COMPILED_MODEL=\"runwayml/stable-diffusion-v1-5\" \\\n  --build-arg HF_TOKEN=\"\u003cYOUR_HF_TOKEN\u003e\" .\n```\n\nRecommended (secure) BuildKit example using a secret 
file:\n```bash\n# create a file with your HF token (CI secrets preferred)\necho -n \"\u003cYOUR_HF_TOKEN\u003e\" \u003e hf_token.txt\n\n# Build with BuildKit and mount the token as a secret at /run/secrets/hf_token\nDOCKER_BUILDKIT=1 docker build --progress=plain -t docker-pruna:with-model \\\n  --secret id=hf_token,src=hf_token.txt \\\n  --build-arg PRUNA_COMPILED_MODEL=\"runwayml/stable-diffusion-v1-5\" .\n```\n\nIf you use the BuildKit secret approach the Dockerfile mounts the secret at\n`/run/secrets/hf_token` for the single build-step and the token is not stored\nin any image layer. This is the recommended way to provide private tokens at\nbuild-time.\n\nRun container:\n```bash\ndocker run --rm -e MODEL_DIFF=runwayml/stable-diffusion-v1-5 docker-pruna\n```\n\n## Key features\n\n- Download models from Hugging Face into `./models/`.\n- Compile models with Pruna and store optimized artifacts in `./compiled_models/`.\n- Lightweight Flask API to trigger download, compile, generate and delete operations.\n- Smart `PrunaModelConfigurator` that provides device-aware, safe Pruna configurations and fallbacks.\n- Compilation modes: `fast`, `moderate`, `normal` (speed vs quality trade-offs).\n- Helpers for CUDA/MPS/CPU diagnostics and memory-aware compilation.\n\n## Environment Variables\n\n- `MODEL_DIFF`: model ID on Hugging Face (default: `CompVis/stable-diffusion-v1-4`)\n- `DOWNLOAD_DIR`: Directory to download the models (default: `./models`)\n- `PRUNA_COMPILED_DIR`: Directory to store compiled models with Pruna (default: `./compiled_models`)\n\n### How to use by CLI\n```bash\npython3 main.py --help\n\noptional arguments:\n  --model-id MODEL_ID    Hugging Face model ID to download\n  --download-dir DIR     Directory to download models\n  --compiled-dir DIR     Directory to save compiled Pruna models\n  --skip-download        Skip download step (use existing model)\n  --skip-compile         Skip compilation step (only download)\n  --torch-dtype TYPE     Torch dtype for 
model loading (float16/float32)\n```\n\n\n## Quick start\n\nClone and install dependencies:\n\n```bash\ngit clone \u003cyour-repo\u003e\ncd docker-pruna\npip install -r requirements.txt\n```\n\nDownload and compile a model (moderate mode):\n\n```bash\npython3 download_model_and_compile.py \\\n  --model-id runwayml/stable-diffusion-v1-5 \\\n  --compilation-mode moderate\n```\n\nRun the Flask API locally:\n\n```bash\npython3 server.py --host 127.0.0.1 --port 8000 --debug \u0026\n```\n\n## Configuration\n\nEnvironment variables (defaults shown):\n\n- `MODEL_DIFF` — default model id (default: `CompVis/stable-diffusion-v1-4`)\n- `DOWNLOAD_DIR` — where models are downloaded (default: `./models`)\n- `PRUNA_COMPILED_DIR` — where compiled Pruna models are saved (default: `./compiled_models`)\n\nCLI arguments (see `download_model_and_compile.py --help`):\n\n```bash\npython3 download_model_and_compile.py --help\n\n# common options: --model-id, --download-dir, --compiled-dir, --skip-download, --skip-compile,\n# --torch-dtype, --compilation-mode, --device, --force-cpu\n```\n\n## API endpoints (JSON)\n\n**POST /download**\n- Enqueue a Hugging Face model download. 
The endpoint is asynchronous and returns a `task_id` and `status_url` you can poll.\n\nSee the \"Asynchronous downloads\" section above for examples.\n\n\n**POST /compile**\n- Compile an already downloaded model with Pruna and save it to the compiled models directory.\n\nExample:\n\n```bash\ncurl -X POST http://127.0.0.1:8000/compile \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"model_id\": \"runwayml/stable-diffusion-v1-5\", \"compilation_mode\" : \"fast\"}'\n```\n\n**POST /generate**\n- Generate images from a prompt using a compiled model.\n\nExample:\n\n```bash\ncurl -X POST http://127.0.0.1:8000/generate \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"model_id\": \"runwayml/stable-diffusion-v1-5\", \"prompt\" : \"A beautiful sunset over the ocean\", \"num_inference_steps\": 20, \"guidance_scale\": 7.5}'\n```\n\nThe response contains base64-encoded images and optional saved file paths. When `debug: true`, it also returns the URL to download the image.\n\n**POST /delete-model**\n- Delete downloaded and/or compiled folders for a given model.\n\nExample:\n```bash\ncurl -X POST http://127.0.0.1:8000/delete-model \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"model_id\": \"runwayml/stable-diffusion-v1-5\", \"type\" : \"all\"}'\n```\n\n**GET /ping** — basic liveness check\n\n**GET /health** — server configuration, system info, warnings and errors\n\n## Run server and call compile endpoint (example):\n\n```bash\n# start server\npython3 server.py --host 127.0.0.1 --port 8000 --debug \u0026\n\n# request compilation\ncurl -X POST http://127.0.0.1:8000/compile \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"model_id\": \"runwayml/stable-diffusion-v1-5\", \"compilation_mode\": \"fast\"}'\n\n# stop server\npkill -f server.py\n```\n\n\n# Single Files Explanation\n\n## 1. Use memory-aware compilation\n```bash\npython3 compile_with_memory_mgmt.py --model-id MODEL_ID --mode fast\n```\n\n## 2. 
Restart with clean memory\n```bash\n./restart_clean_compile.sh MODEL_ID fast\n```\n\n## 3. Set recommended env vars\n```bash\nexport PYTORCH_CUDA_ALLOC_CONF=\"expandable_segments:True,max_split_size_mb:512\"\nexport CUDA_VISIBLE_DEVICES=0\npython3 download_model_and_compile.py --device cuda --model-id MODEL_ID\n```\n\n# Docker usage\n\n### Build examples\n```bash\ndocker build -t docker-pruna .\ndocker build --build-arg COMPILATION_MODE=fast -t docker-pruna .\ndocker build \\\n  --build-arg MODEL_DIFF=\"runwayml/stable-diffusion-v1-5\" \\\n  --build-arg COMPILATION_MODE=moderate \\\n  -t docker-pruna .\n```\n\n### Test and validation\n```bash\npython3 tests/test_pruna_infer.py\n./test_main.sh\n```\n\n\n## Quick CLI examples\n\n1) Download a model (CLI):\n\n```bash\npython3 utilities/download_model_and_compile.py --model-id runwayml/stable-diffusion-v1-5\n```\n\n2) Download + compile (fast mode):\n\n```bash\npython3 download_model_and_compile.py \\\n  --model-id runwayml/stable-diffusion-v1-5 \\\n  --compilation-mode fast\n```\n\n3) Run the Flask API locally and test compile endpoint (example):\n\n```bash\n# start server in background\npython3 server.py --host 127.0.0.1 --port 8000 --debug \u0026\n\n# request compilation (replace host/port if needed)\ncurl -X POST http://127.0.0.1:8000/compile \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"model_id\": \"runwayml/stable-diffusion-v1-5\", \"compilation_mode\": \"fast\"}'\n\n# stop server\npkill -f server.py\n```\n\n4) Generate images via API:\n\n```bash\ncurl -X POST http://127.0.0.1:8000/generate \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"model_id\": \"runwayml/stable-diffusion-v1-5\", \"prompt\": \"A scenic landscape at sunset\"}'\n```\n\n\n# Practical tips\n- Use `--skip-download` or `--skip-compile` when you want only one step.\n- Prefer `fast` for quick iterations and `moderate` or `normal` for production-quality results.\n- On Apple Silicon, prefer `--device mps` and `fast` to avoid Pruna 
incompatibilities.\n- If you see `CUDA out of memory` during compilation, use `restart_clean_compile.sh` or `compile_with_memory_mgmt.py`.\n\n\n# Troubleshooting \u0026 common issues\n| Issue | Affected models | Automatic fix |\n|------|------------------|---------------|\n| \"Model is not compatible with fora\" | SD 1.5, SD 1.4 | Switch to DeepCache instead of FORA |\n| \"deepcache is not compatible with device mps\" | All on MPS | Disable DeepCache on MPS |\n| Missing optional deps on MPS | HQQ \u0026 others | Disable HQQ on MPS |\n| Missing packages | Various optimizations | Fallback to safe minimal config |\n\n\n## Problem: \"CUDA out of memory\" during compilation\n```\nCause: GPU memory is already occupied by other processes.\n```\n**Automatic fixes:**\n\n```bash\n./restart_clean_compile.sh runwayml/stable-diffusion-v1-5 fast\n\npython3 compile_with_memory_mgmt.py --model-id MODEL_ID --mode fast\n\npython3 download_model_and_compile.py \\\n  --model-id MODEL_ID \\\n  --compilation-mode fast \\\n  --device cuda\n```\n\n#### Diagnostics\n```bash\npython3 test_pruna_cuda.py\n\n# Example output:\n# ✅ CUDA available: True\n# ✅ GPU: NVIDIA GeForce RTX 4090\n# ✅ Total GPU memory: 24.0 GB\n# ❌ Pruna CUDA: configuration error\n# 💡 Recommendation: reinstall Pruna\n```\n\n\n# System requirements\n\nMinimum:\n- Python 3.8+\n- 4 GB RAM\n- 10 GB disk\n\nRecommended:\n- CUDA 12.1+ and compatible NVIDIA driver for GPU workflows\n- 16 GB+ RAM for larger models\n- Apple Silicon (M1/M2/M3) supported with device-specific fallbacks\n\n## Credits\n\n- Project maintainer: repository owner\n- Libraries and tools: Pruna (smash), Hugging Face `diffusers`, `huggingface_hub`, PyTorch, Flask\n\n## 🤝 Contributing\n\nContributions are welcome! Please fork, branch, and submit a pull request:\n\n1. Fork the repo\n2. Create a feature branch\n3. Commit your changes\n4. Open a Pull Request\n\n## 📄 License\n\nThis project is Apache 2.0 licensed. 
See [LICENSE](LICENSE) for details.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falexgenovese%2Fdocker-pruna","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falexgenovese%2Fdocker-pruna","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falexgenovese%2Fdocker-pruna/lists"}