{"id":28927085,"url":"https://github.com/slinusc/deepspeed-mii-container","last_synced_at":"2026-04-28T00:31:43.189Z","repository":{"id":296529780,"uuid":"993688009","full_name":"slinusc/deepspeed-mii-container","owner":"slinusc","description":"Launch your own high-performance DeepSpeed-MII server for seamless local LLM deployment. This repository provides a Dockerized solution to serve Hugging Face models (e.g., Mistral-7B) with an OpenAI-compatible API, enabling GPU-accelerated, low-latency inference out of the box.","archived":false,"fork":false,"pushed_at":"2025-06-09T13:54:51.000Z","size":19,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-06-22T12:37:36.523Z","etag":null,"topics":["container","deepspeed","docker","engine","inference","llm","mii"],"latest_commit_sha":null,"homepage":"https://hub.docker.com/r/slinusc/deepspeed-mii","language":"Dockerfile","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/slinusc.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-05-31T09:55:42.000Z","updated_at":"2025-06-05T12:12:10.000Z","dependencies_parsed_at":"2025-05-31T22:49:00.548Z","dependency_job_id":"ab76af43-2351-4afd-8e53-6c1302cccedf","html_url":"https://github.com/slinusc/deepspeed-mii-container","commit_stats":null,"previous_names":["slinusc/deepspeed-mii-container"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/slinusc/deepspeed-mii-container","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
repositories/slinusc%2Fdeepspeed-mii-container","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/slinusc%2Fdeepspeed-mii-container/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/slinusc%2Fdeepspeed-mii-container/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/slinusc%2Fdeepspeed-mii-container/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/slinusc","download_url":"https://codeload.github.com/slinusc/deepspeed-mii-container/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/slinusc%2Fdeepspeed-mii-container/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32361477,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-27T20:07:02.737Z","status":"ssl_error","status_checked_at":"2026-04-27T20:07:00.910Z","response_time":128,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["container","deepspeed","docker","engine","inference","llm","mii"],"created_at":"2025-06-22T12:30:54.837Z","updated_at":"2026-04-28T00:31:43.184Z","avatar_url":"https://github.com/slinusc.png","language":"Dockerfile","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DeepSpeed-MII OpenAI-Compatible Docker Container\n\nThis repository provides 
a **Dockerfile** and instructions to **pull a prebuilt image** from Docker Hub, so you can run a DeepSpeed-MII server that serves your locally deployed LLM. The container runs [DeepSpeed-MII](https://github.com/microsoft/DeepSpeed-MII) with an OpenAI-API-compatible endpoint. Launching for the first time can take around 5 minutes. Once running, you can load any Hugging Face model (e.g., `mistralai/Mistral-7B-Instruct-v0.3`) or simply use the preloaded image on Docker Hub to interact via `/v1/chat/completions` just like you would with the official OpenAI API.\n\n---\n\n## Table of Contents\n\n1. [Prerequisites](#prerequisites)\n2. [Repository Structure](#repository-structure)\n3. [Option A: Pull the Prebuilt Image](#option-a-pull-the-prebuilt-image)\n4. [Option B: Build from Source](#option-b-build-from-source)\n5. [Running the Container](#running-the-container)\n6. [Testing with Linux CLI (`curl`)](#testing-with-linux-cli-curl)\n7. [Testing with Python + OpenAI SDK](#testing-with-python--openai-sdk)\n8. [Environment Variables](#environment-variables)\n9. [Customizing \u0026 Troubleshooting](#customizing--troubleshooting)\n10. [License](#license)\n\n---\n\n## Prerequisites\n\n* **Docker** (20.10+).\n* **NVIDIA Container Toolkit** (to allow `--gpus all`).\n* A valid **Hugging Face Hub token** if you plan to pull private or gated models:\n\n  ```bash\n  export HF_TOKEN=\u003cyour_hf_token\u003e\n  ```\n* (Optional) **OpenAI Python SDK** for testing in Python:\n\n  ```bash\n  pip install openai\n  ```\n\n---\n\n## Repository Structure\n\n```\ndeepspeed-mii-container/\n├── Dockerfile\n├── readme.md\n└── .gitignore\n```\n\n* **`Dockerfile`**: Builds a CUDA-enabled image with DeepSpeed-MII, Pydantic v2, `pydantic-settings`, `sentencepiece`, FastAPI, Uvicorn, ShortUUID, and FastChat. 
Exposes port 23333 by default.\n* **`readme.md`**: This file—contains instructions for pulling or building, running, and testing.\n* **`.gitignore`**: Ignores local artifacts like `__pycache__` and logs.\n\n---\n\n## Option A: Pull the Prebuilt Image\n\nIf you want to skip building locally, simply pull the prebuilt image from Docker Hub:\n\n```bash\n# Pull the image (tagged \"latest\")\ndocker pull slinusc/deepspeed-mii:latest\n```\n\nNow you can jump to [Running the Container](#running-the-container) below, using `slinusc/deepspeed-mii:latest` as the image name.\n\n---\n\n## Running the Container\n\nWhether you pulled the prebuilt image or built locally, run the container with GPU support, mounting your Hugging Face cache, and exposing port 23333:\n\n```bash\n# Using the prebuilt Docker Hub image:\ndocker run --runtime=nvidia --gpus all \\\n  -v $HOME/.cache/huggingface:/root/.cache/huggingface \\\n  -e HUGGING_FACE_HUB_TOKEN=$HF_TOKEN \\\n  -p 127.0.0.1:23333:23333 \\\n  --ipc=host \\\n  slinusc/deepspeed-mii:latest \\\n  --model mistralai/Mistral-7B-Instruct-v0.3 \\\n  --port 23333\n```\n\n* `--runtime=nvidia --gpus all`: Access all GPUs.\n* `-v $HOME/.cache/huggingface:/root/.cache/huggingface`: Mount HF cache so weights aren’t re-downloaded.\n* `-e HUGGING_FACE_HUB_TOKEN=$HF_TOKEN`: Pass HF token into container.\n* `-p 127.0.0.1:23333:23333`: Map container port 23333 → host 23333.\n* `--ipc=host`: Share IPC namespace to reduce overhead.\n* `--model mistralai/Mistral-7B-Instruct-v0.3`: Hugging Face path of the model to load.\n* `--port 23333`: Force Uvicorn to bind inside container on port 23333.\n\nYou should see logs like:\n\n```\nINFO:     Started server process [1]\nINFO:     Waiting for application startup.\nINFO:     Application startup complete.\nINFO:     Uvicorn running on http://0.0.0.0:23333 (Press CTRL+C to quit)\n```\n\nAt that point, the server is live at `http://127.0.0.1:23333/v1/...`.\n\n---\n\n## Testing with Linux CLI (`curl`)\n\nOnce the 
container is running, open a new terminal and run:\n\n### 1. Check available models\n\n```bash\ncurl http://127.0.0.1:23333/v1/models\n```\n\nExpected JSON:\n\n```json\n{\n  \"object\": \"list\",\n  \"data\": [\n    {\n      \"id\": \"mistralai/Mistral-7B-Instruct-v0.3\",\n      \"object\": \"model\",\n      \"created\": 1748684820,\n      \"owned_by\": \"deepspeed-mii\",\n      \"root\": \"mistralai/Mistral-7B-Instruct-v0.3\",\n      \"parent\": null,\n      \"permission\": [ … ]\n    }\n  ]\n}\n```\n\nThe key field is `\"id\"`, which you must use for subsequent requests.\n\n### 2. Send a chat completion request\n\n```bash\ncurl http://127.0.0.1:23333/v1/chat/completions \\\n  -X POST \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n        \"model\": \"mistralai/Mistral-7B-Instruct-v0.3\",\n        \"messages\": [\n          { \"role\": \"system\", \"content\": \"You are a helpful assistant.\" },\n          { \"role\": \"user\",   \"content\": \"Tell me a fun fact about penguins.\" }\n        ],\n        \"max_tokens\": 32,\n        \"temperature\": 0.7\n      }'\n```\n\n### 3. 
Send a text completion request (optional)\n\n```bash\ncurl http://127.0.0.1:23333/v1/completions \\\n  -X POST \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n        \"model\": \"mistralai/Mistral-7B-Instruct-v0.3\",\n        \"prompt\": \"Once upon a time in a distant galaxy,\",\n        \"max_tokens\": 50,\n        \"temperature\": 0.7\n      }'\n```\n\nYou’ll receive a JSON response with a `choices` array containing the generated completion.\n\n---\n\n## Testing with Python + OpenAI SDK\n\nInstall the OpenAI SDK locally (if you haven’t already):\n\n```bash\npip install openai\n```\n\nSave the following as `test_mii.py`:\n\n```python\nfrom openai import OpenAI\n\n# Point to local MII endpoint:\nclient = OpenAI(\n    api_key=\"EMPTY\",  # placeholder; no real key is needed if the container does not require auth\n    base_url=\"http://127.0.0.1:23333/v1\"\n)\n\nresponse = client.chat.completions.create(\n    model=\"mistralai/Mistral-7B-Instruct-v0.3\",\n    messages=[\n        {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\n        {\"role\": \"user\",   \"content\": \"How many wings does a penguin have?\"}\n    ],\n    max_tokens=16,\n    temperature=0.7\n)\nprint(response.choices[0].message.content)\n```\n\nRun:\n\n```bash\npython3 test_mii.py\n```\n\nYou should see a short answer about penguins printed. If the request fails, check that:\n\n1. The container is still running.\n2. The correct `model` ID is used.\n3. Port 23333 is mapped.\n\n---\n\n## Environment Variables\n\n* **`HF_TOKEN`** (or `HUGGING_FACE_HUB_TOKEN` inside the container)\n\n  * Your Hugging Face Hub token for private or gated models.\n\n  ```bash\n  export HF_TOKEN=\u003cyour_hf_token\u003e\n  ```\n\n* **`OPENAI_API_KEY`** (optional)\n\n  * If you configured the container to require an API key, set this on your host and pass it with `-e OPENAI_API_KEY` in `docker run`. 
Otherwise, the container defaults to no-auth mode.\n\n---\n\n## Customizing \u0026 Troubleshooting\n\n### Change the CUDA base image\n\nIn `Dockerfile`, you can swap:\n\n```dockerfile\nFROM nvidia/cuda:12.2.2-devel-ubuntu20.04\n```\n\nfor another tag such as `nvidia/cuda:11.8.0-devel-ubuntu20.04` to match your GPU driver.\n\n### Use a different Hugging Face model\n\nChange the `--model` argument when running:\n\n```bash\n--model your-org/your-model-name\n```\n\nIf you want to load a quantized checkpoint, append `--quantize gptq` or similar.\n\n## License\n\nThis repository is licensed under the **MIT License**. See [LICENSE](LICENSE) for details. If you omit a LICENSE file, it defaults to “All rights reserved.”\n\n---\n\n**Congratulations!** You have a fully functional Docker container that runs DeepSpeed-MII in OpenAI-API compatibility mode. Anyone can now either pull the prebuilt image (`slinusc/deepspeed-mii:latest`) or build from source and run a local inference server for Mistral or any other Hugging Face model.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fslinusc%2Fdeepspeed-mii-container","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fslinusc%2Fdeepspeed-mii-container","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fslinusc%2Fdeepspeed-mii-container/lists"}