{"id":51011762,"url":"https://github.com/hec-ovi/openclaw-strix-embed","last_synced_at":"2026-06-21T03:31:22.507Z","repository":{"id":353999608,"uuid":"1161075051","full_name":"hec-ovi/openclaw-strix-embed","owner":"hec-ovi","description":"OpenAI-compatible /v1/embeddings server (BAAI/bge-m3, 1024 dims, 100+ langs) on AMD Strix Halo via ROCm. Drop-in replacement for OpenAI text-embedding-3, Docker, no API keys, ~47ms single-text latency.","archived":false,"fork":false,"pushed_at":"2026-04-26T16:07:14.000Z","size":11,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-26T18:10:52.694Z","etag":null,"topics":["amd","bge-m3","docker","embedding-model","embeddings","fastapi","gfx1151","openai-api","openai-compatible","rocm","self-hosted","sentence-transformers","strix-halo","vector-search"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hec-ovi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-18T17:50:34.000Z","updated_at":"2026-04-26T16:07:18.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/hec-ovi/openclaw-strix-embed","commit_stats":null,"previous_names":["hec-ovi/openclaw-strix-embed"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/hec-ovi/openclaw-strix-embed","repository_url":"https://repos.ecosyste.ms/api/v1/h
osts/GitHub/repositories/hec-ovi%2Fopenclaw-strix-embed","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hec-ovi%2Fopenclaw-strix-embed/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hec-ovi%2Fopenclaw-strix-embed/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hec-ovi%2Fopenclaw-strix-embed/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hec-ovi","download_url":"https://codeload.github.com/hec-ovi/openclaw-strix-embed/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hec-ovi%2Fopenclaw-strix-embed/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34593128,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-21T02:00:05.568Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["amd","bge-m3","docker","embedding-model","embeddings","fastapi","gfx1151","openai-api","openai-compatible","rocm","self-hosted","sentence-transformers","strix-halo","vector-search"],"created_at":"2026-06-21T03:31:21.808Z","updated_at":"2026-06-21T03:31:22.493Z","avatar_url":"https://github.com/hec-ovi.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch1 
align=\"center\"\u003eopenclaw-strix-embed\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cstrong\u003eLocal, GPU-accelerated, OpenAI-compatible \u003ccode\u003e/v1/embeddings\u003c/code\u003e API on AMD Strix Halo. BAAI/bge-m3 by default, no API keys, no fees, no data leaving your network.\u003c/strong\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/Status-Working-brightgreen\" alt=\"Status\" /\u003e\n  \u003cimg src=\"https://img.shields.io/badge/AMD-Strix_Halo-ED1C24?logo=amd\u0026logoColor=white\" alt=\"AMD Strix Halo\" /\u003e\n  \u003cimg src=\"https://img.shields.io/badge/ROCm-7.x-EF5B25?logo=amd\u0026logoColor=white\" alt=\"ROCm\" /\u003e\n  \u003cimg src=\"https://img.shields.io/badge/Model-BAAI/bge--m3-7B3FA0\" alt=\"Model\" /\u003e\n  \u003cimg src=\"https://img.shields.io/badge/FastAPI-009688?logo=fastapi\u0026logoColor=white\" alt=\"FastAPI\" /\u003e\n  \u003cimg src=\"https://img.shields.io/badge/License-MIT-blue\" alt=\"License\" /\u003e\n\u003c/p\u003e\n\n---\n\n## What this is\n\nLocal, GPU-accelerated, OpenAI-compatible embeddings API. Drop-in replacement for OpenAI's `/v1/embeddings` endpoint, no API keys, no usage fees, no data leaving your network.\n\nBuilt for AMD Strix Halo (RDNA 3.5 / gfx1151) with ROCm, but falls back to CPU if no GPU is available.\n\n## Why\n\nEvery vector database and RAG pipeline needs an embeddings API. The standard options, OpenAI `text-embedding-3-small`, Google `text-embedding-004`, charge per token and send your data to external servers. 
This runs the same API contract locally, for free, on your own hardware.\n\n## Project Structure\n\n```\n.\n├── .gitignore              # Ignores .env and data/\n├── .env.template           # Template, copy to .env\n├── README.md               # This file\n├── llm.txt                 # Complete technical reference\n├── Dockerfile              # Ubuntu Rolling + ROCm PyTorch + FastAPI\n├── docker-compose.yml      # Service definition with GPU passthrough\n├── entrypoint.sh           # GPU check + uvicorn start\n├── server.py               # OpenAI-compatible FastAPI server\n└── data/                   # Persistent model cache (git-ignored)\n    └── models/             # HuggingFace model files\n```\n\n## Prerequisites\n\n- Docker with compose plugin\n- AMD Strix Halo (or any RDNA 3.5 GPU) for GPU mode\n- ~7 GB disk for the model + Docker image\n\n## Quick Start\n\n```bash\ncp .env.template .env\n# Edit .env if you want to change the model or port\ndocker compose up -d --build\n```\n\nFirst start downloads the model (~4.3 GB). Subsequent starts load from cache in seconds.\n\n**Verify:**\n```bash\n# Check GPU detection\ndocker logs openclaw-embeddings\n\n# Test the API\ncurl http://localhost:8484/v1/embeddings \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"model\":\"BAAI/bge-m3\",\"input\":\"Hello world\"}'\n```\n\n## API\n\n### POST /v1/embeddings\n\nOpenAI-compatible. 
Works with any client that speaks the OpenAI embeddings format.\n\n**Request:**\n```json\n{\n  \"model\": \"BAAI/bge-m3\",\n  \"input\": \"text to embed\"\n}\n```\n\n`input` accepts a single string or an array of strings for batch embedding.\n\n**Response:**\n```json\n{\n  \"object\": \"list\",\n  \"data\": [\n    {\n      \"object\": \"embedding\",\n      \"index\": 0,\n      \"embedding\": [0.0123, -0.044, ...]\n    }\n  ],\n  \"model\": \"BAAI/bge-m3\",\n  \"usage\": {\"prompt_tokens\": 3, \"total_tokens\": 3}\n}\n```\n\n### GET /v1/models\n\nLists available models and their dimensions.\n\n### GET /health\n\nReturns `{\"status\": \"ok\", \"model_loaded\": true}` when ready.\n\n## Configuration\n\nAll settings are configurable via `.env`:\n\n| Variable | Default | Description |\n|----------|---------|-------------|\n| `EMBEDDING_MODEL` | `BAAI/bge-m3` | HuggingFace model ID |\n| `EMBEDDING_PORT` | `8484` | Host port for the API |\n\n## Model\n\nDefault: [BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3)\n\n| Property | Value |\n|----------|-------|\n| Dimensions | 1024 |\n| Languages | 100+ (excellent EN + ES) |\n| Max tokens | 8192 |\n| Size | ~2.2 GB (weights) |\n\nYou can swap it for any [sentence-transformers](https://www.sbert.net/) compatible model by changing `EMBEDDING_MODEL` in `.env`.\n\n## Using with Multipass VMs\n\nIf OpenClaw runs inside a Multipass VM and this embeddings service runs on the host, `localhost` won't work from inside the VM because it resolves to the VM itself, not the host.\n\n**Find the host IP on the Multipass bridge:**\n```bash\n# On the host\nip addr show mpqemubr0 | grep 'inet '\n# Example output: inet 10.5.162.1/24 ...\n```\n\n**Test from inside the VM:**\n```bash\nmultipass exec \u003cvm-name\u003e -- curl http://10.5.162.1:8484/v1/embeddings \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"model\":\"BAAI/bge-m3\",\"input\":\"connectivity test\"}'\n```\n\n**Configure OpenClaw with:**\n\n| Setting | Value |\n|---------|-------|\n| Base URL | 
`http://\u003chost-bridge-ip\u003e:8484/v1` |\n| Model | `BAAI/bge-m3` |\n| Auth | None |\n\n## Performance\n\nTested on AMD Ryzen AI Max (Strix Halo) with Radeon 8060S iGPU:\n\n| Metric | Value |\n|--------|-------|\n| Latency (single text) | ~47ms |\n| Latency (first request, cold) | ~270ms |\n| GPU VRAM used | ~1.5 GB |\n| Model load time (from cache) | ~10s |\n\n## GPU Details\n\nThis container uses the same ROCm setup from [rocm-strix-docker](https://github.com/hec-ovi/rocm-strix-docker):\n\n- **`HSA_OVERRIDE_GFX_VERSION=11.5.1`**, required for ROCm to recognize Strix Halo\n- **`privileged: true`**, grants `/dev/kfd` and `/dev/dri` access for GPU compute\n- **`ipc: host`**, shared memory for PyTorch\n- **PyTorch wheels** from `https://rocm.prereleases.amd.com/whl/gfx1151/` (ROCm 7.11 prerelease)\n- **UV** manages Python 3.12 + all packages (no pip)\n\n---\n\n## License\n\n[MIT](LICENSE) for original code in this repository (FastAPI server, Dockerfile, Compose configs, scripts). Third-party model weights (BAAI/bge-m3) and runtimes (sentence-transformers, transformers, PyTorch ROCm) retain their own upstream licenses; this repository does not redistribute them.\n\n## Verified Output\n\n```\n[embeddings] ========================================\n[embeddings] Model: BAAI/bge-m3\n[embeddings] ========================================\n[embeddings] Checking GPU...\n[embeddings] GPU: Radeon 8060S Graphics\n[embeddings] VRAM: 124.6 GB\n[embeddings] ROCm/HIP: 7.2.53150-7b886380f9\n[embeddings] Device: cuda\n[embeddings] Starting server on port 80...\n[embeddings] Loading BAAI/bge-m3 on cuda\n[embeddings] GPU: Radeon 8060S Graphics\n[embeddings] Model loaded. 
Dimension: 1024\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhec-ovi%2Fopenclaw-strix-embed","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhec-ovi%2Fopenclaw-strix-embed","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhec-ovi%2Fopenclaw-strix-embed/lists"}