https://github.com/hec-ovi/openclaw-strix-embed
OpenAI-compatible /v1/embeddings server (BAAI/bge-m3, 1024 dims, 100+ langs) on AMD Strix Halo via ROCm. Drop-in replacement for OpenAI text-embedding-3, Docker, no API keys, ~47ms single-text latency.
- Host: GitHub
- URL: https://github.com/hec-ovi/openclaw-strix-embed
- Owner: hec-ovi
- License: other
- Created: 2026-02-18T17:50:34.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2026-04-26T16:07:14.000Z (2 months ago)
- Last Synced: 2026-04-26T18:10:52.694Z (2 months ago)
- Topics: amd, bge-m3, docker, embedding-model, embeddings, fastapi, gfx1151, openai-api, openai-compatible, rocm, self-hosted, sentence-transformers, strix-halo, vector-search
- Language: Python
- Size: 10.7 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# openclaw-strix-embed
Local, GPU-accelerated, OpenAI-compatible /v1/embeddings API on AMD Strix Halo. BAAI/bge-m3 by default, no API keys, no fees, no data leaving your network.
---
## What this is
Local, GPU-accelerated, OpenAI-compatible embeddings API. A drop-in replacement for OpenAI's `/v1/embeddings` endpoint: no API keys, no usage fees, and no data leaving your network.
Built for AMD Strix Halo (RDNA 3.5 / gfx1151) with ROCm, but falls back to CPU if no GPU is available.
## Why
Every vector database and RAG pipeline needs an embeddings API. The standard hosted options, such as OpenAI `text-embedding-3-small` or Google `text-embedding-004`, charge per token and send your data to external servers. This project serves the same OpenAI embeddings API contract locally, for free, on your own hardware.
## Project Structure
```
.
├── .gitignore # Ignores .env and data/
├── .env.template # Template, copy to .env
├── README.md # This file
├── llm.txt # Complete technical reference
├── Dockerfile # Ubuntu Rolling + ROCm PyTorch + FastAPI
├── docker-compose.yml # Service definition with GPU passthrough
├── entrypoint.sh # GPU check + uvicorn start
├── server.py # OpenAI-compatible FastAPI server
└── data/ # Persistent model cache (git-ignored)
    └── models/ # HuggingFace model files
```
## Prerequisites
- Docker with compose plugin
- AMD Strix Halo (or any RDNA 3.5 GPU) for GPU mode
- ~7 GB disk for the model + Docker image
## Quick Start
```bash
cp .env.template .env
# Edit .env if you want to change the model or port
docker compose up -d --build
```
First start downloads the model (~4.3 GB). Subsequent starts load from cache in seconds.
**Verify:**
```bash
# Check GPU detection
docker logs openclaw-embeddings
# Test the API
curl http://localhost:8484/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model":"BAAI/bge-m3","input":"Hello world"}'
```
## API
### POST /v1/embeddings
OpenAI-compatible. Works with any client that speaks the OpenAI embeddings format.
**Request:**
```json
{
  "model": "BAAI/bge-m3",
  "input": "text to embed"
}
```
`input` accepts a single string or an array of strings for batch embedding.
**Response:**
```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0123, -0.044, ...]
    }
  ],
  "model": "BAAI/bge-m3",
  "usage": {"prompt_tokens": 3, "total_tokens": 3}
}
```
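A minimal stdlib-only client sketch for batch embedding, assuming the default host port `8484` and a response shaped like the example above. `build_request`, `parse_embeddings` and `embed_batch` are hypothetical helper names, not part of this repository. Sorting by `index` is defensive: in the OpenAI format, `index` is what ties each vector back to its input position.

```python
import json
import urllib.request

# Assumption: default EMBEDDING_PORT from .env; change it if you edited .env.
BASE_URL = "http://localhost:8484/v1"
MODEL = "BAAI/bge-m3"


def build_request(texts, model=MODEL, base_url=BASE_URL):
    """Build a POST /v1/embeddings request for one string or a list of strings."""
    if isinstance(texts, str):
        texts = [texts]  # avoid list("abc") splitting a string into characters
    body = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embeddings(payload):
    """Return vectors in input order, using each item's `index` field."""
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed_batch(texts, timeout=30.0):
    """Send one batched request and return one vector per input string."""
    with urllib.request.urlopen(build_request(texts), timeout=timeout) as resp:
        return parse_embeddings(json.load(resp))
```

With the service running, `embed_batch(["Hello world", "Hola mundo"])` should return two vectors, each 1024 floats long for the default bge-m3 model.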
### GET /v1/models
Lists available models and their dimensions.
### GET /health
Returns `{"status": "ok", "model_loaded": true}` when ready.
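Since the first start downloads the model before it can load, scripts that bring the stack up may want to wait for readiness before sending requests. A minimal polling sketch, assuming the default port `8484` and the `/health` body shown above; `fetch_health` and `wait_until_ready` are hypothetical helpers, and `fetch`, `sleep` and `clock` are injectable so the loop can be tested without a running server.

```python
import json
import time
import urllib.request

# Assumption: default host port 8484 and the /health body documented above.
HEALTH_URL = "http://localhost:8484/health"


def fetch_health(url=HEALTH_URL, timeout=2.0):
    """GET /health and decode the JSON body; raises if the server is not up."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def wait_until_ready(fetch=fetch_health, timeout_s=120.0, interval_s=1.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll until the model reports loaded; return False if timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        try:
            status = fetch()
            if status.get("status") == "ok" and status.get("model_loaded") is True:
                return True
        except (OSError, ValueError):
            # OSError covers refused connections and HTTP errors (URLError);
            # ValueError covers a body that is not valid JSON. Keep polling.
            pass
        if clock() >= deadline:
            return False
        sleep(interval_s)
```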
## Configuration
All configurable via `.env`:
| Variable | Default | Description |
|----------|---------|-------------|
| `EMBEDDING_MODEL` | `BAAI/bge-m3` | HuggingFace model ID |
| `EMBEDDING_PORT` | `8484` | Host port for the API |
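If a client script lives next to the Compose project, it can derive its base URL from the same `.env`, so the port is set in one place. A minimal sketch, assuming plain `KEY=VALUE` lines (no quoting or variable expansion, unlike full dotenv parsers); `read_env` and `base_url_from_env` are hypothetical helpers.

```python
DEFAULT_PORT = "8484"  # the EMBEDDING_PORT default from the table above


def read_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def base_url_from_env(text, host="localhost"):
    """Build the OpenAI-style base URL from EMBEDDING_PORT, or the default."""
    port = read_env(text).get("EMBEDDING_PORT") or DEFAULT_PORT
    return f"http://{host}:{port}/v1"
```

For example, `base_url_from_env(pathlib.Path(".env").read_text())` gives `http://localhost:8484/v1` when the port is left at its default.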
## Model
Default: [BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3)
| Property | Value |
|----------|-------|
| Dimensions | 1024 |
| Languages | 100+ (excellent EN + ES) |
| Max tokens | 8192 |
| Size | ~2.2 GB (weights) |
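You can swap it for any [sentence-transformers](https://www.sbert.net/)-compatible model by changing `EMBEDDING_MODEL` in `.env`. A different model may use a different embedding dimension, and its vectors are not comparable with bge-m3's, so re-embed any stored vectors after switching.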
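Vector search over the returned embeddings typically ranks documents by cosine similarity. A minimal pure-Python sketch; it does not assume the server returns unit-normalized vectors (the README does not say), so it divides by both norms explicitly. The length check catches vectors produced by models with different dimensions. `cosine` and `rank` are illustrative names.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec, doc_vecs):
    """Return (doc_index, score) pairs, best match first."""
    scores = [(i, cosine(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

In practice `query_vec` and `doc_vecs` would be embeddings returned by this service for a query and a set of documents.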
## Using with Multipass VMs
If OpenClaw runs inside a Multipass VM and this embeddings service runs on the host, `localhost` won't work: inside the VM it refers to the VM itself, not the host.
**Find the host IP on the Multipass bridge:**
```bash
# On the host
ip addr show mpqemubr0 | grep 'inet '
# Example output: inet 10.5.162.1/24 ...
```
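The same lookup can be scripted on the host, for example to generate the OpenClaw base URL. A minimal sketch that runs `ip addr show` and takes the first IPv4 `inet` line; it assumes the default bridge name `mpqemubr0` and port `8484`, and `bridge_ipv4` and `host_base_url` are hypothetical helpers.

```python
import ipaddress
import re
import subprocess

# Matches IPv4 lines such as "inet 10.5.162.1/24 brd ..."; "inet6" lines are skipped.
INET_RE = re.compile(r"^\s*inet\s+(\d+\.\d+\.\d+\.\d+)/\d+", re.MULTILINE)


def bridge_ipv4(ip_addr_output):
    """Return the first IPv4 address in `ip addr show` output, or None."""
    match = INET_RE.search(ip_addr_output)
    if match is None:
        return None
    # Validates the dotted quad (raises ValueError for e.g. 999.1.1.1).
    return str(ipaddress.IPv4Address(match.group(1)))


def host_base_url(interface="mpqemubr0", port=8484):
    """Run `ip addr show <interface>` on the host and build the base URL."""
    out = subprocess.run(["ip", "addr", "show", interface],
                         capture_output=True, text=True, check=True).stdout
    ip = bridge_ipv4(out)
    if ip is None:
        raise RuntimeError(f"no IPv4 address found on {interface}")
    return f"http://{ip}:{port}/v1"
```

On a host whose bridge has the example address above, `host_base_url()` would return `http://10.5.162.1:8484/v1`.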
**Test from inside the VM:**
```bash
multipass exec -- curl http://10.5.162.1:8484/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model":"BAAI/bge-m3","input":"connectivity test"}'
```
**Configure OpenClaw with:**
| Setting | Value |
|---------|-------|
| Base URL | `http://:8484/v1` |
| Model | `BAAI/bge-m3` |
| Auth | None |
## Performance
Tested on AMD Ryzen AI Max (Strix Halo) with Radeon 8060S iGPU:
| Metric | Value |
|--------|-------|
| Latency (single text) | ~47ms |
| Latency (first request, cold) | ~270ms |
| GPU VRAM used | ~1.5 GB |
| Model load time (from cache) | ~10s |
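To compare against this table on your own hardware, a rough timing harness can help. The README does not say how the numbers above were measured, so this is only one possible approach: it times a warm-up call separately, since the first request is noticeably slower (~270ms cold vs ~47ms warm above), then reports the median and a nearest-rank p95. `measure_latency` is a hypothetical helper that accepts any zero-argument callable, such as a lambda wrapping a single-text request.

```python
import math
import statistics
import time


def measure_latency(call, runs=20, warmup=1, clock=time.perf_counter):
    """Time `call()` repeatedly; return (cold_ms, median_ms, p95_ms)."""
    if runs < 1:
        raise ValueError("runs must be >= 1")
    cold_ms = None
    for i in range(warmup):
        start = clock()
        call()
        elapsed = (clock() - start) * 1000.0
        if i == 0:
            cold_ms = elapsed  # the first request pays one-time warm-up costs
    samples = []
    for _ in range(runs):
        start = clock()
        call()
        samples.append((clock() - start) * 1000.0)
    samples.sort()
    p95 = samples[math.ceil(0.95 * len(samples)) - 1]  # nearest-rank percentile
    return cold_ms, statistics.median(samples), p95
```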
## GPU Details
This container uses the same ROCm setup from [rocm-strix-docker](https://github.com/hec-ovi/rocm-strix-docker):
- **`HSA_OVERRIDE_GFX_VERSION=11.5.1`**: required for ROCm to recognize Strix Halo
- **`privileged: true`**: grants `/dev/kfd` and `/dev/dri` access for GPU compute
- **`ipc: host`**: shared memory for PyTorch
- **PyTorch wheels** from `https://rocm.prereleases.amd.com/whl/gfx1151/` (ROCm 7.11 prerelease)
- **uv** manages Python 3.12 and all packages (no pip)
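To confirm from inside the container that PyTorch actually sees the iGPU, a short check like the following can help. It uses only standard PyTorch attributes: ROCm builds expose AMD GPUs through the `torch.cuda` namespace (which is why the startup log reports `Device: cuda`), and `torch.version.hip` is set only on ROCm builds. It makes no assumptions about what `entrypoint.sh` does; `gpu_report` is a hypothetical name.

```python
import os


def gpu_report():
    """Summarize what PyTorch can see; works with or without a GPU or torch."""
    report = {
        "hsa_override": os.environ.get("HSA_OVERRIDE_GFX_VERSION"),
        "device": "cpu",
        "hip_version": None,
        "gpu_name": None,
        "vram_gb": None,
    }
    try:
        import torch
    except ImportError:
        report["device"] = "torch-not-installed"
        return report
    # ROCm builds report the HIP runtime version here; CUDA/CPU builds give None.
    report["hip_version"] = getattr(torch.version, "hip", None)
    if torch.cuda.is_available():
        # On ROCm, AMD GPUs are exposed through the torch.cuda API.
        props = torch.cuda.get_device_properties(0)
        report["device"] = "cuda"
        report["gpu_name"] = torch.cuda.get_device_name(0)
        report["vram_gb"] = round(props.total_memory / 1024**3, 1)
    return report


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```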
---
## License
[MIT](LICENSE) for original code in this repository (FastAPI server, Dockerfile, Compose configs, scripts). Third-party model weights (BAAI/bge-m3) and runtimes (sentence-transformers, transformers, PyTorch ROCm) retain their own upstream licenses; this repository does not redistribute them.
## Verified Output
```
[embeddings] ========================================
[embeddings] Model: BAAI/bge-m3
[embeddings] ========================================
[embeddings] Checking GPU...
[embeddings] GPU: Radeon 8060S Graphics
[embeddings] VRAM: 124.6 GB
[embeddings] ROCm/HIP: 7.2.53150-7b886380f9
[embeddings] Device: cuda
[embeddings] Starting server on port 80...
[embeddings] Loading BAAI/bge-m3 on cuda
[embeddings] GPU: Radeon 8060S Graphics
[embeddings] Model loaded. Dimension: 1024
```