https://github.com/hec-ovi/openclaw-strix-embed

OpenAI-compatible /v1/embeddings server (BAAI/bge-m3, 1024 dims, 100+ languages) for AMD Strix Halo via ROCm. Drop-in replacement for the OpenAI text-embedding-3 API; runs in Docker, needs no API keys, ~47 ms single-text latency.



Host: GitHub
URL: https://github.com/hec-ovi/openclaw-strix-embed
Owner: hec-ovi
License: other
Created: 2026-02-18T17:50:34.000Z
Default Branch: main
Last Pushed: 2026-04-26T16:07:14.000Z
Last Synced: 2026-04-26T18:10:52.694Z
Topics: amd, bge-m3, docker, embedding-model, embeddings, fastapi, gfx1151, openai-api, openai-compatible, rocm, self-hosted, sentence-transformers, strix-halo, vector-search
Language: Python
Size: 10.7 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# openclaw-strix-embed

Local, GPU-accelerated, OpenAI-compatible `/v1/embeddings` API on AMD Strix Halo. BAAI/bge-m3 by default, no API keys, no fees, no data leaving your network.

---

## What this is

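Local, GPU-accelerated, OpenAI-compatible embeddings API. It is a drop-in replacement for OpenAI's `/v1/embeddings` endpoint: no API keys, no usage fees, and no data leaving your network. Clients only need to change the base URL and model name; the vectors themselves come from a different model than OpenAI's, so anything already indexed with an OpenAI model has to be re-embedded.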

Built for AMD Strix Halo (RDNA 3.5 / gfx1151) with ROCm, but falls back to CPU if no GPU is available.

## Why

Most vector databases and RAG pipelines depend on an embeddings API. The standard hosted options (OpenAI `text-embedding-3-small`, Google `text-embedding-004`) charge per token and send your data to external servers. This service exposes the same OpenAI-style API contract locally, with no per-token cost, on your own hardware.

## Project Structure

```
.
├── .gitignore              # Ignores .env and data/
├── .env.template           # Template; copy to .env
├── README.md               # This file
├── llm.txt                 # Complete technical reference
├── Dockerfile              # Ubuntu Rolling + ROCm PyTorch + FastAPI
├── docker-compose.yml      # Service definition with GPU passthrough
├── entrypoint.sh           # GPU check + uvicorn start
├── server.py               # OpenAI-compatible FastAPI server
└── data/                   # Persistent model cache (git-ignored)
    └── models/             # HuggingFace model files
```

## Prerequisites

- Docker with the Compose plugin
- AMD Strix Halo (or another RDNA 3.5 GPU) for GPU mode; without a GPU the server falls back to CPU
- ~7 GB of disk for the model and the Docker image

## Quick Start

```bash
cp .env.template .env
# Edit .env if you want to change the model or port
docker compose up -d --build
```

First start downloads the model (~4.3 GB). Subsequent starts load from cache in seconds.

**Verify:**

```bash
# Check GPU detection
docker logs openclaw-embeddings

# Test the API
curl http://localhost:8484/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model":"BAAI/bge-m3","input":"Hello world"}'
```
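The same smoke test can be scripted from Python. Below is a minimal sketch using only the standard library; it assumes the default host port `8484` and the default model, and the helper names `build_request` and `embed` are hypothetical, not part of this repository.

```python
import json
import urllib.request

# Assumption: the default EMBEDDING_PORT (8484) on the local machine.
BASE_URL = "http://localhost:8484/v1"


def build_request(texts, model="BAAI/bge-m3", base_url=BASE_URL):
    """Build a POST request for the OpenAI-compatible /embeddings endpoint.

    `texts` may be a single string or a list of strings.
    """
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},  # no Authorization header needed
        method="POST",
    )


def embed(texts, model="BAAI/bge-m3", base_url=BASE_URL, timeout=30):
    """Send the request and return the decoded JSON response as a dict."""
    with urllib.request.urlopen(build_request(texts, model, base_url), timeout=timeout) as resp:
        return json.load(resp)
```

Against a running container, `embed("Hello world")` should return a response whose `data[0]["embedding"]` holds 1024 floats for bge-m3; passing a list of strings embeds them in one call.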

## API

### POST /v1/embeddings

OpenAI-compatible. Works with any client that speaks the OpenAI embeddings format.

**Request:**

```json
{
  "model": "BAAI/bge-m3",
  "input": "text to embed"
}
```

`input` accepts a single string or an array of strings for batch embedding.
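When embedding many documents, it can help to send them in fixed-size batches rather than as one very large array. A sketch, assuming a batch size of 32: the README does not state a server-side limit, so that number is a placeholder to tune, and `batched_payloads` is a hypothetical helper.

```python
from itertools import islice


def batched_payloads(texts, batch_size=32, model="BAAI/bge-m3"):
    """Yield /v1/embeddings request bodies with at most `batch_size` inputs each.

    Raises ValueError (on first iteration) if batch_size is not positive.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(texts)
    while True:
        chunk = list(islice(it, batch_size))  # next slice of up to batch_size strings
        if not chunk:
            return
        yield {"model": model, "input": chunk}
```

Each yielded dict is a complete request body; the order of strings inside a batch is preserved, so results can be matched back by each item's `index`.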

**Response:**

```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0123, -0.044, ...]
    }
  ],
  "model": "BAAI/bge-m3",
  "usage": {"prompt_tokens": 3, "total_tokens": 3}
}
```
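Each `data` entry carries an `index` that matches the position of its input string, so a client can map vectors back to inputs by that field instead of relying on list order. A sketch that also checks the 1024-dimension size documented for bge-m3; `extract_embeddings` is a hypothetical helper, and `expected_dim` must be changed if you swap models.

```python
def extract_embeddings(response, expected_dim=1024):
    """Return embedding vectors ordered by their `index` field.

    Raises ValueError if any vector does not have `expected_dim` entries.
    """
    items = sorted(response["data"], key=lambda item: item["index"])
    vectors = [item["embedding"] for item in items]
    for i, vec in enumerate(vectors):
        if len(vec) != expected_dim:
            raise ValueError(f"embedding {i} has {len(vec)} dims, expected {expected_dim}")
    return vectors
```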

### GET /v1/models

Lists available models and their dimensions.

### GET /health

Returns `{"status": "ok", "model_loaded": true}` when ready.
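Because the first start downloads the model and later starts still take around 10 s to load it, scripts that start the container should wait for `/health` before sending traffic. A sketch, assuming the default port; the README does not say how `/health` responds while the model is still loading, so the loop treats connection errors, HTTP errors, and `"model_loaded": false` alike as "not ready". `fetch_health` and `wait_until_ready` are hypothetical helpers, and the 120 s default timeout is a placeholder (a first start that downloads ~4.3 GB may need much longer).

```python
import json
import time
import urllib.request


def fetch_health(url="http://localhost:8484/health", timeout=2):
    """Return the parsed /health JSON, or None if the server is not answering yet."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError):
        # URLError/HTTPError and socket timeouts subclass OSError;
        # a malformed body raises json.JSONDecodeError, a ValueError.
        return None


def wait_until_ready(check=fetch_health, timeout=120.0, interval=2.0, sleep=time.sleep):
    """Poll `check()` until it reports a loaded model; return False after `timeout` s."""
    deadline = time.monotonic() + timeout
    while True:
        status = check()
        if status and status.get("status") == "ok" and status.get("model_loaded"):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

The `check` and `sleep` parameters exist so the loop can be exercised without a running server.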

## Configuration

All configurable via `.env`:

| Variable | Default | Description |
|----------|---------|-------------|
| `EMBEDDING_MODEL` | `BAAI/bge-m3` | HuggingFace model ID |
| `EMBEDDING_PORT` | `8484` | Host port for the API |
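Clients need to agree with the server on the port and model name. A sketch that reads the same two variable names with the same defaults, assuming the client process sees them in its environment (for example by exporting them; this helper does not parse `.env` itself). `base_url_from_env` is a hypothetical helper; `EMBEDDING_PORT` is the host-side port, so it is what clients outside the container connect to.

```python
import os


def base_url_from_env(env=None, host="localhost"):
    """Return (base_url, model) using the .env variable names and their defaults."""
    env = os.environ if env is None else env
    port = int(env.get("EMBEDDING_PORT", "8484"))
    model = env.get("EMBEDDING_MODEL", "BAAI/bge-m3")
    return f"http://{host}:{port}/v1", model
```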

## Model

Default: [BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3)

| Property | Value |
|----------|-------|
| Dimensions | 1024 |
| Languages | 100+ (excellent EN + ES) |
| Max tokens | 8192 |
| Size | ~2.2 GB (weights) |

You can swap it for any [sentence-transformers](https://www.sbert.net/) compatible model by changing `EMBEDDING_MODEL` in `.env`.
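Swapping the model changes the vector space and possibly the dimension, so vectors produced by different models cannot be compared, and a vector database collection sized for 1024 dimensions will not accept vectors of another size. A plain cosine-similarity sketch that makes the mismatch explicit; `cosine_similarity` is a hypothetical helper, and it does not assume the server returns normalized vectors (the README does not say).

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    if len(a) != len(b):
        # Typical symptom of mixing vectors from two different models.
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)
```

After changing `EMBEDDING_MODEL`, re-embed the stored documents so queries and documents come from the same model.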

## Using with Multipass VMs

If OpenClaw runs inside a Multipass VM and this embeddings service runs on the host, `localhost` won't work: inside the VM it points to the VM itself, not the host.

**Find the host IP on the Multipass bridge:**

```bash
# On the host
ip addr show mpqemubr0 | grep 'inet '
# Example output: inet 10.5.162.1/24 ...
```

**Test from inside the VM:**

```bash
multipass exec  -- curl http://10.5.162.1:8484/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model":"BAAI/bge-m3","input":"connectivity test"}'
```

**Configure OpenClaw with:**

| Setting | Value |
|---------|-------|
| Base URL | `http://:8484/v1` |
| Model | `BAAI/bge-m3` |
| Auth | None |
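If the request from inside the VM fails, it helps to know whether the port is reachable at all or whether the API itself is returning an error. Possible causes of an unreachable port include a host firewall rule or the port being published only on the host's loopback interface. A sketch of a plain TCP check, meant to run inside the VM against the example bridge IP `10.5.162.1` from above; `port_reachable` is a hypothetical helper.

```python
import socket


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable network, and timeouts.
        return False
```

`port_reachable("10.5.162.1", 8484)` returning `False` points at networking; `True` combined with a failing request points at the request or the service.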

## Performance

Tested on AMD Ryzen AI Max (Strix Halo) with Radeon 8060S iGPU:

| Metric | Value |
|--------|-------|
| Latency (single text) | ~47 ms |
| Latency (first request, cold) | ~270 ms |
| GPU VRAM used | ~1.5 GB |
| Model load time (from cache) | ~10 s |
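These numbers come from the author's machine. To measure on your own hardware, a sketch like the following times repeated calls and keeps the cold first request separate from steady-state latency. `measure_latency` is a hypothetical helper; `call` would wrap a single-text request to `/v1/embeddings` with whatever client you use. Client-side timings include HTTP and JSON overhead, so they may not match the table exactly.

```python
import statistics
import time


def _elapsed_ms(call):
    """Run `call()` once and return its wall-clock duration in milliseconds."""
    start = time.perf_counter()
    call()
    return (time.perf_counter() - start) * 1000.0


def measure_latency(call, runs=20, warmup=1):
    """Time repeated calls, reporting warm-up (cold) calls separately from the rest."""
    if runs < 1:
        raise ValueError("runs must be >= 1")
    warm = [_elapsed_ms(call) for _ in range(warmup)]
    samples = [_elapsed_ms(call) for _ in range(runs)]
    return {
        "warmup_ms": warm,
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
    }
```

The median is reported rather than the mean so a single slow request does not skew the steady-state figure.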

## GPU Details

This container uses the same ROCm setup from [rocm-strix-docker](https://github.com/hec-ovi/rocm-strix-docker):

- **`HSA_OVERRIDE_GFX_VERSION=11.5.1`**, required for ROCm to recognize Strix Halo
- **`privileged: true`**, grants `/dev/kfd` and `/dev/dri` access for GPU compute
- **`ipc: host`**, shares the host IPC namespace (including `/dev/shm`) so PyTorch has enough shared memory
- **PyTorch wheels** from `https://rocm.prereleases.amd.com/whl/gfx1151/` (ROCm 7.11 prerelease)
- **uv** manages Python 3.12 and all packages (no pip)
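To check what PyTorch sees from a Python session inside the container, a sketch like the following reports the device and the GPU override variable. It is not the repository's `entrypoint.sh` check (which isn't shown here); it only uses standard PyTorch calls, and `describe_device` is a hypothetical helper. `HSA_OVERRIDE_GFX_VERSION` has to be in the environment before the process starts, which is why the sketch only reads it.

```python
import os


def describe_device():
    """Report whether PyTorch can see a GPU, plus the HSA override in effect."""
    info = {"hsa_override": os.environ.get("HSA_OVERRIDE_GFX_VERSION")}
    try:
        import torch
    except (ImportError, OSError):
        info.update(device="cpu", reason="torch not importable")
        return info
    # ROCm builds of PyTorch expose AMD GPUs through the torch.cuda API,
    # which is why the startup log reports "Device: cuda".
    if torch.cuda.is_available():
        info.update(
            device="cuda",
            gpu=torch.cuda.get_device_name(0),
            hip=getattr(torch.version, "hip", None),  # None on non-ROCm builds
        )
    else:
        info.update(device="cpu", reason="no GPU visible to PyTorch")
    return info
```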

---

## License

[MIT](LICENSE) for original code in this repository (FastAPI server, Dockerfile, Compose configs, scripts). Third-party model weights (BAAI/bge-m3) and runtimes (sentence-transformers, transformers, PyTorch ROCm) retain their own upstream licenses; this repository does not redistribute them.

## Verified Output

```
[embeddings] ========================================
[embeddings] Model: BAAI/bge-m3
[embeddings] ========================================
[embeddings] Checking GPU...
[embeddings] GPU: Radeon 8060S Graphics
[embeddings] VRAM: 124.6 GB
[embeddings] ROCm/HIP: 7.2.53150-7b886380f9
[embeddings] Device: cuda
[embeddings] Starting server on port 80...
[embeddings] Loading BAAI/bge-m3 on cuda
[embeddings] GPU: Radeon 8060S Graphics
[embeddings] Model loaded. Dimension: 1024
```
