https://github.com/artryazanov/embedding-service

This is a FastAPI-based service for generating text embeddings, supporting multiple architectures like intfloat/multilingual-e5-large and BAAI/bge-m3. It automatically configures prefixes and sequence lengths based on the selected model. It supports both single text and batch processing.
Host: GitHub
URL: https://github.com/artryazanov/embedding-service
Owner: artryazanov
License: unlicense
Created: 2025-12-11T15:30:38.000Z (7 months ago)
Default Branch: main
Last Pushed: 2026-02-10T11:36:10.000Z (5 months ago)
Last Synced: 2026-02-10T15:38:52.798Z (5 months ago)
Topics: ai-assisted, bge-m3, docker, e5, embeddings, fastapi, fine-tuning, huggingface, machine-learning, multilingual, nlp, python, pytorch, rest-api, semantic-search, sentence-transformers
Language: Python
Homepage:
Size: 45.9 KB
Stars: 1
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# BGE-M3 Embedding Service

This is a FastAPI-based microservice and WebSocket worker that generates text embeddings with the **`BAAI/bge-m3`** model. It provides a strictly validated configuration system, a WebSocket client that reconnects with exponential backoff for external integrations, and Docker builds for both CPU and GPU.

[![Tests](https://github.com/artryazanov/embedding-service/actions/workflows/tests.yml/badge.svg)](https://github.com/artryazanov/embedding-service/actions/workflows/tests.yml)

[![codecov](https://codecov.io/gh/artryazanov/embedding-service/graph/badge.svg)](https://codecov.io/gh/artryazanov/embedding-service)

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

![Python Versions](https://img.shields.io/badge/python-3.12-blue)

![FastAPI](https://img.shields.io/badge/FastAPI-005571?style=flat&logo=fastapi)

![Docker](https://img.shields.io/badge/docker-%230db7ed.svg?style=flat&logo=docker&logoColor=white)

![PyTorch](https://img.shields.io/badge/PyTorch-%23EE4C2C.svg?style=flat&logo=PyTorch&logoColor=white)

![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97-Hugging%20Face-yellow)

## 🔥 Core Features

- **Pydantic Driven**: Centralized and type-safe `.env` parsing via `pydantic-settings`.

- **Dedicated Engine**: An object-oriented `EmbeddingEngine` built specifically to extract embeddings safely and to avoid memory leaks.

- **Robust WebSocket Worker**: A background task that connects to Reverb (`pusher_websocket`) and retries with exponential backoff, so the connection recovers under network stress.

- **FastAPI Core**: A REST API whose startup and shutdown are managed through the application `lifespan` handler.

- **Smart Hardware Detection**: Automatically targets `cuda` if available and falls back to `cpu` otherwise.

- **Modular Dockerfile**: A single Dockerfile handles both CPU and GPU builds natively via `ARG DEVICE`.
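
The exponential backoff mentioned for the WebSocket worker can be illustrated with a minimal sketch. It is not the project's actual implementation: the function name `backoff_delays` and all of its parameters (`base`, `factor`, `max_delay`, `attempts`, `jitter`) are hypothetical and chosen only for illustration.

```python
import random


def backoff_delays(base=1.0, factor=2.0, max_delay=60.0, attempts=6,
                   jitter=0.0, rng=None):
    """Yield the wait time in seconds before each reconnection attempt.

    Hypothetical helper: the names and defaults are assumptions,
    not values taken from the service.
    """
    rng = rng or random.Random()
    for attempt in range(attempts):
        # The delay grows geometrically and is capped at max_delay.
        delay = min(max_delay, base * factor ** attempt)
        # Optional jitter adds up to `jitter * delay` extra seconds so that
        # many workers do not reconnect at the same instant.
        delay += rng.uniform(0, jitter * delay)
        yield delay
```

With the defaults above and `attempts=8`, the delays double from 1 second until they reach the 60-second cap. A worker loop would typically sleep for each yielded delay between connection attempts and restart the sequence after a successful connection.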

---

## 🛠 Configuration (`.env`)

To start, copy the example configuration.

```bash
cp .env.example .env
```

| Variable | Description | Default |
| :--- | :--- | :--- |
| `API_TOKEN` | Optional Bearer token for securing the REST endpoints. | `None` |
| `MODEL_NAME` | The Hugging Face model path or local repository name. | `BAAI/bge-m3` |
| `MAX_SEQ_LENGTH` | Maximum tokens per sequence. | `8192` |
| `DEVICE` | Target hardware (`auto`, `cpu`, or `cuda`). | `auto` |
| `REVERB_APP_KEY` | Reverb integration key for the WebSocket worker. | `None` |
| `REVERB_HOST` | Host address of the Reverb instance. | `reverb` |
| `REVERB_PORT` | Port of the Reverb instance. | `8080` |
| `REVERB_SCHEME` | WebSocket connection scheme (`http` maps to `ws`, `https` maps to `wss`). | `http` |
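
As a rough illustration of how two of these settings might be interpreted, the sketch below resolves `DEVICE` and builds a WebSocket URL from the `REVERB_*` values. Both function names are hypothetical, and the `/app/<key>` path follows the Pusher protocol convention, which is assumed here rather than confirmed by the source. The `cuda_available` flag is passed in explicitly; in a PyTorch service it would presumably come from `torch.cuda.is_available()`.

```python
def resolve_device(requested: str, cuda_available: bool) -> str:
    """Map the DEVICE setting to a concrete device name (hypothetical helper)."""
    requested = requested.strip().lower()
    if requested not in {"auto", "cpu", "cuda"}:
        raise ValueError(f"Unsupported DEVICE value: {requested!r}")
    if requested == "auto":
        # Prefer the GPU when one is available, otherwise fall back to CPU.
        return "cuda" if cuda_available else "cpu"
    # Explicit choices are returned unchanged in this sketch.
    return requested


def reverb_ws_url(scheme: str, host: str, port: int, app_key: str) -> str:
    """Build a WebSocket URL from the REVERB_* settings (hypothetical helper)."""
    # http maps to ws and https maps to wss, as described in the table.
    ws_scheme = {"http": "ws", "https": "wss"}[scheme.lower()]
    # Assumption: Pusher-style path of the form /app/<key>.
    return f"{ws_scheme}://{host}:{port}/app/{app_key}"
```

For example, `resolve_device("auto", cuda_available=False)` yields `"cpu"`, and the default Reverb settings with a key of `my-key` produce `ws://reverb:8080/app/my-key`.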

---

## 🚀 Running the Service (Docker)

### 1️⃣ Run on GPU (Recommended)

This requires the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html).

**Build the image:**

```bash
# DEVICE=gpu is the default argument
docker build -t embedding-service:gpu .
```

**Launch the container:**

```bash
docker run -d -p 8000:8000 --gpus all \
  --env-file .env \
  -v $(pwd)/models:/app/models \
  --name embedding-service embedding-service:gpu
```

### 2️⃣ Run on CPU (Space & Compute Optimization)

If you are running on a standard server without GPU access, you can build a slimmer image that relies on PyTorch's `cpu` wheels, which significantly reduces the image size.

**Build the optimized image:**

```bash
docker build --build-arg DEVICE=cpu -t embedding-service:cpu .
```

**Launch the container:**

```bash
docker run -d -p 8000:8000 \
  --env-file .env \
  -v $(pwd)/models:/app/models \
  --name embedding-service-cpu embedding-service:cpu
```

---

## 📚 REST API Usage

### Health & Capabilities (`GET /health`)

Check service availability, loaded model identity, and active hardware device.

```bash
# $API_TOKEN is assumed to hold the API_TOKEN value configured in .env
curl -X GET "http://localhost:8000/health" \
     -H "Authorization: Bearer $API_TOKEN"
```

### Generate Single Embedding (`POST /vectorize`)

Return the embedding vector for a single query or document.

```bash
curl -X POST "http://localhost:8000/vectorize" \
     -H "Content-Type: application/json" \
     -d '{"text": "Artificial Intelligence is evolving rapidly."}'
```

**Response:**

```json
{
  "vector": [0.0123, -0.0456, 0.0789, ...]
}
```
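
The same call can be made from Python. The following is a minimal client sketch using only the standard library; the function names are hypothetical, and the optional `token` argument assumes the endpoint accepts the same Bearer header shown for `/health` when `API_TOKEN` is set.

```python
import json
import urllib.request


def build_vectorize_request(text, base_url="http://localhost:8000", token=None):
    """Prepare a POST /vectorize request (hypothetical client helper)."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: the service expects a Bearer token when API_TOKEN is set.
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/vectorize", data=body, headers=headers, method="POST"
    )


def parse_vector(payload: bytes) -> list[float]:
    """Extract the embedding from a response shaped like {"vector": [...]}."""
    vector = json.loads(payload)["vector"]
    if not isinstance(vector, list):
        raise ValueError("Expected 'vector' to be a JSON array")
    return [float(x) for x in vector]


def vectorize(text, **kwargs):
    """Send the request and return the embedding as a list of floats."""
    request = build_vectorize_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_vector(response.read())
```

Calling `vectorize("Artificial Intelligence is evolving rapidly.")` against a running container should return the list found under `vector` in the response.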

### Generate Batch Embeddings (`POST /vectorize-batch`)

Compute embeddings for multiple texts in a single request. The service splits the batch into chunks internally.

```bash
curl -X POST "http://localhost:8000/vectorize-batch" \
     -H "Content-Type: application/json" \
     -d '{"items": ["First document segment.", "Second document segment."]}'
```

**Response:**

```json
{
  "vectors": [
    [0.0123, ...],
    [-0.0789, ...]
  ]
}
```
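
Internal chunking of a batch could look roughly like the sketch below. This is an illustration under assumptions, not the service's code: `chunked`, `embed_in_chunks`, the `encode` callable, and the default chunk size of 32 are all hypothetical.

```python
from typing import Callable, Iterator, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def embed_in_chunks(
    items: Sequence[str],
    encode: Callable[[list[str]], list[list[float]]],
    size: int = 32,
) -> list[list[float]]:
    """Encode `items` chunk by chunk and return the vectors in input order.

    `encode` stands in for whatever model call produces one vector per text.
    """
    vectors: list[list[float]] = []
    for chunk in chunked(items, size):
        vectors.extend(encode(chunk))
    return vectors
```

Processing fixed-size chunks keeps peak memory bounded regardless of how many items a request contains, while the response still holds one vector per input item in the original order.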

---

## 🧪 Development & Testing

This project follows **Senior Python Developer Guidelines**: the tests use `pytest`, mock patching, `pytest-cov`, and `pytest-asyncio`, and run inside an isolated `venv`.

1. **Create and activate a virtual environment, then install dependencies:**

```bash
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pip install -r requirements-dev.txt
```

2. **Run the complete testing suite (Target: 90%+ Coverage):**

```bash
pytest tests/ -v -p no:warnings --cov=.
```

## License

This project is licensed under the [MIT License](LICENSE).
