https://github.com/artryazanov/embedding-service
This is a FastAPI-based service for generating text embeddings, supporting multiple architectures like intfloat/multilingual-e5-large and BAAI/bge-m3. It automatically configures prefixes and sequence lengths based on the selected model. It supports both single text and batch processing.
- Host: GitHub
- URL: https://github.com/artryazanov/embedding-service
- Owner: artryazanov
- License: unlicense
- Created: 2025-12-11T15:30:38.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2026-02-10T11:36:10.000Z (4 months ago)
- Last Synced: 2026-02-10T15:38:52.798Z (4 months ago)
- Topics: ai-assisted, bge-m3, docker, e5, embeddings, fastapi, fine-tuning, huggingface, machine-learning, multilingual, nlp, python, pytorch, rest-api, semantic-search, sentence-transformers
- Language: Python
- Homepage:
- Size: 45.9 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# BGE-M3 Embedding Service
This is a FastAPI-based microservice and WebSocket worker for generating text embeddings with the **`BAAI/bge-m3`** model. It combines a strictly validated configuration system, a WebSocket client that reconnects with exponential backoff for external integrations, and Docker images for both CPU and GPU deployments.
## Core Features
- **Pydantic Driven**: Centralized and type-safe `.env` parsing via `pydantic-settings`.
- **Dedicated Engine**: An object-oriented `EmbeddingEngine` class dedicated to extracting embeddings and releasing memory reliably.
- **Robust WebSocket Worker**: A background task that connects to Reverb (`pusher_websocket`) and retries with exponential backoff to keep the connection alive under network stress.
- **FastAPI Core**: A REST API whose startup and shutdown are managed through the application `lifespan` handler.
- **Smart Hardware Detection**: Automatically targets `cuda` if available and safely falls back to `cpu`.
- **Modular Dockerfile**: A single Dockerfile handles both CPU and GPU builds via `ARG DEVICE`.
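
The retry behaviour of the WebSocket worker can be illustrated with a minimal sketch. This is not the service's actual implementation: the names `backoff_delays` and `connect_with_backoff`, the base delay, the growth factor, and the 60-second cap are all assumptions chosen for illustration.

```python
import asyncio
import random


def backoff_delays(base=1.0, factor=2.0, cap=60.0, attempts=6,
                   jitter=0.0, rng=random.random):
    """Yield the wait time in seconds before each reconnect attempt.

    All parameter names and defaults are hypothetical.
    """
    for attempt in range(attempts):
        # Grow the delay geometrically, but never beyond the cap.
        delay = min(cap, base * factor ** attempt)
        # Optional jitter spreads out reconnects from many workers.
        yield delay + delay * jitter * rng()


async def connect_with_backoff(connect, delays):
    """Call the async `connect` callable until it succeeds or delays run out."""
    for delay in delays:
        try:
            return await connect()
        except OSError:
            # Network-level failure: wait, then try again.
            await asyncio.sleep(delay)
    raise ConnectionError("all reconnect attempts failed")
```

With the defaults above, the waits grow as 1, 2, 4, 8, 16, 32 seconds; a non-zero `jitter` adds a random fraction on top of each wait.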
---
## Configuration (`.env`)
To start, copy the example configuration.
```bash
cp .env.example .env
```
| Variable | Description | Default |
| :--- | :--- | :--- |
| `API_TOKEN` | Optional Bearer token for secure REST endpoints. | `None` |
| `MODEL_NAME` | The HuggingFace model path or local repository name. | `BAAI/bge-m3` |
| `MAX_SEQ_LENGTH` | Maximum tokens per sequence. | `8192` |
| `DEVICE` | Target hardware. (`auto`, `cpu`, or `cuda`) | `auto` |
| `REVERB_APP_KEY` | Reverb integration key for the WebSocket worker. | `None` |
| `REVERB_HOST` | Host address of the Reverb instance. | `reverb` |
| `REVERB_PORT` | Port of the Reverb instance. | `8080` |
| `REVERB_SCHEME` | WebSocket connection scheme (`http` maps to `ws`, `https` maps to `wss`). | `http` |
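
To make the table concrete, here is a rough standard-library sketch of how these variables could be read and validated. The service itself parses them with `pydantic-settings`; the `Settings` dataclass and the helpers `load_settings`, `websocket_base_url`, and `resolve_device` below are hypothetical stand-ins, and only the variable names and defaults come from the table.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    # Defaults mirror the table above.
    api_token: Optional[str] = None
    model_name: str = "BAAI/bge-m3"
    max_seq_length: int = 8192
    device: str = "auto"
    reverb_app_key: Optional[str] = None
    reverb_host: str = "reverb"
    reverb_port: int = 8080
    reverb_scheme: str = "http"


def load_settings(env: Mapping[str, str]) -> Settings:
    """Build Settings from an environment-like mapping, rejecting bad values."""
    device = env.get("DEVICE", "auto").lower()
    if device not in {"auto", "cpu", "cuda"}:
        raise ValueError(f"unsupported DEVICE: {device!r}")
    scheme = env.get("REVERB_SCHEME", "http").lower()
    if scheme not in {"http", "https"}:
        raise ValueError(f"unsupported REVERB_SCHEME: {scheme!r}")
    return Settings(
        api_token=env.get("API_TOKEN") or None,
        model_name=env.get("MODEL_NAME", "BAAI/bge-m3"),
        max_seq_length=int(env.get("MAX_SEQ_LENGTH", "8192")),
        device=device,
        reverb_app_key=env.get("REVERB_APP_KEY") or None,
        reverb_host=env.get("REVERB_HOST", "reverb"),
        reverb_port=int(env.get("REVERB_PORT", "8080")),
        reverb_scheme=scheme,
    )


def websocket_base_url(settings: Settings) -> str:
    """Map http -> ws and https -> wss, as described for REVERB_SCHEME."""
    ws_scheme = "wss" if settings.reverb_scheme == "https" else "ws"
    return f"{ws_scheme}://{settings.reverb_host}:{settings.reverb_port}"


def resolve_device(requested: str, cuda_available: bool) -> str:
    """Resolve `auto` to a concrete device.

    In a real process, `cuda_available` would typically come from
    `torch.cuda.is_available()`.
    """
    if requested == "auto":
        return "cuda" if cuda_available else "cpu"
    return requested
```

Passing `os.environ` to `load_settings` would read the live environment, while an empty mapping yields the documented defaults.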
---
## Running the Service (Docker)
### 1️⃣ Run on GPU (Recommended)
This requires the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html).
**Build the image:**
```bash
# DEVICE=gpu is the default argument
docker build -t embedding-service:gpu .
```
**Launch the container:**
```bash
docker run -d -p 8000:8000 --gpus all \
--env-file .env \
-v "$(pwd)/models:/app/models" \
--name embedding-service embedding-service:gpu
```
### 2️⃣ Run on CPU (Space & Compute Optimization)
On a standard server without GPU access, you can build a slimmer image that relies on PyTorch's CPU-only wheels, which substantially reduces the image size.
**Build the optimized image:**
```bash
docker build --build-arg DEVICE=cpu -t embedding-service:cpu .
```
**Launch the container:**
```bash
docker run -d -p 8000:8000 \
--env-file .env \
-v "$(pwd)/models:/app/models" \
--name embedding-service-cpu embedding-service:cpu
```
---
## REST API Usage
### Health & Capabilities (`GET /health`)
Check service availability, loaded model identity, and active hardware device.
```bash
curl -X GET "http://localhost:8000/health" \
-H "Authorization: Bearer "
```
### Generate Single Embedding (`POST /vectorize`)
Return the embedding vector for a single query or document.
```bash
curl -X POST "http://localhost:8000/vectorize" \
-H "Content-Type: application/json" \
-d '{"text": "Artificial Intelligence is evolving rapidly."}'
```
**Response:**
```json
{
"vector": [0.0123, -0.0456, 0.0789, ...]
}
```
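
The same call can be made from Python. The sketch below uses only the standard library and assumes the request and response shapes shown above; the helper names `build_vectorize_request` and `vectorize` are illustrative and not part of the service.

```python
import json
import urllib.request
from typing import List, Optional


def build_vectorize_request(text: str,
                            base_url: str = "http://localhost:8000",
                            token: Optional[str] = None) -> urllib.request.Request:
    """Prepare a POST /vectorize request without sending it."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Only needed when API_TOKEN is configured on the server.
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/vectorize", data=body, headers=headers, method="POST"
    )


def vectorize(text: str, **kwargs) -> List[float]:
    """Send the request and return the `vector` field of the JSON response."""
    request = build_vectorize_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["vector"]
```

Splitting request construction from sending keeps the first helper easy to check without a running server.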
### Generate Batch Embeddings (`POST /vectorize-batch`)
Compute multiple vectors in a single request. The service splits the input into chunks internally, so the batch size is handled for you.
```bash
curl -X POST "http://localhost:8000/vectorize-batch" \
-H "Content-Type: application/json" \
-d '{"items": ["First document segment.", "Second document segment."]}'
```
**Response:**
```json
{
"vectors": [
[0.0123, ...],
[-0.0789, ...]
]
}
```
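
How the internal chunking might look is sketched below. This is an assumption about the general technique, not the service's code: `chunked`, `embed_in_chunks`, the `encode` callable, and the chunk size of 32 are all hypothetical.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def embed_in_chunks(items: Sequence[str],
                    encode: Callable[[List[str]], List[List[float]]],
                    size: int = 32) -> List[List[float]]:
    """Encode `items` chunk by chunk, preserving the input order.

    `encode` stands in for whatever model call turns a list of texts
    into a list of vectors.
    """
    vectors: List[List[float]] = []
    for chunk in chunked(items, size):
        vectors.extend(encode(chunk))
    return vectors
```

Bounding the chunk size caps peak memory per forward pass while the caller still receives one vector per input, in order.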
---
## 🧪 Development & Testing
This project follows **Senior Python Developer Guidelines**: the tests use `pytest`, mock patching, `pytest-cov`, and `pytest-asyncio`, and run inside an isolated `venv`.
1. **Create and activate the environment, then install dependencies:**
```bash
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pip install -r requirements-dev.txt
```
2. **Run the complete testing suite (Target: 90%+ Coverage):**
```bash
pytest tests/ -v -p no:warnings --cov=.
```
## License
This project is licensed under the [MIT License](LICENSE).