https://github.com/artryazanov/embedding-service

This is a FastAPI-based service for generating text embeddings, supporting multiple architectures like intfloat/multilingual-e5-large and BAAI/bge-m3. It automatically configures prefixes and sequence lengths based on the selected model. It supports both single text and batch processing.

# BGE-M3 Embedding Service

This is a FastAPI-based microservice and WebSocket worker that generates text embeddings with the **`BAAI/bge-m3`** model. It provides a strictly validated configuration system, a WebSocket client that reconnects with exponential backoff for external integrations, and Docker builds for both CPU and GPU.

[![Tests](https://github.com/artryazanov/embedding-service/actions/workflows/tests.yml/badge.svg)](https://github.com/artryazanov/embedding-service/actions/workflows/tests.yml)
[![codecov](https://codecov.io/gh/artryazanov/embedding-service/graph/badge.svg)](https://codecov.io/gh/artryazanov/embedding-service)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
![Python Versions](https://img.shields.io/badge/python-3.12-blue)
![FastAPI](https://img.shields.io/badge/FastAPI-005571?style=flat&logo=fastapi)
![Docker](https://img.shields.io/badge/docker-%230db7ed.svg?style=flat&logo=docker&logoColor=white)
![PyTorch](https://img.shields.io/badge/PyTorch-%23EE4C2C.svg?style=flat&logo=PyTorch&logoColor=white)
![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97-Hugging%20Face-yellow)

## 🔥 Core Features
- **Pydantic Driven**: Centralized and type-safe `.env` parsing via `pydantic-settings`.
- **Dedicated Engine**: An object-oriented `EmbeddingEngine` built specifically to generate embeddings safely and release memory reliably.
- **Robust WebSocket Worker**: A background task that connects to Reverb (`pusher_websocket`) and retries with exponential backoff, so the connection recovers after network failures.
- **FastAPI Core**: A REST API whose startup and shutdown are managed by the application `lifespan` handler.
- **Smart Hardware Detection**: Automatically targets `cuda` if available and safely falls back to `cpu`.
- **Modular Dockerfile**: A single Dockerfile handles both CPU and GPU builds natively via `ARG DEVICE`.
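
The exponential backoff mentioned above can be sketched as follows. This is a minimal illustration, not the project's actual worker code: the function name `backoff_delays` and the parameters `base`, `factor`, `max_delay`, and `jitter` are assumptions chosen for the example.

```python
import random


def backoff_delays(base=1.0, factor=2.0, max_delay=60.0, attempts=6,
                   jitter=0.0, rng=None):
    """Yield the wait time in seconds before each reconnection attempt.

    All names and default values here are hypothetical; the service's
    real retry settings are not documented in this README.
    """
    rng = rng or random.Random()
    for attempt in range(attempts):
        # Grow the delay geometrically, but never beyond max_delay.
        delay = min(max_delay, base * factor ** attempt)
        # Optional jitter spreads out reconnects from many workers.
        yield delay + rng.uniform(0, jitter * delay)


print(list(backoff_delays(attempts=7)))
# [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0]
```

A worker would typically sleep for each yielded delay between connection attempts and restart the sequence after a successful connection.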

---

## 🛠 Configuration (`.env`)

To start, copy the example configuration.
```bash
cp .env.example .env
```

| Variable | Description | Default |
| :--- | :--- | :--- |
| `API_TOKEN` | Optional Bearer token for secure REST endpoints. | `None` |
| `MODEL_NAME` | The HuggingFace model path or local repository name. | `BAAI/bge-m3` |
| `MAX_SEQ_LENGTH` | Maximum tokens per sequence. | `8192` |
| `DEVICE` | Target hardware. (`auto`, `cpu`, or `cuda`) | `auto` |
| `REVERB_APP_KEY` | Reverb integration key for the WebSocket worker. | `None` |
| `REVERB_HOST` | Host address of the Reverb instance. | `reverb` |
| `REVERB_PORT` | Port of the Reverb instance. | `8080` |
| `REVERB_SCHEME` | Connection scheme for the WebSocket (`http` maps to `ws`, `https` maps to `wss`). | `http` |
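
To make the table concrete, here is a rough sketch of how these variables and defaults could be loaded and interpreted. The project itself uses `pydantic-settings`; this example deliberately swaps in the standard library so it runs without extra dependencies. The names `Settings`, `load_settings`, `resolve_device`, and `websocket_scheme` are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    # Defaults mirror the table above.
    api_token: Optional[str] = None
    model_name: str = "BAAI/bge-m3"
    max_seq_length: int = 8192
    device: str = "auto"
    reverb_app_key: Optional[str] = None
    reverb_host: str = "reverb"
    reverb_port: int = 8080
    reverb_scheme: str = "http"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, validating enumerated values."""
    device = env.get("DEVICE", "auto")
    if device not in {"auto", "cpu", "cuda"}:
        raise ValueError(f"DEVICE must be auto, cpu, or cuda, got {device!r}")
    scheme = env.get("REVERB_SCHEME", "http")
    if scheme not in {"http", "https"}:
        raise ValueError(f"REVERB_SCHEME must be http or https, got {scheme!r}")
    return Settings(
        api_token=env.get("API_TOKEN") or None,
        model_name=env.get("MODEL_NAME", "BAAI/bge-m3"),
        max_seq_length=int(env.get("MAX_SEQ_LENGTH", "8192")),
        device=device,
        reverb_app_key=env.get("REVERB_APP_KEY") or None,
        reverb_host=env.get("REVERB_HOST", "reverb"),
        reverb_port=int(env.get("REVERB_PORT", "8080")),
        reverb_scheme=scheme,
    )


def resolve_device(device: str, cuda_available: bool) -> str:
    """Resolve `auto` to `cuda` when a GPU is available, otherwise to `cpu`."""
    if device == "auto":
        return "cuda" if cuda_available else "cpu"
    return device


def websocket_scheme(scheme: str) -> str:
    """Map the HTTP scheme to its WebSocket counterpart."""
    return {"http": "ws", "https": "wss"}[scheme]
```

In a real deployment, `cuda_available` would come from a runtime check such as `torch.cuda.is_available()`; it is passed in as a plain boolean here to keep the sketch dependency-free.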

---

## 🚀 Running the Service (Docker)

### 1๏ธโƒฃ Run on GPU (Recommended)
This requires the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html).

**Build the image:**
```bash
# DEVICE=gpu is the default argument
docker build -t embedding-service:gpu .
```

**Launch the container:**
```bash
docker run -d -p 8000:8000 --gpus all \
--env-file .env \
-v $(pwd)/models:/app/models \
--name embedding-service embedding-service:gpu
```

### 2๏ธโƒฃ Run on CPU (Space & Compute Optimization)
If running on a standard server without GPU access, you can build a severely optimized environment relying on PyTorch's `cpu` wheels to drastically lower image weight.

**Build the optimized image:**
```bash
docker build --build-arg DEVICE=cpu -t embedding-service:cpu .
```

**Launch the container:**
```bash
docker run -d -p 8000:8000 \
--env-file .env \
-v $(pwd)/models:/app/models \
--name embedding-service-cpu embedding-service:cpu
```

---

## 📚 REST API Usage

### Health & Capabilities (`GET /health`)
Check service availability, the loaded model's identity, and the active hardware device.
```bash
# The Authorization header applies when API_TOKEN is set in .env
curl -X GET "http://localhost:8000/health" \
-H "Authorization: Bearer $API_TOKEN"
```

### Generate Single Embedding (`POST /vectorize`)
Generate the embedding vector for a single query or document.
```bash
curl -X POST "http://localhost:8000/vectorize" \
-H "Content-Type: application/json" \
-d '{"text": "Artificial Intelligence is evolving rapidly."}'
```
**Response:**
```json
{
"vector": [0.0123, -0.0456, 0.0789, ...]
}
```
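
The same request can be issued from Python. The sketch below uses only the standard library and assumes the service is reachable at `http://localhost:8000`; the helper names `build_request` and `vectorize` are hypothetical and not part of the project.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional


def build_request(base_url: str, path: str, payload: Dict[str, Any],
                  token: Optional[str] = None) -> urllib.request.Request:
    """Prepare a JSON POST request, adding a Bearer token when one is given."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def vectorize(text: str, base_url: str = "http://localhost:8000",
              token: Optional[str] = None) -> List[float]:
    """Call POST /vectorize and return the `vector` field of the response."""
    request = build_request(base_url, "/vectorize", {"text": text}, token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["vector"]
```

With the container running, `vectorize("Artificial Intelligence is evolving rapidly.")` should return the list shown under `vector` in the response above.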

### Generate Batch Embeddings (`POST /vectorize-batch`)
Compute multiple vectors in a single request. The service splits the batch into chunks internally.
```bash
curl -X POST "http://localhost:8000/vectorize-batch" \
-H "Content-Type: application/json" \
-d '{"items": ["First document segment.", "Second document segment."]}'
```
**Response:**
```json
{
"vectors": [
[0.0123, ...],
[-0.0789, ...]
]
}
```
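
Internal chunking of a batch can be pictured roughly as below. This is only an illustration of the general technique: the helper names, the `embed_fn` callback, and the chunk size of 32 are assumptions, since the service's real chunk size and engine interface are not stated here.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def embed_in_chunks(items: Sequence[str],
                    embed_fn: Callable[[List[str]], List[List[float]]],
                    chunk_size: int = 32) -> List[List[float]]:
    """Embed `items` chunk by chunk, preserving the input order.

    `embed_fn` stands in for the model call; chunk_size=32 is a placeholder.
    """
    vectors: List[List[float]] = []
    for chunk in chunked(items, chunk_size):
        vectors.extend(embed_fn(chunk))
    return vectors
```

Processing fixed-size chunks keeps peak memory bounded regardless of how many items arrive in one request, while the caller still receives one vector per input in the original order.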

---

## 🧪 Development & Testing

The test suite uses `pytest`, mock patching, `pytest-cov`, and `pytest-asyncio`, and runs inside an isolated `venv`.

1. **Create and activate a virtual environment, then install dependencies:**
```bash
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pip install -r requirements-dev.txt
```

2. **Run the complete test suite (target: 90%+ coverage):**
```bash
pytest tests/ -v -p no:warnings --cov=.
```

## License

This project is licensed under the [MIT License](LICENSE).