{"id":35584105,"url":"https://github.com/artryazanov/embedding-service","last_synced_at":"2026-03-15T19:43:13.414Z","repository":{"id":328390320,"uuid":"1114586056","full_name":"artryazanov/embedding-service","owner":"artryazanov","description":"This is a FastAPI-based service for generating text embeddings, supporting multiple architectures like intfloat/multilingual-e5-large and BAAI/bge-m3. It automatically configures prefixes and sequence lengths based on the selected model. It supports both single text and batch processing.","archived":false,"fork":false,"pushed_at":"2026-02-10T11:36:10.000Z","size":47,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-02-10T15:38:52.798Z","etag":null,"topics":["ai-assisted","bge-m3","docker","e5","embeddings","fastapi","fine-tuning","huggingface","machine-learning","multilingual","nlp","python","pytorch","rest-api","semantic-search","sentence-transformers"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"unlicense","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/artryazanov.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-12-11T15:30:38.000Z","updated_at":"2026-02-10T11:36:15.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/artryazanov/embedding-service","commit_stats":null,"previous_names":["artryazanov/embedding-service"],"tags_count":1,"template":false,"template_f
ull_name":null,"purl":"pkg:github/artryazanov/embedding-service","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/artryazanov%2Fembedding-service","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/artryazanov%2Fembedding-service/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/artryazanov%2Fembedding-service/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/artryazanov%2Fembedding-service/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/artryazanov","download_url":"https://codeload.github.com/artryazanov/embedding-service/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/artryazanov%2Fembedding-service/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30550624,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-15T15:03:43.933Z","status":"ssl_error","status_checked_at":"2026-03-15T15:03:37.630Z","response_time":61,"last_error":"SSL_read: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-assisted","bge-m3","docker","e5","embeddings","fastapi","fine-tuning","huggingface","machine-learning","multilingual","nlp","python","pytorch","rest-api","semantic-search","sentence-transformers"],"created_at":"2026-01-04T21:57:18.150Z","updated_at":"2026-03-15T19:43:13.409Z","avatar_url":"https://github.com/artryazanov.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# BGE-M3 Embedding Service\n\nThis is a high-performance, FastAPI-based microservice and WebSocket worker dedicated to generating text embeddings using the state-of-the-art **`BAAI/bge-m3`** model. 
Designed for international scalability, the architecture features a strictly validated configuration system, a WebSocket client with exponential backoff for external integrations, and CPU/GPU Docker deployments from a single Dockerfile.\n\n[![Tests](https://github.com/artryazanov/embedding-service/actions/workflows/tests.yml/badge.svg)](https://github.com/artryazanov/embedding-service/actions/workflows/tests.yml)\n[![codecov](https://codecov.io/gh/artryazanov/embedding-service/graph/badge.svg)](https://codecov.io/gh/artryazanov/embedding-service)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n![Python Versions](https://img.shields.io/badge/python-3.12-blue)\n![FastAPI](https://img.shields.io/badge/FastAPI-005571?style=flat\u0026logo=fastapi)\n![Docker](https://img.shields.io/badge/docker-%230db7ed.svg?style=flat\u0026logo=docker\u0026logoColor=white)\n![PyTorch](https://img.shields.io/badge/PyTorch-%23EE4C2C.svg?style=flat\u0026logo=PyTorch\u0026logoColor=white)\n![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97-Hugging%20Face-yellow)\n\n## 🔥 Core Features\n- **Pydantic Driven**: Centralized, type-safe `.env` parsing via `pydantic-settings`.\n- **Dedicated Engine**: An object-oriented `EmbeddingEngine` dedicated to extracting embeddings safely and avoiding memory leaks.\n- **Robust WebSocket Worker**: A resilient background task that connects to Reverb (`pusher_websocket`) and uses an exponential backoff retry mechanism to re-establish connections under network stress.\n- **FastAPI Core**: A high-performance REST API whose startup and shutdown are managed through the application `lifespan` handler.\n- **Smart Hardware Detection**: Automatically targets `cuda` if available and safely falls back to `cpu`. 
\n- **Modular Dockerfile**: A single Dockerfile handles both CPU and GPU builds via `ARG DEVICE`.\n\n---\n\n## 🛠 Configuration (`.env`)\n\nTo start, copy the example configuration.\n```bash\ncp .env.example .env\n```\n\n| Variable | Description | Default |\n| :--- | :--- | :--- |\n| `API_TOKEN` | Optional Bearer token for securing the REST endpoints. | `None` |\n| `MODEL_NAME` | The Hugging Face model path or local repository name. | `BAAI/bge-m3` |\n| `MAX_SEQ_LENGTH` | Maximum tokens per sequence. | `8192` |\n| `DEVICE` | Target hardware (`auto`, `cpu`, or `cuda`). | `auto` |\n| `REVERB_APP_KEY` | Reverb integration key for the WebSocket worker. | `None` |\n| `REVERB_HOST` | Host address of the Reverb instance. | `reverb` |\n| `REVERB_PORT` | Port of the Reverb instance. | `8080` |\n| `REVERB_SCHEME` | WebSocket connection scheme (`http` maps to `ws`, `https` maps to `wss`). | `http` |\n\n---\n\n## 🚀 Running the Service (Docker)\n\n### 1️⃣ Run on GPU (Recommended)\nThis requires the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html).\n\n**Build the image:**\n```bash\n# DEVICE=gpu is the default argument\ndocker build -t embedding-service:gpu .\n```\n\n**Launch the container:**\n```bash\ndocker run -d -p 8000:8000 --gpus all \\\n  --env-file .env \\\n  -v $(pwd)/models:/app/models \\\n  --name embedding-service embedding-service:gpu\n```\n\n### 2️⃣ Run on CPU (Space \u0026 Compute Optimization)\nIf you are running on a standard server without GPU access, you can build a slimmer image that relies on PyTorch's `cpu` wheels, which drastically reduces the image size.\n\n**Build the optimized image:**\n```bash\ndocker build --build-arg DEVICE=cpu -t embedding-service:cpu .\n```\n\n**Launch the container:**\n```bash\ndocker run -d -p 8000:8000 \\\n  --env-file .env \\\n  -v $(pwd)/models:/app/models \\\n  --name embedding-service-cpu embedding-service:cpu\n```\n\n---\n\n## 📚 REST API Usage\n\n### Health 
\u0026 Capabilities (`GET /health`)\nCheck service availability, the loaded model, and the active hardware device.\n```bash\ncurl -X GET \"http://localhost:8000/health\" \\\n     -H \"Authorization: Bearer \u003cAPI_TOKEN_IF_CONFIGURED\u003e\"\n```\n\n### Generate Single Embedding (`POST /vectorize`)\nReturn the embedding vector for a single query or document.\n```bash\ncurl -X POST \"http://localhost:8000/vectorize\" \\\n     -H \"Content-Type: application/json\" \\\n     -d '{\"text\": \"Artificial Intelligence is evolving rapidly.\"}'\n```\n**Response:**\n```json\n{\n  \"vector\": [0.0123, -0.0456, 0.0789, ...]\n}\n```\n\n### Generate Batch Embeddings (`POST /vectorize-batch`)\nCompute multiple vectors in a single request. (Inputs are chunked into batches internally.)\n```bash\ncurl -X POST \"http://localhost:8000/vectorize-batch\" \\\n     -H \"Content-Type: application/json\" \\\n     -d '{\"items\": [\"First document segment.\", \"Second document segment.\"]}'\n```\n**Response:**\n```json\n{\n  \"vectors\": [\n    [0.0123, ...],\n    [-0.0789, ...]\n  ]\n}\n```\n\n---\n\n## 🧪 Development \u0026 Testing\n\nThis project follows **Senior Python Developer Guidelines**: tests use `pytest`, mock patching, `pytest-cov`, and `pytest-asyncio`, and run in an isolated `venv`.\n\n1. **Create and activate the environment, then install dependencies:**\n```bash\npython3 -m venv venv\nsource venv/bin/activate\npip install -r requirements.txt\npip install -r requirements-dev.txt\n```\n\n2. 
**Run the complete testing suite (Target: 90%+ Coverage):**\n```bash\npytest tests/ -v -p no:warnings --cov=.\n```\n\n## License\n\nThis project is licensed under the [MIT License](LICENSE).","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fartryazanov%2Fembedding-service","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fartryazanov%2Fembedding-service","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fartryazanov%2Fembedding-service/lists"}