{"id":28384446,"url":"https://github.com/runpod-workers/worker-infinity-embedding","last_synced_at":"2025-06-25T23:30:51.602Z","repository":{"id":224187257,"uuid":"748912344","full_name":"runpod-workers/worker-infinity-embedding","owner":"runpod-workers","description":"Create embeddings with infinity as serverless endpoint","archived":false,"fork":false,"pushed_at":"2025-05-20T19:19:15.000Z","size":68,"stargazers_count":28,"open_issues_count":4,"forks_count":17,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-05-30T09:27:29.655Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/runpod-workers.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-01-27T02:44:17.000Z","updated_at":"2025-05-29T06:50:17.000Z","dependencies_parsed_at":"2024-05-30T18:20:10.408Z","dependency_job_id":"83b5943b-22a9-42f4-a315-e917bd136308","html_url":"https://github.com/runpod-workers/worker-infinity-embedding","commit_stats":null,"previous_names":["runpod-workers/worker-infinity-text-embeddings"],"tags_count":1,"template":false,"template_full_name":"runpod-workers/worker-template","purl":"pkg:github/runpod-workers/worker-infinity-embedding","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/runpod-workers%2Fworker-infinity-embedding","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/runpod-workers%2Fworker-infinity-embedding/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repos
itories/runpod-workers%2Fworker-infinity-embedding/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/runpod-workers%2Fworker-infinity-embedding/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/runpod-workers","download_url":"https://codeload.github.com/runpod-workers/worker-infinity-embedding/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/runpod-workers%2Fworker-infinity-embedding/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261972491,"owners_count":23238536,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-05-30T08:38:42.113Z","updated_at":"2025-06-25T23:30:51.593Z","avatar_url":"https://github.com/runpod-workers.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"![Infinity Embedding Worker Banner](https://cpjrphpz3t5wbwfe.public.blob.vercel-storage.com/worker-infinity-embedding_banner-9n86vTARpwknMZYnXHAUr7xJisiWXs.jpeg)\n\n---\n\nHigh-throughput, OpenAI-compatible text embedding \u0026 reranker powered by [Infinity](https://github.com/michaelfeil/infinity)\n\n---\n\n[![RunPod](https://api.runpod.io/badge/runpod-workers/worker-infinity-embedding)](https://www.runpod.io/console/hub/runpod-workers/worker-infinity-embedding)\n\n---\n\n1. [Quickstart](#quickstart)\n2. [Endpoint Configuration](#endpoint-configuration)\n3. [API Specification](#api-specification)\n   1. [List Models](#list-models)\n   2. [Create Embeddings](#create-embeddings)\n   3. 
[Rerank Documents](#rerank-documents)\n4. [Usage](#usage)\n5. [Further Documentation](#further-documentation)\n6. [Acknowledgements](#acknowledgements)\n\n---\n\n## Quickstart\n\n1. 🐳 **Pull an image** – use the tag shown on the latest [GitHub release page](https://github.com/runpod-workers/worker-infinity-embedding/releases) (e.g. `runpod/worker-infinity-embedding:\u003cversion\u003e`)\n2. 🔧 **Configure** – set at least `MODEL_NAMES` (see [Endpoint Configuration](#endpoint-configuration))\n3. 🚀 **Deploy** – create a [RunPod Serverless endpoint](https://docs.runpod.io/serverless/endpoints/manage-endpoints)\n4. 🧪 **Call the API** – follow the example in the [Usage](#usage) section\n\n---\n\n## Endpoint Configuration\n\nAll behaviour is controlled through environment variables:\n\n| Variable                 | Required | Default | Description                                                                                                      |\n| ------------------------ | -------- | ------- | ---------------------------------------------------------------------------------------------------------------- |\n| `MODEL_NAMES`            | **Yes**  | —       | One or more Hugging-Face model IDs. Separate multiple IDs with a semicolon.\u003cbr\u003eExample: `BAAI/bge-small-en-v1.5` |\n| `BATCH_SIZES`            | No       | `32`    | Per-model batch size; semicolon-separated list matching `MODEL_NAMES`.                                           |\n| `BACKEND`                | No       | `torch` | Inference engine for _all_ models: `torch`, `optimum`, or `ctranslate2`.                                         |\n| `DTYPES`                 | No       | `auto`  | Precision per model (`auto`, `fp16`, `fp8`). Semicolon-separated, must match `MODEL_NAMES`.                      |\n| `INFINITY_QUEUE_SIZE`    | No       | `48000` | Max items queueable inside the Infinity engine.                                                                  
|\n| `RUNPOD_MAX_CONCURRENCY` | No       | `300`   | Max concurrent requests the RunPod wrapper will accept.                                                          |\n\n---\n\n## API Specification\n\nTwo flavours, one schema.\n\n- **OpenAI-compatible** – drop-in replacement for `/v1/models`, `/v1/embeddings`, so you can use this endpoint instead of the API from OpenAI by replacing the base url with the URL of your endpoint: `https://api.runpod.ai/v2/\u003cENDPOINT_ID\u003e/openai/v1` and use your [API key from RunPod](https://docs.runpod.io/get-started/api-keys) instead of the one from OpenAI\n- **Standard RunPod** – call `/run` or `/runsync` with a JSON body under the `input` key.  \n  Base URL: `https://api.runpod.ai/v2/\u003cENDPOINT_ID\u003e`\n\nExcept for transport (path + wrapper object) the JSON you send/receive is identical. The tables below describe the shared payload.\n\n### List Models\n\n| Method | Path                | Body                                            |\n| ------ | ------------------- | ----------------------------------------------- |\n| `GET`  | `/openai/v1/models` | –                                               |\n| `POST` | `/runsync`          | `{ \"input\": { \"openai_route\": \"/v1/models\" } }` |\n\n#### Response\n\n```jsonc\n{\n  \"data\": [\n    { \"id\": \"BAAI/bge-small-en-v1.5\", \"stats\": {} },\n    { \"id\": \"intfloat/e5-large-v2\", \"stats\": {} }\n  ]\n}\n```\n\n---\n\n### Create Embeddings\n\n#### Request Fields (shared)\n\n| Field   | Type                | Required | Description                                       |\n| ------- | ------------------- | -------- | ------------------------------------------------- |\n| `model` | string              | **Yes**  | One of the IDs supplied via `MODEL_NAMES`.        |\n| `input` | string \u0026#124; array | **Yes**  | A single text string _or_ list of texts to embed. |\n\nOpenAI route vs. 
Standard:\n\n| Flavour  | Method | Path             | Body                                          |\n| -------- | ------ | ---------------- | --------------------------------------------- |\n| OpenAI   | `POST` | `/v1/embeddings` | `{ \"model\": \"…\", \"input\": \"…\" }`              |\n| Standard | `POST` | `/runsync`       | `{ \"input\": { \"model\": \"…\", \"input\": \"…\" } }` |\n\n#### Response (both flavours)\n\n```jsonc\n{\n  \"object\": \"list\",\n  \"model\": \"BAAI/bge-small-en-v1.5\",\n  \"data\": [\n    { \"object\": \"embedding\", \"embedding\": [0.01, -0.02 /* … */], \"index\": 0 }\n  ],\n  \"usage\": { \"prompt_tokens\": 2, \"total_tokens\": 2 }\n}\n```\n\n---\n\n### Rerank Documents (Standard only)\n\n| Field         | Type   | Required | Description                                                       |\n| ------------- | ------ | -------- | ----------------------------------------------------------------- |\n| `model`       | string | **Yes**  | Any deployed reranker model                                       |\n| `query`       | string | **Yes**  | The search/query text                                             |\n| `docs`        | array  | **Yes**  | List of documents to rerank                                       |\n| `return_docs` | bool   | No       | If `true`, return the documents in ranked order (default `false`) |\n\nCall pattern\n\n```http\nPOST /runsync\nContent-Type: application/json\n\n{\n  \"input\": {\n    \"model\": \"BAAI/bge-reranker-large\",\n    \"query\": \"Which product has warranty coverage?\",\n    \"docs\": [\n      \"Product A comes with a 2-year warranty\",\n      \"Product B is available in red and blue colors\",\n      \"All electronics include a standard 1-year warranty\"\n    ],\n    \"return_docs\": true\n  }\n}\n```\n\nResponse contains either `scores` or the full `docs` list, depending on `return_docs`.\n\n---\n\n## Usage\n\nBelow are minimal `curl` snippets so you can copy-paste from any 
machine.\n\n\u003e Replace `\u003cENDPOINT_ID\u003e` with your endpoint ID and `\u003cAPI_KEY\u003e` with a [RunPod API key](https://docs.runpod.io/get-started/api-keys).\n\n### OpenAI-Compatible Calls\n\n```bash\n# List models\ncurl -H \"Authorization: Bearer \u003cAPI_KEY\u003e\" \\\n     https://api.runpod.ai/v2/\u003cENDPOINT_ID\u003e/openai/v1/models\n\n# Create embeddings\ncurl -X POST \\\n  -H \"Authorization: Bearer \u003cAPI_KEY\u003e\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"model\":\"BAAI/bge-small-en-v1.5\",\"input\":\"Hello world\"}' \\\n  https://api.runpod.ai/v2/\u003cENDPOINT_ID\u003e/openai/v1/embeddings\n```\n\n### Standard RunPod Calls\n\n```bash\n# Create embeddings (wait for result)\ncurl -X POST \\\n  -H \"Authorization: Bearer \u003cAPI_KEY\u003e\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"input\":{\"model\":\"BAAI/bge-small-en-v1.5\",\"input\":\"Hello world\"}}' \\\n  https://api.runpod.ai/v2/\u003cENDPOINT_ID\u003e/runsync\n\n# Rerank\ncurl -X POST \\\n  -H \"Authorization: Bearer \u003cAPI_KEY\u003e\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"input\":{\"model\":\"BAAI/bge-reranker-large\",\"query\":\"Which product has warranty coverage?\",\"docs\":[\"Product A comes with a 2-year warranty\",\"Product B is available in red and blue colors\",\"All electronics include a standard 1-year warranty\"],\"return_docs\":true}}' \\\n  https://api.runpod.ai/v2/\u003cENDPOINT_ID\u003e/runsync\n```\n\n---\n\n## Further Documentation\n\n- **[Infinity Engine](https://github.com/michaelfeil/infinity)** – how the ultra-fast backend works.\n- **[RunPod Docs](https://docs.runpod.io/)** – serverless concepts, limits, and API reference.\n\n---\n\n## Acknowledgements\n\nSpecial thanks to [Michael Feil](https://github.com/michaelfeil) for creating the Infinity engine and for his ongoing support of this 
project.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frunpod-workers%2Fworker-infinity-embedding","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frunpod-workers%2Fworker-infinity-embedding","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frunpod-workers%2Fworker-infinity-embedding/lists"}