{"id":24362168,"url":"https://github.com/elkornacio/pg_auto_embeddings","last_synced_at":"2025-08-01T03:08:12.919Z","repository":{"id":271626008,"uuid":"913884564","full_name":"ElKornacio/pg_auto_embeddings","owner":"ElKornacio","description":"Text embeddings calculation for Postgres, without extensions. Simple, atomic, supports OpenAI/Anthropic models. Does not require any additional extensions, making it suitable for managed databases and other restricted environments.","archived":false,"fork":false,"pushed_at":"2025-01-16T18:49:10.000Z","size":2128,"stargazers_count":10,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-10T10:59:12.061Z","etag":null,"topics":["embeddings","openai","postgres","postgresql","vector"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ElKornacio.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-01-08T14:39:10.000Z","updated_at":"2025-01-16T18:49:12.000Z","dependencies_parsed_at":"2025-04-10T10:43:41.443Z","dependency_job_id":"25d92f42-059d-4065-822e-6e15ddd7f57a","html_url":"https://github.com/ElKornacio/pg_auto_embeddings","commit_stats":null,"previous_names":["elkornacio/pg_auto_embeddings"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ElKornacio/pg_auto_embeddings","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ElKornacio%2Fpg_auto_embeddings","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ElKornac
io%2Fpg_auto_embeddings/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ElKornacio%2Fpg_auto_embeddings/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ElKornacio%2Fpg_auto_embeddings/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ElKornacio","download_url":"https://codeload.github.com/ElKornacio/pg_auto_embeddings/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ElKornacio%2Fpg_auto_embeddings/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":268162399,"owners_count":24205702,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-01T02:00:08.611Z","response_time":67,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["embeddings","openai","postgres","postgresql","vector"],"created_at":"2025-01-18T22:50:02.124Z","updated_at":"2025-08-01T03:08:12.876Z","avatar_url":"https://github.com/ElKornacio.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\n# 🤖 pg_auto_embeddings\n\n[![PostgreSQL](https://img.shields.io/badge/PostgreSQL-blue?style=for-the-badge\u0026logo=postgresql\u0026logoColor=white)](https://www.postgresql.org/downloads/)\n[![License: 
MIT](https://img.shields.io/badge/License-MIT-green?style=for-the-badge\u0026logo=opensourceinitiative\u0026logoColor=white)](https://opensource.org/licenses/MIT)\n[![Twitter Follow](https://img.shields.io/twitter/follow/elkornacio?style=for-the-badge\u0026logo=x\u0026logoColor=white)](https://x.com/elkornacio)\n[![GitHub Issues](https://img.shields.io/github/issues/elkornacio/pg_auto_embeddings?style=for-the-badge\u0026logo=github\u0026logoColor=white)](https://github.com/elkornacio/pg_auto_embeddings/issues)\n[![GitHub Stars](https://img.shields.io/github/stars/elkornacio/pg_auto_embeddings?style=for-the-badge\u0026logo=github\u0026logoColor=white)](https://github.com/elkornacio/pg_auto_embeddings/stargazers)\n\n`select embedding('text')` for Postgres. Without extensions. Simple, atomic, transaction-safe. Supports OpenAI/Anthropic embeddings out-of-the-box. Installation via SQL file. Perfect fit for RAG systems.\n\n\u003cimg src=\"assets/screenshot.jpg\" alt=\"pg_auto_embeddings screenshot\" width=\"600\"/\u003e\n\n\u003c/div\u003e\n\n\u003cdiv align=\"center\"\u003e\n\n## 📋 Table of Contents\n\n[Overview](#overview) •\n[Quick Start](#quick-start) •\n[On-premise](#on-premise) •\n[Key Features](#key-features) •\n[Contribution](#contribution-and-development) •\n[FAQ](#faq)\n\n\u003c/div\u003e\n\n**pg_auto_embeddings** is an open-source project designed to provide a simple, atomic, transaction-safe function for calculating text embeddings in PostgreSQL. Its main feature is that it does not require any additional extensions, making it suitable for managed databases and other restricted environments. It uses Foreign Data Wrappers (FDW) as a way to call embedding APIs through a public or on-premise proxy server.\n\n## Quick Start\n\n1. Run `simple/pgae_simple_install.sql` ([here it is](simple/pgae_simple_install.sql)) on your database.\n2. Call `pgae_init('openai-text-embedding-...', '\u003capi-key\u003e')` to initialize. 
Provide the model and API key.\n\n### Usage\n\n- Use `SELECT pgae_embedding('some text')` to get the vector embedding as `double precision[]`.\n- Use `SELECT pgae_embedding_vec('some text')` to get the vector embedding as `vector`.\n\n## Supported Models\n\n#### OpenAI:\n\n- `openai-text-embedding-3-small`\n- `openai-text-embedding-3-large`\n- `openai-text-embedding-ada-002`\n\n#### Voyage (Anthropic):\n\n- `voyage-3-large`\n- `voyage-3`\n- `voyage-3-lite`\n- `voyage-code-3`\n- `voyage-finance-2`\n- `voyage-law-2`\n- `voyage-code-2`\n\n## On-premise\n\n1. Run `docker run -d -p 5432:5432 elkornacio/pg_auto_embeddings` to start your own proxy server. You can apply whitelists, rate limits, etc. via environment variables (see below).\n2. Call `pgae_init_onprem('your.host.com', '5432', 'openai-text-embedding-3-small', 'sk-...')` to initialize. Provide the model you use for embeddings and your API key.\n3. Use `SELECT pgae_embedding('some text')` to get the vector embedding.\n\n## Key Features\n\n- **No Extensions Required**: Works seamlessly without needing to install PostgreSQL extensions.\n- **Two Installation Options**:\n  1. **Simple Installation**: Just execute a SQL script for quick setup.\n  2. 
**On-Premise Installation**: Use an on-premise Docker server as a proxy for embedding API calls.\n- **Supports Public and Local APIs**: Can be configured to work with both public APIs and local solutions.\n\n### Usage\n\n- Initialize for a remote server with open APIs:\n\n  ```sql\n  CALL pgae_init('openai-text-embedding-model', 'MY_OPENAI_API_KEY');\n  ```\n\n- Calculate the embedding for a single text:\n\n  ```sql\n  SELECT pgae_embedding('some text'); -- returns double precision[]\n  ```\n\n- Calculate the embedding for a single text using the `pgvector` extension:\n\n  ```sql\n  SELECT pgae_embedding_vec('some text'); -- returns vector\n  ```\n\n- Create an auto-embedding calculation for the column `title` in the table `posts` (`public` schema):\n\n  ```sql\n  SELECT pgae_create_auto_embedding('public', 'posts', 'title', 'title_embedding');\n  ```\n\n  Now, on INSERT or UPDATE of `title`, the embedding in the `title_embedding` column will be updated automatically.\n\n- Completely remove `pg_auto_embeddings` from your database:\n  ```sql\n  SELECT pgae_self_destroy();\n  ```\n\n### On-Premise Environment Variables\n\n- `PG_HOST` - Host of your database, default `localhost`. This is the host the Node.js server connects to on start. Don't change it unless you know what you're doing.\n- `PG_PORT` - Port of your database, default `5432`. This is the port the Node.js server connects to on start. Don't change it unless you know what you're doing.\n- `PG_USERNAME` - Username of your database. You must provide your own value to ensure security.\n- `PG_PASSWORD` - Password of your database. You must provide your own value to ensure security.\n- `SERVER_HOST` - Host the Node.js proxy server listens on, default `localhost`.\n- `SERVER_PORT` - Port the Node.js proxy server listens on, default `3000`.\n- `SELF_URL` - URL of the Node.js server, default `http://localhost:3000`. 
All internal requests from the Postgres proxy to Node.js are made to this URL.\n- `CONTROL_SERVER_HOST` - Host the Control Server listens on. When this variable is not provided, the Control Server does not start.\n- `CONTROL_SERVER_PORT` - Port the Control Server listens on. When this variable is not provided, the Control Server does not start.\n\n## Contribution and Development\n\n- Issues and Pull Requests are welcome for feedback, improvements, and new ideas.\n- The project is open to suggestions for expanding functionality and enhancing stability.\n\n## FAQ\n\n- How to install `pg_auto_embeddings` on a new database?\n\n  Execute SQL code from `simple/pgae_simple_install.sql` ([here it is](simple/pgae_simple_install.sql)) on your database.\n  Then call `pgae_init('\u003cmodel\u003e', '\u003capi_key\u003e')` to initialize. Provide the model you use for embeddings and your API key.\n\n- Do you support OpenAI embeddings?\n\n  Yes, their models are supported out of the box.\n\n- Do you support Voyage (Anthropic) embeddings?\n\n  Yes, their models are supported out of the box.\n\n- How to remove `pg_auto_embeddings` from my database?\n\n  ```sql\n  SELECT pgae_self_destroy();\n  ```\n\n- How does it work?\n\n  We use Foreign Data Wrappers (FDW) to call functions on the remote PostgreSQL server. The chain is:\n\n  1. We create an FDW table mapping local `embeddings_*` -\u003e remote `embeddings_*`\n  2. You call the local function `pgae_embedding('some text')`\n  3. It executes `UPDATE embeddings SET ...` on the local table\n  4. FDW causes synchronous execution of `UPDATE embeddings SET ...` on the remote table\n  5. This causes synchronous execution of a trigger like `NEW.embedding = pgae_embedding_internal(NEW.text_val); RETURN NEW.embedding;`\n  6. The trigger executes the `pgae_embedding_internal` function, which calls the HTTP API of the local Node.js server\n  7. The Node.js server calls the OpenAI API and returns the embedding as a result\n  8. 
Remote PostgreSQL propagates the result to the remote `embeddings_*` table\n  9. Local PostgreSQL receives the update on the local `embeddings_*` table\n  10. This update is returned as the result of the execution of `pgae_embedding('some text')`\n\n  The code for both the server SQL and the local SQL is in this repo, so you can look there for additional details and ask any questions in Issues.\n\n## License\n\nThis project is licensed under the [MIT License](LICENSE). You are free to use the code in your projects while retaining the original author notices.\n\nEnjoy using **pg_auto_embeddings**! We hope it simplifies the integration of text embeddings in PostgreSQL without the need for complex extensions.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Felkornacio%2Fpg_auto_embeddings","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Felkornacio%2Fpg_auto_embeddings","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Felkornacio%2Fpg_auto_embeddings/lists"}