{"id":17527023,"url":"https://github.com/lanterndata/lantern","last_synced_at":"2025-05-15T03:03:05.738Z","repository":{"id":180483802,"uuid":"664872761","full_name":"lanterndata/lantern","owner":"lanterndata","description":"PostgreSQL vector database extension for building AI applications","archived":false,"fork":false,"pushed_at":"2024-12-12T11:38:34.000Z","size":2042,"stargazers_count":842,"open_issues_count":40,"forks_count":61,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-04-11T14:19:41.523Z","etag":null,"topics":["ai","ann","approximate-nearest-neighbor-search","data-science","database","embeddings","hnsw","image-search","knn","machine-learning","mlops","nearest-neighbor-search","neural-search","open-source","postgres","postgresql","search","vector","ycombinator"],"latest_commit_sha":null,"homepage":"https://lantern.dev","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lanterndata.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-07-11T00:22:57.000Z","updated_at":"2025-04-11T09:56:48.000Z","dependencies_parsed_at":"2024-02-17T22:24:07.568Z","dependency_job_id":"3f2ecb52-8535-4d96-80d2-8f1b9b0721d4","html_url":"https://github.com/lanterndata/lantern","commit_stats":null,"previous_names":["lanterndata/lanterndb"],"tags_count":31,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lanterndata%2Flantern","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lanterndata%2Flante
rn/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lanterndata%2Flantern/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lanterndata%2Flantern/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lanterndata","download_url":"https://codeload.github.com/lanterndata/lantern/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254264744,"owners_count":22041792,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","ann","approximate-nearest-neighbor-search","data-science","database","embeddings","hnsw","image-search","knn","machine-learning","mlops","nearest-neighbor-search","neural-search","open-source","postgres","postgresql","search","vector","ycombinator"],"created_at":"2024-10-20T15:02:57.481Z","updated_at":"2025-05-15T03:03:05.688Z","avatar_url":"https://github.com/lanterndata.png","language":"Rust","funding_links":[],"categories":["Rust"],"sub_categories":[],"readme":"# 💡 Lantern\n\n[![build](https://github.com/lanterndata/lantern/actions/workflows/build.yaml/badge.svg?branch=main)](https://github.com/lanterndata/lantern/actions/workflows/build.yaml)\n[![test](https://github.com/lanterndata/lantern/actions/workflows/test.yaml/badge.svg?branch=main)](https://github.com/lanterndata/lantern/actions/workflows/test.yaml)\n[![codecov](https://codecov.io/github/lanterndata/lantern/branch/main/graph/badge.svg)](https://codecov.io/github/lanterndata/lantern)\n[![Run on 
Replit](https://img.shields.io/badge/Run%20on-Replit-blue?logo=replit)](https://replit.com/@lanterndata/lantern-playground#.replit)\n\nLantern is an open-source PostgreSQL database extension to store vector data, generate embeddings, and handle vector search operations.\n\nIt provides a new index type for vector columns called `lantern_hnsw` which speeds up `ORDER BY ... LIMIT` queries.\n\nLantern builds and uses [usearch](https://github.com/unum-cloud/usearch), a single-header state-of-the-art HNSW implementation.\n\n## 🔧 Quick Install\n\nIf you don’t have PostgreSQL already, use Lantern with [Docker](https://hub.docker.com/r/lanterndata/lantern) to get started quickly:\n\n```bash\ndocker run --pull=always --rm -p 5432:5432 -e \"POSTGRES_USER=$USER\" -e \"POSTGRES_PASSWORD=postgres\" -v ./lantern_data:/var/lib/postgresql/data lanterndata/lantern:latest-pg15\n```\nThen, you can connect to the database via `postgresql://$USER:postgres@localhost/postgres`.\n\nTo install Lantern using `homebrew`:\n\n```\nbrew tap lanterndata/lantern\nbrew install lantern \u0026\u0026 lantern_install\n```\n\nYou can also install Lantern on top of PostgreSQL from our [precompiled binaries](https://github.com/lanterndata/lantern/releases) via a single `make install`.\n\nAlternatively, you can use Lantern in one click using [Replit](https://replit.com/@lanterndata/lantern-playground#.replit).\n\n\n## 🔧 Build Lantern from source code on top of your existing PostgreSQL\nPrerequisites:\n```\ncmake version: \u003e=3.3\ngcc \u0026\u0026 g++ version: \u003e=11 when building portable binaries, \u003e= 12 when building on new hardware or with CPU-specific vectorization\nPostgreSQL 11, 12, 13, 14, 15 or 16\nCorresponding development package for PostgreSQL (postgresql-server-dev-$version)\n\n```\nTo build Lantern on new hardware or with CPU-specific vectorization:\n```\ngit clone --recursive https://github.com/lanterndata/lantern.git\ncd lantern\ncmake -DMARCH_NATIVE=ON -S lantern_hnsw -B 
build\nmake -C build install -j\n```\n\nTo build portable Lantern binaries:\n```\ngit clone --recursive https://github.com/lanterndata/lantern.git\ncd lantern\ncmake -DMARCH_NATIVE=OFF -S lantern_hnsw -B build\nmake -C build install -j\n```\n\n## 📖 How to use Lantern\n\nLantern retains the standard PostgreSQL interface, so it is compatible with all of your favorite tools in the PostgreSQL ecosystem.\n\nFirst, enable Lantern in SQL (e.g. via the `psql` shell):\n\n```sql\nCREATE EXTENSION lantern;\n```\n\nNote: After running the above, the lantern extension is only available in the current postgres DATABASE (a single postgres instance may have multiple such DATABASES).\nWhen connecting to a different DATABASE, make sure to run the above command for the new one as well. For example:\n\n```sql\nCREATE DATABASE newdb;\n\\c newdb\nCREATE EXTENSION lantern;\n```\n\nCreate a table with a vector column and add your data:\n\n```sql\nCREATE TABLE small_world (id integer, vector real[3]);\nINSERT INTO small_world (id, vector) VALUES (0, '{0,0,0}'), (1, '{0,0,1}');\n```\n\nCreate an HNSW index on the table via `lantern_hnsw`:\n\n```sql\nCREATE INDEX ON small_world USING lantern_hnsw (vector);\n```\n\nCustomize `lantern_hnsw` index parameters depending on your vector data, such as the distance function (e.g., `dist_l2sq_ops`), index construction parameters, and index search parameters.\n\n```sql\nCREATE INDEX ON small_world USING lantern_hnsw (vector dist_l2sq_ops)\nWITH (M=2, ef_construction=10, ef=4, dim=3);\n```\n\nStart querying data:\n\n```sql\nSET enable_seqscan = false;\nSELECT id, l2sq_dist(vector, ARRAY[0,0,0]) AS dist\nFROM small_world ORDER BY vector \u003c-\u003e ARRAY[0,0,0] LIMIT 1;\n```\n\n### A note on operators and operator classes\n\nLantern supports several distance functions in the index.\n\nThere are 3 operators available: `\u003c-\u003e` (l2sq), `\u003c=\u003e` (cosine), `\u003c+\u003e` (hamming).\n\nThere are five defined operator classes that can be employed during 
index creation:\n\n- **`dist_l2sq_ops`**: Default for the type `real[]`\n- **`dist_vec_l2sq_ops`**: Default for the type `vector`\n- **`dist_cos_ops`**: Applicable to the type `real[]`\n- **`dist_vec_cos_ops`**: Applicable to the type `vector`\n- **`dist_hamming_ops`**: Applicable to the type `integer[]`\n\n### Index Construction Parameters\n\nThe `M`, `ef`, and `ef_construction` parameters control the performance of the HNSW algorithm for your use case.\n\n- In general, lower `M` and `ef_construction` speed up index creation at the cost of recall.\n- Lower `M` and `ef` improve search speed and result in fewer shared buffer hits at the cost of recall. Tuning these parameters will require experimentation for your specific use case.\n\n### Miscellaneous\n\n- If you have previously cloned Lantern and would like to update, run `git pull \u0026\u0026 git submodule update --recursive`\n\n## ⭐️ Features\n\n- Embedding generation for popular use cases (CLIP model, Hugging Face models, custom model)\n- Interoperability with pgvector's data type, so anyone using pgvector can switch to Lantern\n- Parallel index creation via an external indexer\n- Ability to generate the index graph outside of the database server\n- Support for creating the index outside of the database, inside another instance, so that you can create an index without interrupting database workflows\n- See all of our helper functions to better enable your workflows\n\n## 🏎️ Performance\n\nImportant takeaways:\n\n- We track three key metrics: 
`CREATE INDEX` time, `SELECT` throughput, and `SELECT` latency.\n- We match or outperform pgvector and pg_embedding (Neon) on all of these metrics.\n- We plan to continue to make performance improvements to ensure we are the best-performing database.\n\n\u003cp\u003e\n\u003cimg alt=\"Lantern throughput\" src=\"https://storage.googleapis.com/lantern-blog/1/throughput.png\" width=\"400\" style=\"float: left;\" /\u003e\n\u003cimg alt=\"Lantern latency\" src=\"https://storage.googleapis.com/lantern-blog/1/latency.png\" width=\"400\" style=\"float: left;\" /\u003e\n\u003cimg alt=\"Lantern index creation\" src=\"https://storage.googleapis.com/lantern-blog/1/create.png\" width=\"400\" style=\"float: left;\" /\u003e\n\u003c/p\u003e\n\n## 🗺️ Roadmap\n\n- Cloud-hosted version of Lantern - Sign up [here](https://lantern.dev)\n- Hardware-accelerated distance metrics, tailored for your CPU, enabling faster queries\n- Templates and guides for building applications for different industries\n- More tools for generating embeddings (support for third-party model APIs, more local models)\n- Support for version control and A/B testing of embeddings\n- Autotuned index type that will choose appropriate creation parameters\n- Support for 1-byte and 2-byte vector elements, and vectors of up to 8000 dimensions ([PR #19](https://github.com/lanterndata/lantern/pull/19))\n- Request a feature at [support@lantern.dev](mailto:support@lantern.dev)\n\n## 📚 Resources\n\n- [GitHub issues](https://github.com/lanterndata/lantern/issues): report bugs or issues with Lantern\n- Need support? Contact [support@lantern.dev](mailto:support@lantern.dev). We are happy to troubleshoot issues and advise on how to use Lantern for your use case\n- We welcome community contributions! Feel free to open an issue or a PR. 
If you contact [support@lantern.dev](mailto:support@lantern.dev), we can find an open issue or project that fits you\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flanterndata%2Flantern","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flanterndata%2Flantern","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flanterndata%2Flantern/lists"}