{"id":28902320,"url":"https://github.com/tensorchord/vechord","last_synced_at":"2025-07-26T00:33:49.179Z","repository":{"id":284292021,"uuid":"914759626","full_name":"tensorchord/vechord","owner":"tensorchord","description":"Turn PostgreSQL into your search engine in a Pythonic way.","archived":false,"fork":false,"pushed_at":"2025-07-22T09:50:21.000Z","size":595,"stargazers_count":46,"open_issues_count":3,"forks_count":4,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-07-22T11:41:27.807Z","etag":null,"topics":["maxsim","multivector","postgres","postgresql","rag","text-search","vector-database","vector-search"],"latest_commit_sha":null,"homepage":"https://tensorchord.github.io/vechord/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tensorchord.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-01-10T08:52:23.000Z","updated_at":"2025-07-22T09:50:25.000Z","dependencies_parsed_at":"2025-03-25T06:29:28.506Z","dependency_job_id":"fa31d117-3f90-4c2f-8e48-1b6f8355f3e4","html_url":"https://github.com/tensorchord/vechord","commit_stats":null,"previous_names":["tensorchord/vechord"],"tags_count":8,"template":false,"template_full_name":null,"purl":"pkg:github/tensorchord/vechord","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tensorchord%2Fvechord","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tensorchord%2Fvechord/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tensorchord%2Fvechord/releases
","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tensorchord%2Fvechord/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tensorchord","download_url":"https://codeload.github.com/tensorchord/vechord/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tensorchord%2Fvechord/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267093821,"owners_count":24034952,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-25T02:00:09.625Z","response_time":70,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["maxsim","multivector","postgres","postgresql","rag","text-search","vector-database","vector-search"],"created_at":"2025-06-21T11:08:22.516Z","updated_at":"2025-07-26T00:33:49.165Z","avatar_url":"https://github.com/tensorchord.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"https://github.com/user-attachments/assets/7b2819bb-1a7d-4b84-9ff9-d0c4d5340da9\"\u003e\n\n\u003cp\u003e\n\n[![Python Check][ci-check-badge]][ci-check-file]\n[![Pages][ci-page-badge]][document-link]\n[![GitHub License][license-badge]][license-link]\n[![PyPI - 
Version][pypi-badge]][pypi-link]\n[![Discord][discord-badge]][discord-link]\n[![Blog][blog-badge]][blog-link]\n\n\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eTurn PostgreSQL into your search engine in a Pythonic way.\u003c/em\u003e\u003c/p\u003e\n\u003c/div\u003e\n\n## Installation\n\n```sh\npip install vechord\n```\n\nThe related Docker images can be found in [VectorChord Suite][vectorchord-suite].\n\n- DockerHub: `tensorchord/vchord-suite:pg17-20250620`\n- GitHub Packages: `ghcr.io/tensorchord/vchord-suite:pg17-20250620`\n\n## Features\n\n- [x] vector search with [RaBitQ][rabitq] (powered by [VectorChord][vectorchord])\n- [x] multivec search with [WARP][xtr-warp] (powered by [VectorChord][vectorchord])\n- [x] keyword search with BM25 score (powered by [VectorChord-bm25][vectorchord-bm25])\n- [x] reduce boilerplate code by taking full advantage of Python type hints\n- [x] provide decorators to inject the data from/to the database\n- [x] guarantee data consistency with PostgreSQL transactions\n- [x] auto-generate the web service\n- [x] provide common tools (any other libraries can be used instead):\n  - [x] `Augmenter` for contextual retrieval\n  - [x] `Chunker` to segment the text into chunks\n  - [x] `Embedding` to generate the embedding from the text\n  - [x] `Evaluator` to evaluate the search results with `NDCG`, `MAP`, `Recall`, etc.\n  - [x] `Extractor` to extract the content from PDF, HTML, etc.\n  - [x] `EntityRecognizer` to extract the entities and relations from the text\n  - [x] `Reranker` for hybrid search\n\n## Examples\n\n- [simple.py](examples/simple.py): for people who are familiar with specialized vector database APIs\n- [beir.py](examples/beir.py): the most flexible way to use the library (loading, indexing, querying and evaluation)\n- [web.py](examples/web.py): build a web application from the defined tables and pipeline\n- [essay.py](examples/essay.py): extract the content from Paul Graham's essays and evaluate the search results from 
LLM-generated queries\n- [contextual.py](examples/contextual.py): contextual retrieval example with a local PDF\n- [anthropic.py](examples/anthropic.py): contextual retrieval with Anthropic's tutorial example\n- [hybrid.py](examples/hybrid.py): hybrid search that reranks the results from vector search with keyword search\n- [graph.py](examples/graph.py): graph-like entity-relation retrieval\n\n## User Guide\n\nFor more details, check our [API reference][document-api] and [User Guide][document-guide].\n\n### Define the table\n\n```python\nfrom typing import Annotated, Optional\nfrom vechord.spec import Table, Vector, PrimaryKeyAutoIncrease, ForeignKey\n\n# use a 3072-dimensional vector\nDenseVector = Vector[3072]\n\nclass Document(Table, kw_only=True):\n    uid: Optional[PrimaryKeyAutoIncrease] = None  # auto-increase id, no need to set\n    link: str = \"\"\n    text: str\n\nclass Chunk(Table, kw_only=True):\n    uid: Optional[PrimaryKeyAutoIncrease] = None\n    doc_id: Annotated[int, ForeignKey[Document.uid]]  # references `Document.uid` with ON DELETE CASCADE\n    vector: DenseVector  # this comes with a default vector index\n    text: str\n```\n\n### Inject with decorator\n\n```python\nimport httpx\nfrom vechord.registry import VechordRegistry\nfrom vechord.extract import SimpleExtractor\nfrom vechord.embedding import GeminiDenseEmbedding\n\nvr = VechordRegistry(namespace=\"test\", url=\"postgresql://postgres:postgres@127.0.0.1:5432/\", tables=[Document, Chunk])\nextractor = SimpleExtractor()\nemb = GeminiDenseEmbedding()\n\n@vr.inject(output=Document)  # dump to the `Document` table\n# function parameters are free to define since `inject(input=...)` is not set\nasync def add_document(url: str) -\u003e Document:  # the return type is `Document`\n    async with httpx.AsyncClient() as client:\n        resp = await client.get(url)\n        text = extractor.extract_html(resp.text)\n        return Document(link=url, text=text)\n\n@vr.inject(input=Document, output=Chunk)  # 
load from the `Document` table and dump to the `Chunk` table\n# function parameters are the attributes of the `Document` table; only the attributes\n# defined here will be loaded from the `Document` table\nasync def add_chunk(uid: int, text: str) -\u003e list[Chunk]:  # the return type is `list[Chunk]`\n    chunks = text.split(\"\\n\")\n    return [Chunk(doc_id=uid, vector=await emb.vectorize_chunk(t), text=t) for t in chunks]\n\nasync def main():\n    async with vr, emb:  # handle the connection with a context manager\n        await add_document(\"https://paulgraham.com/best.html\")  # add arguments as usual\n        await add_chunk()  # omit the arguments since the `input` will be loaded from the `Document` table\n        await vr.insert(Document(text=\"hello world\"))  # insert manually\n        print(await vr.select_by(Document.partial_init()))  # select all the columns from table `Document`\n\nif __name__ == \"__main__\":\n    import asyncio\n    asyncio.run(main())\n```\n\n### Transaction\n\nTo guarantee data consistency, users can use the `VechordRegistry.create_pipeline` method to run multiple\nfunctions in a transaction.\n\nIn this transaction, all the functions will only load the data from the database that is inserted\nin the current transaction. 
So users can focus on the data processing part without worrying about\nwhich part of the data has not been processed yet.\n\n```python\npipeline = vr.create_pipeline([add_document, add_chunk])\nawait pipeline.run(\"https://paulgraham.com/best.html\")  # only accepts the arguments for the first function\n```\n\n### Search\n\n```python\nprint(await vr.search_by_vector(Chunk, await emb.vectorize_query(\"startup\")))\n```\n\n### Customized Index Configuration\n\n```python\nfrom vechord.spec import VectorIndex\n\nclass Chunk(Table, kw_only=True):\n    uid: Optional[PrimaryKeyAutoIncrease] = None\n    vector: Annotated[DenseVector, VectorIndex(distance=\"cos\", lists=128)]\n    text: str\n```\n\n### Access the underlying database cursor directly\n\n```python\nawait vr.client.get_cursor().execute(\"SET vchordrq.probes = 100;\")\n```\n\n### HTTP Service\n\nThis creates an ASGI application that can be served by any ASGI server, such as uvicorn.\n\nOpen the [OpenAPI Endpoint](http://127.0.0.1:8000/openapi/swagger) to check the API documentation.\n\n```python\nimport uvicorn\n\nuvicorn.run(create_web_app(vr))\n```\n\n## Development\n\n```bash\ndocker run --rm -d --name vdb -e POSTGRES_PASSWORD=postgres -p 5432:5432 ghcr.io/tensorchord/vchord-suite:pg17-20250620\nenvd up\n# inside the envd env, sync all the dependencies\nmake sync\n# format the code\nmake format\n```\n\n[vectorchord]: https://github.com/tensorchord/VectorChord/\n[vectorchord-bm25]: https://github.com/tensorchord/VectorChord-bm25\n[rabitq]: https://github.com/gaoj0017/RaBitQ\n[xtr-warp]: https://github.com/jlscheerer/xtr-warp\n[ci-check-badge]: https://github.com/tensorchord/vechord/actions/workflows/check.yml/badge.svg\n[ci-check-file]: https://github.com/tensorchord/vechord/actions/workflows/check.yml\n[ci-page-badge]: https://github.com/tensorchord/vechord/actions/workflows/pages.yml/badge.svg\n[document-link]: https://tensorchord.github.io/vechord/\n[document-api]: https://tensorchord.github.io/vechord/api.html\n[document-guide]: 
https://tensorchord.github.io/vechord/guide.html\n[license-badge]: https://img.shields.io/github/license/tensorchord/vechord\n[license-link]: https://github.com/tensorchord/vechord/blob/main/LICENSE\n[pypi-badge]: https://img.shields.io/pypi/v/vechord\n[pypi-link]: https://pypi.org/project/vechord/\n[discord-badge]: https://img.shields.io/discord/974584200327991326?\u0026logoColor=white\u0026color=5865F2\u0026style=flat\u0026logo=discord\u0026cacheSeconds=60\n[discord-link]: https://discord.gg/KqswhpVgdU\n[vectorchord-suite]: https://github.com/tensorchord/VectorChord-images\n[blog-badge]: https://img.shields.io/badge/VectorChrod-Blog-DAFDBA\n[blog-link]: https://blog.vectorchord.ai/\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftensorchord%2Fvechord","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftensorchord%2Fvechord","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftensorchord%2Fvechord/lists"}