{"id":18408354,"url":"https://github.com/pingcap/tidb-vector-python","last_synced_at":"2025-05-09T02:46:47.393Z","repository":{"id":216825496,"uuid":"742278253","full_name":"pingcap/tidb-vector-python","owner":"pingcap","description":"TiDB Vector SDK for Python, including code examples. Join our Discord: https://discord.gg/XzSW23Jg9p","archived":false,"fork":false,"pushed_at":"2024-11-15T11:21:09.000Z","size":599,"stargazers_count":56,"open_issues_count":4,"forks_count":16,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-05-05T19:59:34.389Z","etag":null,"topics":["python","tidb-vector","tidbcloud"],"latest_commit_sha":null,"homepage":"https://tidb.cloud/ai","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pingcap.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2024-01-12T05:54:21.000Z","updated_at":"2025-04-15T03:24:37.000Z","dependencies_parsed_at":"2024-01-13T04:04:37.189Z","dependency_job_id":"80244cce-c1c9-4140-8e85-a267b7b9eef2","html_url":"https://github.com/pingcap/tidb-vector-python","commit_stats":null,"previous_names":["ianthereal/tidb_vector_python","pingcap/tidb-vector-python"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pingcap%2Ftidb-vector-python","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pingcap%2Ftidb-vector-python/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pingcap%2Ftidb-vector-python/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hos
ts/GitHub/repositories/pingcap%2Ftidb-vector-python/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pingcap","download_url":"https://codeload.github.com/pingcap/tidb-vector-python/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253181392,"owners_count":21866989,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["python","tidb-vector","tidbcloud"],"created_at":"2024-11-06T03:18:40.720Z","updated_at":"2025-05-09T02:46:47.371Z","avatar_url":"https://github.com/pingcap.png","language":"Python","readme":"# tidb-vector-python\n\nUse TiDB Vector Search with Python.\n\n## Usage\n\nTiDB is a SQL database, so this package adds Vector Search capability to Python ORMs:\n\n- [#SQLAlchemy](#sqlalchemy)\n- [#Peewee](#peewee)\n- [#Django](#django)\n\nPick one that you are familiar with to get started. 
If you are not using any of them, we recommend [#SQLAlchemy](#sqlalchemy).\n\nWe also provide a Vector Search client for simple usage:\n\n- [#TiDB Vector Client](#tidb-vector-client)\n\n### SQLAlchemy\n\nInstall:\n\n```bash\npip install tidb-vector sqlalchemy pymysql\n```\n\nUsage:\n\n```python\nfrom sqlalchemy import Integer, Column\nfrom sqlalchemy import create_engine, select\nfrom sqlalchemy.dialects.mysql import LONGTEXT\nfrom sqlalchemy.orm import Session, declarative_base\n\nimport tidb_vector\nfrom tidb_vector.sqlalchemy import VectorType, VectorAdaptor\n\nengine = create_engine(\"mysql+pymysql://root@127.0.0.1:4000/test\")\nBase = declarative_base()\n\n\n# Define table schema\nclass Doc(Base):\n    __tablename__ = \"doc\"\n    id = Column(Integer, primary_key=True)\n    embedding = Column(VectorType(dim=3))\n    content = Column(LONGTEXT)\n\n\n# Create empty table\nBase.metadata.drop_all(engine)  # clean data from last run\nBase.metadata.create_all(engine)\n\n# Create index for L2 distance\nVectorAdaptor(engine).create_vector_index(\n    Doc.embedding, tidb_vector.DistanceMetric.L2, skip_existing=True\n    # For cosine distance, use tidb_vector.DistanceMetric.COSINE\n)\n\n# Insert content with vectors\nwith Session(engine) as session:\n    session.add(Doc(id=1, content=\"dog\", embedding=[1, 2, 1]))\n    session.add(Doc(id=2, content=\"fish\", embedding=[1, 2, 4]))\n    session.add(Doc(id=3, content=\"tree\", embedding=[1, 0, 0]))\n    session.commit()\n\n# Perform Vector Search for Top K=1\nwith Session(engine) as session:\n    results = session.execute(\n        select(Doc.id, Doc.content)\n        .order_by(Doc.embedding.l2_distance([1, 2, 3]))\n        # For cosine distance, use Doc.embedding.cosine_distance(...)\n        .limit(1)\n    ).all()\n    print(results)\n\n# Perform filtered Vector Search by adding a Where Clause:\nwith Session(engine) as session:\n    results = session.execute(\n        select(Doc.id, Doc.content)\n        
.where(Doc.content == \"dog\")\n        .order_by(Doc.embedding.l2_distance([1, 2, 3]))\n        .limit(1)\n    ).all()\n    print(results)\n```\n\n### Peewee\n\nInstall:\n\n```bash\npip install tidb-vector peewee pymysql\n```\n\nUsage:\n\n```python\nimport tidb_vector\nfrom peewee import Model, MySQLDatabase, IntegerField, TextField\nfrom tidb_vector.peewee import VectorField, VectorAdaptor\n\ndb = MySQLDatabase(\n    database=\"test\",\n    user=\"root\",\n    password=\"\",\n    host=\"127.0.0.1\",\n    port=4000,\n)\n\n\n# Define table schema\nclass Doc(Model):\n    class Meta:\n        database = db\n        table_name = \"peewee_test\"\n\n    id = IntegerField(primary_key=True)\n    embedding = VectorField(3)\n    content = TextField()\n\n\n# Create empty table and index for L2 distance\ndb.drop_tables([Doc])  # clean data from last run\ndb.create_tables([Doc])\n# For cosine distance, use tidb_vector.DistanceMetric.COSINE\nVectorAdaptor(db).create_vector_index(Doc.embedding, tidb_vector.DistanceMetric.L2)\n\n# Insert content with vectors\nDoc.insert_many(\n    [\n        {\"id\": 1, \"content\": \"dog\", \"embedding\": [1, 2, 1]},\n        {\"id\": 2, \"content\": \"fish\", \"embedding\": [1, 2, 4]},\n        {\"id\": 3, \"content\": \"tree\", \"embedding\": [1, 0, 0]},\n    ]\n).execute()\n\n# Perform Vector Search for Top K=1\ncursor = (\n    Doc.select(Doc.id, Doc.content)\n    # For cosine distance, use Doc.embedding.cosine_distance(...)\n    .order_by(Doc.embedding.l2_distance([1, 2, 3]))\n    .limit(1)\n)\nfor row in cursor:\n    print(row.id, row.content)\n\n\n# Perform filtered Vector Search by adding a Where Clause:\ncursor = (\n    Doc.select(Doc.id, Doc.content)\n    .where(Doc.content == \"dog\")\n    .order_by(Doc.embedding.l2_distance([1, 2, 3]))\n    .limit(1)\n)\nfor row in cursor:\n    print(row.id, row.content)\n```\n\n### Django\n\n\u003e [!TIP]\n\u003e\n\u003e Django is a full-featured web framework, not just an ORM. 
The following usage instructions are provided for existing Django users.\n\u003e\n\u003e If you are new to these tools, consider starting with SQLAlchemy or Peewee.\n\nInstall:\n\n```bash\npip install 'django-tidb[vector]~=5.0.0' 'django~=5.0.0' mysqlclient\n```\n\nUsage:\n\n1\\. Configure `django_tidb` as the database engine:\n\n```python\nDATABASES = {\n    'default': {\n        'ENGINE': 'django_tidb',\n        'NAME': 'django',\n        'USER': 'root',\n        'PASSWORD': '',\n        'HOST': '127.0.0.1',\n        'PORT': 4000,\n    },\n}\n```\n\n2\\. Define a model with a vector field and vector index:\n\n```python\nfrom django.db import models\nfrom django_tidb.fields.vector import VectorField, VectorIndex, L2Distance\n\nclass Doc(models.Model):\n    id = models.IntegerField(primary_key=True)\n    embedding = VectorField(dimensions=3)\n    content = models.TextField()\n    class Meta:\n        indexes = [VectorIndex(L2Distance(\"embedding\"), name=\"idx\")]\n```\n\n3\\. Insert data:\n\n```python\nDoc.objects.create(id=1, content=\"dog\", embedding=[1, 2, 1])\nDoc.objects.create(id=2, content=\"fish\", embedding=[1, 2, 4])\nDoc.objects.create(id=3, content=\"tree\", embedding=[1, 0, 0])\n```\n\n4\\. Perform Vector Search for Top K=1:\n\n```python\nqueryset = (\n    Doc.objects\n        .order_by(L2Distance(\"embedding\", [1, 2, 3]))\n        .values(\"id\", \"content\")[:1]\n)\nprint(queryset)\n```\n\n5\\. 
Perform filtered Vector Search by adding a Where Clause:\n\n```python\nqueryset = (\n    Doc.objects\n        .filter(content=\"dog\")\n        .order_by(L2Distance(\"embedding\", [1, 2, 3]))\n        .values(\"id\", \"content\")[:1]\n)\nprint(queryset)\n```\n\nFor more details, see [django-tidb](https://github.com/pingcap/django-tidb?tab=readme-ov-file#vector-beta).\n\n### TiDB Vector Client\n\nYou can also use the built-in `TiDBVectorClient` directly, as integrations such as [LangChain](https://python.langchain.com/docs/integrations/vectorstores/tidb_vector) and [LlamaIndex](https://docs.llamaindex.ai/en/stable/community/integrations/vector_stores.html#using-a-vector-store-as-an-index) do, to interact with TiDB Vector. This approach removes the need to manage the underlying ORM yourself, which simplifies working with the vector store.\n\n`TiDBVectorClient` is based on SQLAlchemy. To install it, run `pip install tidb-vector[client]`.\n\nCreate a `TiDBVectorClient` instance:\n\n```python\nfrom tidb_vector.integrations import TiDBVectorClient\n\nTABLE_NAME = 'vector_test'\nCONNECTION_STRING = 'mysql+pymysql://\u003cUSER\u003e:\u003cPASSWORD\u003e@\u003cHOST\u003e:4000/\u003cDB\u003e?ssl_verify_cert=true\u0026ssl_verify_identity=true'\n\ntidb_vs = TiDBVectorClient(\n    # the table which will store the vector data\n    table_name=TABLE_NAME,\n    # tidb connection string\n    connection_string=CONNECTION_STRING,\n    # the dimension of the vector; this example uses the ada model, which has 1536 dimensions\n    vector_dimension=1536,\n    # whether to drop and recreate the table if it already exists\n    drop_existing_table=True,\n)\n```\n\nBulk insert:\n\n```python\n# text_to_embedding() below is a placeholder for your own embedding function;\n# it should return a list of 1536 floats to match vector_dimension.\nids = [\n    \"f8e7dee2-63b6-42f1-8b60-2d46710c1971\",\n    \"8dde1fbc-2522-4ca2-aedf-5dcb2966d1c6\",\n    \"e4991349-d00b-485c-a481-f61695f2b5ae\",\n]\ndocuments = [\"foo\", \"bar\", \"baz\"]\nembeddings = [\n    
text_to_embedding(\"foo\"),\n    text_to_embedding(\"bar\"),\n    text_to_embedding(\"baz\"),\n]\nmetadatas = [\n    {\"page\": 1, \"category\": \"P1\"},\n    {\"page\": 2, \"category\": \"P1\"},\n    {\"page\": 3, \"category\": \"P2\"},\n]\n\ntidb_vs.insert(\n    ids=ids,\n    texts=documents,\n    embeddings=embeddings,\n    metadatas=metadatas,\n)\n```\n\nQuery:\n\n```python\ntidb_vs.query(text_to_embedding(\"foo\"), k=3)\n\n# query with filter\ntidb_vs.query(text_to_embedding(\"foo\"), k=3, filter={\"category\": \"P1\"})\n```\n\nBulk delete:\n\n```python\ntidb_vs.delete([\"f8e7dee2-63b6-42f1-8b60-2d46710c1971\"])\n\n# delete with filter\ntidb_vs.delete([\"f8e7dee2-63b6-42f1-8b60-2d46710c1971\"], filter={\"category\": \"P1\"})\n```\n\n## Examples\n\nThe following examples show how to use tidb-vector-python to interact with TiDB Vector in different scenarios.\n\n- [OpenAI Embedding](./examples/openai_embedding/README.md): use the OpenAI embedding model to generate vectors for text data, store them in TiDB Vector, and search for similar text.\n- [Image Search](./examples/image_search/README.md): use the OpenAI CLIP model to generate vectors for images and text, store them in TiDB Vector, and search for similar images.\n- [LlamaIndex RAG with UI](./examples/llamaindex-tidb-vector-with-ui/README.md): use LlamaIndex to build a [RAG (Retrieval-Augmented Generation)](https://docs.llamaindex.ai/en/latest/getting_started/concepts/) application.\n- [Chat with URL](./llamaindex-tidb-vector/README.md): use LlamaIndex to build a [RAG (Retrieval-Augmented Generation)](https://docs.llamaindex.ai/en/latest/getting_started/concepts/) application that can chat with a URL.\n- [GraphRAG](./examples/graphrag-demo/README.md): build a Knowledge Graph based RAG application on TiDB Serverless in 20 lines of code.\n- [GraphRAG Step by Step Tutorial](./examples/graphrag-step-by-step-tutorial/README.md): Step-by-step tutorial to build a Knowledge Graph based RAG application 
with a Colab notebook. In this tutorial, you will learn how to extract knowledge from a text corpus, build a Knowledge Graph, store the Knowledge Graph in TiDB Serverless, and search the Knowledge Graph.\n- [Vector Search Notebook with SQLAlchemy](https://colab.research.google.com/drive/1LuJn4mtKsjr3lHbzMa2RM-oroUvpy83y?usp=sharing): use [SQLAlchemy](https://www.sqlalchemy.org/) to interact with TiDB Serverless: connect to the database, index and store data, and then search vectors.\n- [Build RAG with Jina AI Embeddings](./examples/jina-ai-embeddings-demo/README.md): use Jina AI to generate embeddings for text data, store the embeddings in TiDB Vector Storage, and search for similar embeddings.\n- [Semantic Cache](./examples/semantic-cache/README.md): build a semantic cache with Jina AI and TiDB Vector.\n\nFor more examples, see the [examples](./examples) directory.\n\n## Contributing\n\nPlease feel free to reach out to the maintainers if you have any questions or need help with the project. Before contributing, please read the [CONTRIBUTING.md](./CONTRIBUTING.md) file.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpingcap%2Ftidb-vector-python","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpingcap%2Ftidb-vector-python","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpingcap%2Ftidb-vector-python/lists"}