{"id":30872857,"url":"https://github.com/pingcap/pytidb","last_synced_at":"2025-09-07T22:36:58.511Z","repository":{"id":284474685,"uuid":"951077405","full_name":"pingcap/pytidb","owner":"pingcap","description":"TiDB AI SDK: Unified Multi-Modal Data Platform for AI Apps \u0026 Agents - https://pingcap.github.io/ai/","archived":false,"fork":false,"pushed_at":"2025-08-26T10:08:07.000Z","size":1864,"stargazers_count":22,"open_issues_count":27,"forks_count":11,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-08-26T12:07:29.139Z","etag":null,"topics":["ai","embeddings","fulltext-search","hnsw","hybrid-search","multi-modal","rag","semantic-search","similarity-search","sql","tidb","vector-search"],"latest_commit_sha":null,"homepage":"https://pingcap.github.io/ai/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pingcap.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-03-19T06:02:18.000Z","updated_at":"2025-08-25T10:34:40.000Z","dependencies_parsed_at":"2025-04-17T04:37:13.388Z","dependency_job_id":"b0590386-a8c0-4de5-a1b2-81b39491a438","html_url":"https://github.com/pingcap/pytidb","commit_stats":null,"previous_names":["mini256/pytidb","pingcap/pytidb"],"tags_count":9,"template":false,"template_full_name":null,"purl":"pkg:github/pingcap/pytidb","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pingcap%2Fpytidb","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pingcap%2Fpytidb/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pingcap%2Fpytidb/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pingcap%2Fpytidb/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pingcap","download_url":"https://codeload.github.com/pingcap/pytidb/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pingcap%2Fpytidb/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":274107649,"owners_count":25223451,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-07T02:00:09.463Z","response_time":67,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","embeddings","fulltext-search","hnsw","hybrid-search","multi-modal","rag","semantic-search","similarity-search","sql","tidb","vector-search"],"created_at":"2025-09-07T22:34:54.380Z","updated_at":"2025-09-07T22:36:58.488Z","avatar_url":"https://github.com/pingcap.png","language":"Python","readme":"\u003ch1 align=\"center\"\u003eTiDB Python AI SDK\u003c/h1\u003e\n\n\u003cdiv align=\"center\"\u003e\n\n[![Python Package Index](https://img.shields.io/pypi/v/pytidb.svg)](https://pypi.org/project/pytidb)\n[![Monthly PyPI Downloads](https://static.pepy.tech/badge/pytidb/month)](https://pepy.tech/projects/pytidb)\n[![Total PyPI Downloads](https://static.pepy.tech/badge/pytidb)](https://pepy.tech/projects/pytidb)\n\n\u003c/div\u003e\n\n\u003ch4 align=\"center\"\u003e\n  \u003ca href=\"https://github.com/pingcap/pytidb/blob/main/docs/quickstart.ipynb\"\u003eQuick Start\u003c/a\u003e\n  •\n  \u003ca href=\"https://pingcap.github.io/ai/\"\u003eDocumentation\u003c/a\u003e\n  •\n  \u003ca href=\"https://pingcap.github.io/ai/examples/\"\u003eExamples\u003c/a\u003e\n  •\n  \u003ca href=\"https://github.com/orgs/pingcap/projects/69/views/4\"\u003eRoadmap\u003c/a\u003e\n  •\n  \u003ca href=\"https://discord.com/invite/vYU9h56kAX\"\u003eDiscord\u003c/a\u003e\n  •\n  \u003ca href=\"https://github.com/pingcap/pytidb/issues\"\u003eReport Bug\u003c/a\u003e\n\u003c/h4\u003e\n\n## Introduction\n\n**Python SDK for TiDB AI**: A unified data platform empowering developers to build next-generation AI applications.\n\n- 🔍 **Unified Search Modes**: Vector · Full‑Text · Hybrid\n- 🎭 **Auto‑Embedding \u0026 Multi‑Modal Storage**: Support for text, images, and more \n- 🖼️ **Image Search Support**: Text‑to‑image and image‑to‑image retrieval capabilities \n- 🎯 **Advanced Filtering \u0026 Reranking**: Flexible filters with optional reranker models to fine-tune result relevance \n- 💱 **Transaction Support**: Full transaction management including commit/rollback to ensure consistency \n\n## Installation\n\n\u003e [!NOTE]\n\u003e This Python package is under rapid development and its API may change. It is recommended to use a **fixed version** when installing, e.g., `pytidb==0.0.12`.\n\n```bash\npip install pytidb\n\n# To use built-in embedding functions and rerankers:\npip install \"pytidb[models]\"\n\n# To convert query results to pandas DataFrame:\npip install pandas\n```\n\n\n## Connect to TiDB Cloud\n\nCreate a free TiDB cluster at [tidbcloud.com](https://tidbcloud.com/?utm_source=github\u0026utm_medium=referral\u0026utm_campaign=pytidb_readme).\n\n```python\nimport os\nfrom pytidb import TiDBClient\n\ntidb_client = TiDBClient.connect(\n    host=os.getenv(\"TIDB_HOST\"),\n    port=int(os.getenv(\"TIDB_PORT\")),\n    username=os.getenv(\"TIDB_USERNAME\"),\n    password=os.getenv(\"TIDB_PASSWORD\"),\n    database=os.getenv(\"TIDB_DATABASE\"),\n    ensure_db=True,\n)\n```\n\n## Highlights\n\n### 🤖 Automatic Embedding\n\nPyTiDB automatically embeds text fields (e.g., `text`) and stores the vector embedding in a vector field (e.g., `text_vec`).\n\n**Create a table with an embedding function:**\n\n```python\nfrom pytidb.schema import TableModel, Field, FullTextField\nfrom pytidb.embeddings import EmbeddingFunction\n\n# Set API key for embedding provider.\ntidb_client.configure_embedding_provider(\"openai\", api_key=os.getenv(\"OPENAI_API_KEY\"))\n\nclass Chunk(TableModel):\n    __tablename__ = \"chunks\"\n\n    id: int = Field(primary_key=True)\n    text: str = FullTextField()\n    text_vec: list[float] = EmbeddingFunction(\n        \"openai/text-embedding-3-small\"\n    ).VectorField(source_field=\"text\")  # 👈 Defines the vector field.\n    user_id: int = Field()\n\ntable = tidb_client.create_table(schema=Chunk, if_exists=\"skip\")\n```\n\n**Bulk insert data:**\n\n```python\ntable.bulk_insert([\n    Chunk(id=2, text=\"bar\", user_id=2),   # 👈 The text field is embedded and saved to text_vec automatically.\n    Chunk(id=3, text=\"baz\", user_id=3),\n    Chunk(id=4, text=\"qux\", user_id=4),\n])\n```\n\n### 🔍 Search\n\n**Vector Search**\n\nVector search finds the most relevant records based on **semantic similarity**, so you don't need to include all keywords explicitly in your query.\n\n```python\ndf = (\n  table.search(\"\u003cquery\u003e\")  # 👈 The query is embedded automatically.\n    .filter({\"user_id\": 2})\n    .limit(2)\n    .to_list()\n)\n# Output: A list of dicts.\n```\n\nSee the [Vector Search example](https://github.com/pingcap/pytidb/blob/main/examples/vector_search) for more details.\n\n**Full-text Search**\n\nFull-text search tokenizes the query and finds the most relevant records by matching exact keywords.\n\n```python\ndf = (\n  table.search(\"\u003cquery\u003e\", search_type=\"fulltext\")\n    .limit(2)\n    .to_pydantic()\n)\n# Output: A list of pydantic model instances.\n```\n\nSee the [Full-text Search example](https://github.com/pingcap/pytidb/blob/main/examples/fulltext_search) for more details.\n\n**Hybrid Search**\n\nHybrid search combines **exact matching** from full-text search with **semantic understanding** from vector search, delivering more relevant and reliable results.\n\n```python\ndf = (\n  table.search(\"\u003cquery\u003e\", search_type=\"hybrid\")\n    .limit(2)\n    .to_pandas()\n)\n# Output: A pandas DataFrame.\n```\n\nSee the [Hybrid Search example](https://github.com/pingcap/pytidb/blob/main/examples/hybrid_search) for more details.\n\n**Image Search**\n\nImage search lets you find visually similar images using natural language descriptions or another image as a reference.\n\n```python\nfrom PIL import Image\nfrom pytidb.schema import TableModel, Field\nfrom pytidb.embeddings import EmbeddingFunction\n\n# Define a multi-modal embedding model.\njina_embed_fn = EmbeddingFunction(\"jina_ai/jina-embeddings-v4\")  # Using multi-modal embedding model.\n\nclass Pet(TableModel):\n    __tablename__ = \"pets\"\n    id: int = Field(primary_key=True)\n    image_uri: str = Field()\n    image_vec: list[float] = jina_embed_fn.VectorField(\n        source_field=\"image_uri\",\n        source_type=\"image\"\n    )\n\ntable = tidb_client.create_table(schema=Pet, if_exists=\"skip\")\n\n# Insert sample images ...\ntable.insert(Pet(image_uri=\"path/to/shiba_inu_14.jpg\"))\n\n# Search for images using natural language\nresults = table.search(\"shiba inu dog\").limit(1).to_list()\n\n# Search for images using an image ...\nquery_image = Image.open(\"shiba_inu_15.jpg\")\nresults = table.search(query_image).limit(1).to_pydantic()\n```\n\nSee the [Image Search example](https://github.com/pingcap/pytidb/blob/main/examples/image_search) for more details.\n\n#### Advanced Filtering\n\nPyTiDB supports a variety of operators for flexible filtering:\n\n| Operator | Description           | Example                                    |\n| -------- | --------------------- | ------------------------------------------ |\n| `$eq`    | Equal to              | `{\"field\": {\"$eq\": \"hello\"}}`              |\n| `$gt`    | Greater than          | `{\"field\": {\"$gt\": 1}}`                    |\n| `$gte`   | Greater than or equal | `{\"field\": {\"$gte\": 1}}`                   |\n| `$lt`    | Less than             | `{\"field\": {\"$lt\": 1}}`                    |\n| `$lte`   | Less than or equal    | `{\"field\": {\"$lte\": 1}}`                   |\n| `$in`    | In array              | `{\"field\": {\"$in\": [1, 2, 3]}}`            |\n| `$nin`   | Not in array          | `{\"field\": {\"$nin\": [1, 2, 3]}}`           |\n| `$and`   | Logical AND           | `{\"$and\": [{\"field1\": 1}, {\"field2\": 2}]}` |\n| `$or`    | Logical OR            | `{\"$or\": [{\"field1\": 1}, {\"field2\": 2}]}`  |\n\n### ⛓ Join Structured and Unstructured Data\n\n```python\nfrom pytidb import Session\nfrom pytidb.sql import select\n\n# Create a table to store user data:\nclass User(TableModel):\n    __tablename__ = \"users\"\n    id: int = Field(primary_key=True)\n    name: str = Field(max_length=20)\n\nwith Session(engine) as session:\n    query = (\n        select(Chunk).join(User, Chunk.user_id == User.id).where(User.name == \"Alice\")\n    )\n    chunks = session.exec(query).all()\n\n[(c.id, c.text, c.user_id) for c in chunks]\n```\n\n### 💱 Transaction Support\n\nPyTiDB supports transaction management, helping you avoid race conditions and ensure data consistency.\n\n```python\nwith tidb_client.session() as session:\n    initial_total_balance = tidb_client.query(\"SELECT SUM(balance) FROM players\").scalar()\n\n    # Transfer 10 coins from player 1 to player 2\n    tidb_client.execute(\"UPDATE players SET balance = balance - 10 WHERE id = 1\")\n    tidb_client.execute(\"UPDATE players SET balance = balance + 10 WHERE id = 2\")\n\n    session.commit()\n    # or session.rollback()\n\n    final_total_balance = tidb_client.query(\"SELECT SUM(balance) FROM players\").scalar()\n    assert final_total_balance == initial_total_balance\n```\n\n\n## Extensions\n\n\n- 🔌 [Built-in MCP support](https://pingcap.github.io/ai/integrations/mcp)\n\n\u003e [!TIP]\n\u003e Click the button below to install **TiDB MCP Server** in Cursor. Then, confirm by clicking **Install** when prompted.\n\u003e\n\u003e [![Install TiDB MCP Server](https://cursor.com/deeplink/mcp-install-dark.svg)](https://cursor.com/install-mcp?name=TiDB\u0026config=eyJjb21tYW5kIjoidXZ4IC0tZnJvbSBweXRpZGJbbWNwXSB0aWRiLW1jcC1zZXJ2ZXIiLCJlbnYiOnsiVElEQl9IT1NUIjoibG9jYWxob3N0IiwiVElEQl9QT1JUIjoiNDAwMCIsIlRJREJfVVNFUk5BTUUiOiJyb290IiwiVElEQl9QQVNTV09SRCI6IiIsIlRJREJfREFUQUJBU0UiOiJ0ZXN0In19)\n","funding_links":[],"categories":["Databases","Production-Ready Servers","官方 MCP 服务器列表","📦 Other"],"sub_categories":["SQL Databases"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpingcap%2Fpytidb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpingcap%2Fpytidb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpingcap%2Fpytidb/lists"}