https://github.com/databendlabs/databend-aiserver
Databend AI Server extends any data warehouse with AI-ready UDFs, seamlessly fusing object storage, embeddings, and SQL pipelines.
ai pipeline udf warehouse
- Host: GitHub
- URL: https://github.com/databendlabs/databend-aiserver
- Owner: databendlabs
- License: apache-2.0
- Created: 2025-10-23T01:51:01.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-11-28T07:22:45.000Z (7 months ago)
- Last Synced: 2026-04-13T07:49:21.993Z (2 months ago)
- Topics: ai, pipeline, udf, warehouse
- Language: Python
- Homepage:
- Size: 3.83 MB
- Stars: 1
- Watchers: 0
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# databend-aiserver
[Databend](https://github.com/databendlabs/databend) AI Server extends any data warehouse with AI-ready UDFs, seamlessly fusing object storage, embeddings, and SQL pipelines.
## Quickstart
```bash
uv sync
uv run databend-aiserver --port 8815
```
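After starting the server, a quick reachability check can confirm that something is listening on the chosen port. The snippet below is a minimal sketch using only the Python standard library; the host `127.0.0.1` and the helper name `is_port_open` are assumptions for illustration, and the check only verifies that a TCP connection succeeds, not that the UDF handlers are healthy.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 127.0.0.1 is an assumption; 8815 is the port passed in the Quickstart.
    print("server reachable:", is_port_open("127.0.0.1", 8815))
```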
## AI Functions
| Function | Signature | Purpose | Output |
| :--- | :--- | :--- | :--- |
| **ai_list_files** | `(stage_location, pattern, max_files)` | List objects in a stage for inspection/sampling. | Table with file details (`path`, `size`, etc.) |
| **ai_embed_1024** | `(text)` | Generate 1024-dimensional embeddings (default model: Qwen). | `VECTOR(1024)` |
| **ai_parse_document** | `(stage_location, file_path)` | Parse documents (PDF, DOCX, images, etc.) to Markdown. | `VARIANT` (pages, metadata) |
## Usage
### 1. Register Functions in Databend
```sql
-- Replace <ai-server-address> with the address of the running AI server
-- (the Quickstart starts it on port 8815).
CREATE OR REPLACE FUNCTION ai_list_files(stage_location STAGE_LOCATION, pattern VARCHAR, max_files INT)
RETURNS TABLE (stage_name VARCHAR, path VARCHAR, uri VARCHAR, size UINT64, last_modified VARCHAR, etag VARCHAR, content_type VARCHAR)
LANGUAGE PYTHON HANDLER = 'ai_list_files' ADDRESS = '<ai-server-address>';

CREATE OR REPLACE FUNCTION ai_embed_1024(text VARCHAR)
RETURNS VECTOR(1024)
LANGUAGE PYTHON HANDLER = 'ai_embed_1024' ADDRESS = '<ai-server-address>';

CREATE OR REPLACE FUNCTION ai_parse_document(stage_location STAGE_LOCATION, file_path VARCHAR)
RETURNS VARIANT
LANGUAGE PYTHON HANDLER = 'ai_parse_document' ADDRESS = '<ai-server-address>';
```
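Because all three statements share the same `ADDRESS`, it can be convenient to generate them from a single value. The following is a rough sketch, not part of the project: `render_registration_sql` is a hypothetical helper, the argument lists and return types are copied from the statements above, and the example address `http://127.0.0.1:8815` assumes the server from the Quickstart is running locally.

```python
# (name, argument list, return type) copied from the registration statements above.
FUNCTIONS = [
    (
        "ai_list_files",
        "stage_location STAGE_LOCATION, pattern VARCHAR, max_files INT",
        "TABLE (stage_name VARCHAR, path VARCHAR, uri VARCHAR, size UINT64, "
        "last_modified VARCHAR, etag VARCHAR, content_type VARCHAR)",
    ),
    ("ai_embed_1024", "text VARCHAR", "VECTOR(1024)"),
    (
        "ai_parse_document",
        "stage_location STAGE_LOCATION, file_path VARCHAR",
        "VARIANT",
    ),
]


def render_registration_sql(address: str) -> list[str]:
    """Build one CREATE OR REPLACE FUNCTION statement per AI function."""
    if "'" in address:
        # Keep the generated SQL string literal well-formed.
        raise ValueError("address must not contain single quotes")
    return [
        f"CREATE OR REPLACE FUNCTION {name}({args})\n"
        f"RETURNS {returns}\n"
        f"LANGUAGE PYTHON HANDLER = '{name}' ADDRESS = '{address}';"
        for name, args, returns in FUNCTIONS
    ]


if __name__ == "__main__":
    # Hypothetical local address; adjust to wherever the AI server is reachable.
    print("\n\n".join(render_registration_sql("http://127.0.0.1:8815")))
```

The printed statements can then be pasted into a Databend SQL session or executed through whichever client is already in use.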
### 2. Run Queries
```sql
-- Setup Stage
CREATE CONNECTION my_s3_conn STORAGE_TYPE = 's3' ACCESS_KEY_ID = '...' SECRET_ACCESS_KEY = '...';
CREATE STAGE docs_stage URL='s3://load/files/' CONNECTION = (CONNECTION_NAME = 'my_s3_conn');
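-- Execute AI Functions
-- ai_list_files is registered above with three arguments (stage_location, pattern, max_files);
-- passing NULL for pattern is assumed here to mean "no filter".
SELECT * FROM ai_list_files(@docs_stage, NULL, 50);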
SELECT ai_embed_1024(doc_body) FROM docs_tbl;
SELECT ai_parse_document(@docs_stage, 'reports/q1.pdf');
```
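The `ai_embed_1024` query above yields one `VECTOR(1024)` per row. If two such vectors are fetched to a client, for example to sanity-check that related texts land close together, cosine similarity is a common way to compare them. The sketch below is plain Python and independent of this project; `cosine_similarity` is an illustrative helper, and the short vectors in the usage lines stand in for real 1024-dimensional embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy 2-dimensional vectors; real embeddings would have 1024 entries.
    print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))
    print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```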
## Development
```bash
# Run full test suite
uv run pytest
```
---
Built by the [Databend](https://github.com/databendlabs/databend) team.