https://github.com/databendlabs/databend-aiserver

Databend AI Server extends data warehouses with AI-ready UDFs, seamlessly fusing object storage, embeddings, and SQL pipelines.

# databend-aiserver

[Databend](https://github.com/databendlabs/databend) AI Server extends any data warehouse with AI-ready UDFs, seamlessly fusing object storage, embeddings, and SQL pipelines.

## Quickstart

```bash
uv sync
uv run databend-aiserver --port 8815
```
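Before registering functions, it can help to confirm that something is accepting TCP connections on the port passed to `--port`. The following is a minimal standard-library sketch, assuming the server listens on `localhost:8815` as in the command above; `wait_for_port` is a hypothetical helper, not part of databend-aiserver, and it only checks reachability, not UDF health.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 10.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass.

    Hypothetical helper (not part of databend-aiserver): a successful connect
    only proves the port is open, not that the UDF server is fully ready.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Open and immediately close a connection; success means the port is listening.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.2)
```

For example, `wait_for_port("localhost", 8815)` returns `True` once the server started above is accepting connections, and `False` if nothing answers within ten seconds.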

## AI Functions

| Function | Signature | Purpose | Output |
| :--- | :--- | :--- | :--- |
| **ai_list_files** | `(stage_location, max_files)` | List objects in a stage for inspection/sampling. | Table with file details (`path`, `size`, etc.) |
| **ai_embed_1024** | `(text)` | Generate 1024-dim embeddings (default model: Qwen). | `VECTOR(1024)` |
| **ai_parse_document** | `(stage_location, path)` | Parse documents (PDF, DOCX, images, etc.) into Markdown. | `VARIANT` (pages, metadata) |
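Because `ai_embed_1024` always returns a fixed-length `VECTOR(1024)`, two of its outputs can be compared directly, for example with cosine similarity. The sketch below is a plain-Python illustration for vectors that have been fetched back to a client (assumed here to arrive as sequences of floats); `cosine_similarity` and `EMBED_DIM` are hypothetical names, and the dimension check simply mirrors the declared return type.

```python
import math
from typing import Sequence

EMBED_DIM = 1024  # mirrors the VECTOR(1024) return type of ai_embed_1024


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two embeddings: 1.0 same direction, 0.0 orthogonal, -1.0 opposite."""
    if len(a) != EMBED_DIM or len(b) != EMBED_DIM:
        raise ValueError(f"expected {EMBED_DIM}-dim vectors, got {len(a)} and {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)
```

Rejecting vectors of the wrong length catches mixing outputs from differently sized embedding functions early, instead of silently comparing truncated vectors through `zip`.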

## Usage

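### 1. Register Functions in Databend

Set each function's `ADDRESS` to the URL of the running server, for example `'http://localhost:8815'` when it was started with the Quickstart command.
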
```sql
CREATE OR REPLACE FUNCTION ai_list_files(stage_location STAGE_LOCATION, pattern VARCHAR, max_files INT)
RETURNS TABLE (stage_name VARCHAR, path VARCHAR, uri VARCHAR, size UINT64, last_modified VARCHAR, etag VARCHAR, content_type VARCHAR)
LANGUAGE PYTHON HANDLER = 'ai_list_files' ADDRESS = '';

CREATE OR REPLACE FUNCTION ai_embed_1024(text VARCHAR)
RETURNS VECTOR(1024)
LANGUAGE PYTHON HANDLER = 'ai_embed_1024' ADDRESS = '';

CREATE OR REPLACE FUNCTION ai_parse_document(stage_location STAGE_LOCATION, file_path VARCHAR)
RETURNS VARIANT
LANGUAGE PYTHON HANDLER = 'ai_parse_document' ADDRESS = '';
```
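Since all three statements must point at the same server, the address can be kept in one place and the DDL rendered from it. This is a hypothetical sketch (`FUNCTIONS` and `render_udf_ddl` are illustrative names, not part of the project); the signatures are copied verbatim from the statements above, and the example address assumes a local server on port 8815.

```python
# Signatures copied from the CREATE FUNCTION statements above: name -> (parameters, return type).
FUNCTIONS = {
    "ai_list_files": (
        "stage_location STAGE_LOCATION, pattern VARCHAR, max_files INT",
        "TABLE (stage_name VARCHAR, path VARCHAR, uri VARCHAR, size UINT64, "
        "last_modified VARCHAR, etag VARCHAR, content_type VARCHAR)",
    ),
    "ai_embed_1024": ("text VARCHAR", "VECTOR(1024)"),
    "ai_parse_document": ("stage_location STAGE_LOCATION, file_path VARCHAR", "VARIANT"),
}


def render_udf_ddl(address: str) -> str:
    """Render one CREATE OR REPLACE FUNCTION statement per entry, all using `address`."""
    if "'" in address:
        # The address is embedded in a single-quoted SQL literal.
        raise ValueError("address must not contain single quotes")
    statements = []
    for name, (params, returns) in FUNCTIONS.items():
        statements.append(
            f"CREATE OR REPLACE FUNCTION {name}({params})\n"
            f"RETURNS {returns}\n"
            f"LANGUAGE PYTHON HANDLER = '{name}' ADDRESS = '{address}';"
        )
    return "\n\n".join(statements)


print(render_udf_ddl("http://localhost:8815"))
```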

### 2. Run Queries

```sql
-- Setup Stage
CREATE CONNECTION my_s3_conn STORAGE_TYPE = 's3' ACCESS_KEY_ID = '...' SECRET_ACCESS_KEY = '...';
CREATE STAGE docs_stage URL='s3://load/files/' CONNECTION = (CONNECTION_NAME = 'my_s3_conn');

-- Execute AI Functions
SELECT * FROM ai_list_files(@docs_stage, 50);
SELECT ai_embed_1024(doc_body) FROM docs_tbl;
SELECT ai_parse_document(@docs_stage, 'reports/q1.pdf');
```
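`ai_parse_document` returns a `VARIANT` holding pages and metadata. If that value is fetched into Python as a JSON string, a sketch like the one below could stitch the pages back into a single Markdown document. The field names are assumptions: `pages` and `metadata` follow the table above, while the per-page `content` key is hypothetical, so check the actual output of your server version before relying on it.

```python
import json


def pages_to_markdown(variant_json: str, separator: str = "\n\n---\n\n") -> str:
    """Join per-page Markdown from an ai_parse_document result.

    ASSUMED shape: {"pages": [{"content": "..."}, ...], "metadata": {...}}.
    The "content" key is hypothetical; missing pages or keys yield empty text.
    """
    doc = json.loads(variant_json)
    pages = doc.get("pages", [])
    return separator.join(str(page.get("content", "")) for page in pages)
```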

## Development

```bash
# Run full test suite
uv run pytest
```

---

Built by the [Databend](https://github.com/databendlabs/databend) team.