https://github.com/bug-ops/pyhdb-rs
Rust-powered SAP HANA driver for Python with native Apache Arrow support. Zero-copy data transfer to Polars & Pandas. DB-API 2.0, async/await, connection pooling
https://github.com/bug-ops/pyhdb-rs
arrow async database dbapi driver hana pandas performance polars pyo3 python rust sap sap-hana zero-copy
Last synced: 24 days ago
JSON representation
Rust-powered SAP HANA driver for Python with native Apache Arrow support. Zero-copy data transfer to Polars & Pandas. DB-API 2.0, async/await, connection pooling
- Host: GitHub
- URL: https://github.com/bug-ops/pyhdb-rs
- Owner: bug-ops
- License: apache-2.0
- Created: 2026-01-11T00:44:03.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2026-01-13T04:29:20.000Z (4 months ago)
- Last Synced: 2026-01-16T15:52:05.130Z (4 months ago)
- Topics: arrow, async, database, dbapi, driver, hana, pandas, performance, polars, pyo3, python, rust, sap, sap-hana, zero-copy
- Language: Rust
- Homepage:
- Size: 335 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE-APACHE
- Security: SECURITY.md
Awesome Lists containing this project
README
# pyhdb-rs
**Unlock your SAP HANA data for modern analytics and AI workflows.**
Pure Rust toolkit — no SAP client installation required. Connect to HANA from anywhere, stream data directly into Polars, pandas, or DuckDB via Apache Arrow. Let AI assistants explore your schemas through a secure MCP interface.
[](https://github.com/bug-ops/pyhdb-rs/actions/workflows/ci.yml)
[](https://github.com/bug-ops/pyhdb-rs/actions/workflows/security.yml)
[](https://codecov.io/gh/bug-ops/pyhdb-rs)
[](https://codspeed.io/bug-ops/pyhdb-rs)
[](https://pypi.org/project/pyhdb_rs/)
[](https://crates.io/crates/hdbconnect-mcp)
[](LICENSE-APACHE)
## Why pyhdb-rs?
| Pain Point | Solution |
|------------|----------|
| SAP client installation & licensing | **Pure Rust** — no SAP dependencies, just `pip install` |
| Memory explodes with big datasets | **Zero-copy Arrow** streams data directly to DataFrames |
| Manual schema discovery for AI tools | **MCP server** gives Claude/Cline native HANA access |
| Complex ETL for analytics | **One-liner** to Polars LazyFrame or pandas |
## Quick Start
### For Data Engineers & Scientists
```bash
pip install pyhdb_rs
```
```python
from pyhdb_rs import ConnectionBuilder
import polars as pl
# Connect and extract — data flows directly to Polars without copies
conn = ConnectionBuilder.from_url("hdbsql://user:pass@hana:30015").build()
df = pl.from_arrow(conn.execute_arrow("""
SELECT material, plant, SUM(quantity) as total
FROM sapabap1.mard
GROUP BY material, plant
"""))
# Lazy evaluation for memory-efficient analytics
result = df.lazy().filter(pl.col("total") > 1000).collect()
```
> [!TIP]
> Data exports via [Apache Arrow](https://arrow.apache.org/) — the universal columnar format. Integrates with the entire Arrow ecosystem out of the box.
### Arrow Ecosystem Compatibility
Stream HANA data directly into any Arrow-compatible tool:
| Category | Tools |
|----------|-------|
| **DataFrames** | [Polars](https://pola.rs/), [pandas](https://pandas.pydata.org/), [Vaex](https://vaex.io/), [Dask](https://www.dask.org/) |
| **Query Engines** | [DuckDB](https://duckdb.org/), [DataFusion](https://datafusion.apache.org/), [ClickHouse](https://clickhouse.com/) |
| **ETL / Streaming** | [Apache Spark](https://spark.apache.org/), [Apache Flink](https://flink.apache.org/), [Kafka + Arrow](https://arrow.apache.org/) |
| **ML / AI** | [Ray](https://www.ray.io/), [Hugging Face Datasets](https://huggingface.co/docs/datasets/), [PyTorch](https://pytorch.org/) |
| **Data Lakes** | [Delta Lake](https://delta.io/), [Apache Iceberg](https://iceberg.apache.org/), [Lance](https://lancedb.github.io/lance/) |
| **Serialization** | [Parquet](https://parquet.apache.org/), [Arrow IPC/Feather](https://arrow.apache.org/docs/python/ipc.html) |
Zero-copy data transfer means no serialization overhead between HANA and your analytics stack.
### For AI-Assisted Development
```bash
cargo install hdbconnect-mcp
```
Add to Claude Desktop config:
```json
{
"mcpServers": {
"hana": {
"command": "hdbconnect-mcp",
"args": ["--url", "hdbsql://user:pass@hana:30015"]
}
}
}
```
Now ask Claude: *"Show me the top 10 customers by revenue from VBAK/VBAP"* — it queries HANA directly.
## Components
| Package | Use Case | Install |
|---------|----------|---------|
| [**pyhdb_rs**](python/README.md) | Python analytics, ETL pipelines, ML feature extraction | `pip install pyhdb_rs` |
| [**hdbconnect-mcp**](crates/hdbconnect-mcp/README.md) | AI assistants, natural language queries, schema exploration | `cargo install hdbconnect-mcp` |
| [**hdbconnect-arrow**](crates/hdbconnect-arrow/README.md) | Rust applications, custom Arrow integrations | `cargo add hdbconnect-arrow` |
## Python Driver
Full DB-API 2.0 compliance with native Arrow integration. No SAP client required — works anywhere Python runs.
Async ETL with connection pooling
```python
from pyhdb_rs.aio import ConnectionPoolBuilder
import polars as pl
pool = ConnectionPoolBuilder().url("hdbsql://user:pass@hana:30015").max_size(10).build()
async def extract_sales(region: str) -> pl.DataFrame:
async with pool.acquire() as conn:
return pl.from_arrow(await conn.execute_arrow(f"""
SELECT vbeln, erdat, netwr FROM sapabap1.vbak
WHERE region = '{region}' AND erdat >= '20240101'
"""))
# Parallel extraction across regions
results = await asyncio.gather(*[extract_sales(r) for r in ["US", "EU", "APAC"]])
```
pandas integration
```python
import pyarrow as pa
reader = conn.execute_arrow("SELECT * FROM sapabap1.mara WHERE mtart = 'FERT'")
df = pa.RecordBatchReader.from_stream(reader).read_all().to_pandas()
```
Streaming large datasets
```python
from pyhdb_rs import ArrowConfig
# Process 100M rows with constant memory
config = ArrowConfig(batch_size=50_000)
reader = conn.execute_arrow("SELECT * FROM sapabap1.mseg", config=config)
for batch in reader:
process_batch(batch) # Each batch: 50K rows as Arrow RecordBatch
```
**Full documentation:** [Python package README](python/README.md) — TLS configuration, HA clusters, transaction control, error handling.
## MCP Server for AI Agents
Production-ready server that exposes SAP HANA to Claude, Cline, Cursor, and any MCP-compatible assistant.
**Why it matters:** Instead of copy-pasting schemas or writing boilerplate queries, let AI discover and query your data directly — with guardrails.
**Security by design:**
- Read-only mode blocks DML/DDL by default
- Row limits prevent data exfiltration
- OIDC/JWT authentication for enterprise deployments
- Per-user cache isolation in multi-tenant setups
**Available tools:**
| Tool | What AI can do |
|------|----------------|
| `list_tables` | Explore schemas: *"What tables exist in SAPABAP1?"* |
| `describe_table` | Understand structure: *"Show me VBAK columns"* |
| `execute_sql` | Query data: *"Get top customers by revenue"* |
| `ping` | Verify connectivity |
**Full documentation:** [MCP server README](crates/hdbconnect-mcp/README.md) — HTTP transport, Kubernetes deployment, Prometheus metrics.
## Architecture
```mermaid
flowchart TB
subgraph apps["Your Applications"]
direction LR
subgraph python["Python Analytics"]
p1["ETL Pipelines"]
p2["ML Features"]
p3["BI Dashboards"]
end
subgraph ai["AI Assistants"]
m1["Claude Desktop"]
m2["Cline / Cursor"]
m3["Custom Agents"]
end
end
subgraph core["Rust Core"]
arrow["hdbconnect-arrow\nZero-copy HANA → Arrow"]
end
subgraph hana["SAP HANA"]
db[("BW/4HANA\nS/4HANA\nHANA Cloud")]
end
apps --> core
core --> hana
style apps fill:#e3f2fd
style core fill:#fff8e1
style hana fill:#e8f5e9
```
## Requirements
- **Python** 3.12+ (for pyhdb_rs)
- **Rust** 1.88+ (for building from source or MCP server)
## Resources
| Resource | Link |
|----------|------|
| Python API | [python/README.md](python/README.md) |
| MCP Server | [crates/hdbconnect-mcp/README.md](crates/hdbconnect-mcp/README.md) |
| Arrow Integration | [crates/hdbconnect-arrow/README.md](crates/hdbconnect-arrow/README.md) |
| Changelog | [CHANGELOG.md](CHANGELOG.md) |
| Contributing | [CONTRIBUTING.md](CONTRIBUTING.md) |
## License
Dual-licensed under [Apache-2.0](LICENSE-APACHE) or [MIT](LICENSE-MIT) at your option.