https://github.com/bug-ops/pyhdb-rs
Rust-powered SAP HANA driver for Python with native Apache Arrow support. Zero-copy data transfer to Polars & Pandas. DB-API 2.0, async/await, connection pooling
- Host: GitHub
- URL: https://github.com/bug-ops/pyhdb-rs
- Owner: bug-ops
- License: apache-2.0
- Created: 2026-01-11T00:44:03.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2026-01-13T04:29:20.000Z (about 1 month ago)
- Last Synced: 2026-01-16T15:52:05.130Z (27 days ago)
- Topics: arrow, async, database, dbapi, driver, hana, pandas, performance, polars, pyo3, python, rust, sap, sap-hana, zero-copy
- Language: Rust
- Size: 335 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE-APACHE
- Security: SECURITY.md
# pyhdb-rs
High-performance Python driver for SAP HANA with native Arrow support.
[CI](https://github.com/bug-ops/pyhdb-rs/actions/workflows/ci.yml) · [Security](https://github.com/bug-ops/pyhdb-rs/actions/workflows/security.yml) · [Coverage](https://codecov.io/gh/bug-ops/pyhdb-rs) · [crates.io](https://crates.io/crates/hdbconnect-arrow) · [docs.rs](https://docs.rs/hdbconnect-arrow) · [PyPI](https://pypi.org/project/pyhdb_rs/) · [License](LICENSE-APACHE)
## Features
- Full DB-API 2.0 (PEP 249) compliance
- Zero-copy Arrow data transfer via PyCapsule Interface
- Native Polars/pandas integration
- Async/await support with connection pooling
- Built with Rust and PyO3 for maximum performance
## Installation
```bash
uv pip install pyhdb_rs
```
With optional dependencies:
```bash
uv pip install pyhdb_rs[polars] # Polars integration
uv pip install pyhdb_rs[pandas] # pandas + PyArrow
uv pip install pyhdb_rs[async] # Async support
uv pip install pyhdb_rs[all] # All integrations
```
> [!IMPORTANT]
> Requires Python 3.12 or later.
### Platform support
| Platform | Architectures |
|----------|---------------|
| Linux (glibc) | x86_64, aarch64 |
| Linux (musl) | x86_64, aarch64 |
| macOS | x86_64, aarch64 |
| Windows | x86_64 |
### From source
```bash
git clone https://github.com/bug-ops/pyhdb-rs.git
cd pyhdb-rs/python
uv venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
uv pip install maturin
maturin develop --release
```
## Quick start
### DB-API 2.0 usage
```python
import pyhdb_rs

conn = pyhdb_rs.connect("hdbsql://USER:PASSWORD@HOST:30015")

with conn.cursor() as cursor:
    cursor.execute("SELECT * FROM CUSTOMERS WHERE IS_ACTIVE = ?", [True])
    rows = cursor.fetchall()
    for row in rows:
        print(row)

    cursor.execute("SELECT CUSTOMER_NAME, EMAIL_ADDRESS FROM CUSTOMERS")
    for name, email in cursor:
        print(f"{name}: {email}")

conn.close()
```
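Since the driver targets full DB-API 2.0 (PEP 249) compliance, bulk inserts should go through the standard `executemany()` cursor method. A minimal sketch (the `CUSTOMERS` columns are illustrative):

```python
import pyhdb_rs

conn = pyhdb_rs.connect("hdbsql://USER:PASSWORD@HOST:30015")

with conn.cursor() as cursor:
    # executemany() binds each parameter tuple in turn (PEP 249)
    cursor.executemany(
        "INSERT INTO CUSTOMERS (CUSTOMER_ID, CUSTOMER_NAME) VALUES (?, ?)",
        [(1, "Alice"), (2, "Bob"), (3, "Carol")],
    )
conn.commit()
conn.close()
```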
## Builder API
pyhdb-rs provides a builder pattern for flexible connection configuration.
### Basic connection
```python
from pyhdb_rs import ConnectionBuilder

conn = (ConnectionBuilder()
    .host("hana.example.com")
    .port(30015)
    .credentials("SYSTEM", "password")
    .database("SYSTEMDB")
    .build())

with conn.cursor() as cursor:
    cursor.execute("SELECT * FROM DUMMY")
    print(cursor.fetchone())

conn.close()
```
### With TLS configuration
```python
from pyhdb_rs import ConnectionBuilder, TlsConfig

# System root certificates
tls = TlsConfig.with_system_roots()

conn = (ConnectionBuilder()
    .host("hana.example.com")
    .credentials("SYSTEM", "password")
    .tls(tls)
    .build())
```
> [!TIP]
> Use `TlsConfig.from_directory("/path/to/certs")` for custom CA certificates.
### From URL with overrides
```python
from pyhdb_rs import ConnectionBuilder, TlsConfig

# Start with a URL, override specific settings
conn = (ConnectionBuilder.from_url("hdbsql://user:pass@host:30015")
    .tls(TlsConfig.with_system_roots())
    .autocommit(True)
    .build())
```
### Async connections
```python
import asyncio

from pyhdb_rs import TlsConfig
from pyhdb_rs.aio import AsyncConnectionBuilder

async def main():
    conn = await (AsyncConnectionBuilder()
        .host("hana.example.com")
        .credentials("SYSTEM", "password")
        .tls(TlsConfig.with_system_roots())
        .autocommit(True)
        .build())
    async with conn:
        cursor = conn.cursor()
        await cursor.execute("SELECT * FROM DUMMY")
        print(await cursor.fetchone())

asyncio.run(main())
```
### Polars integration
```python
import pyhdb_rs.polars as hdb

df = hdb.read_hana(
    """
    SELECT
        PRODUCT_CATEGORY,
        FISCAL_YEAR,
        SUM(NET_AMOUNT) AS TOTAL_REVENUE,
        COUNT(DISTINCT ORDER_ID) AS ORDER_COUNT,
        AVG(QUANTITY) AS AVG_QUANTITY
    FROM SALES_ITEMS
    WHERE FISCAL_YEAR BETWEEN 2024 AND 2026
      AND SALES_REGION IN ('EMEA', 'AMERICAS')
    GROUP BY PRODUCT_CATEGORY, FISCAL_YEAR
    ORDER BY TOTAL_REVENUE DESC
    """,
    "hdbsql://USER:PASSWORD@HOST:39017",
)
print(df.head())
```
> [!TIP]
> Use `execute_arrow()` with Polars for best performance. Data flows directly from HANA to Polars without intermediate copies.
Or using the connection object:
```python
import pyhdb_rs
import polars as pl

conn = pyhdb_rs.connect("hdbsql://USER:PASSWORD@HOST:30015")

# Get a Polars DataFrame (zero-copy via Arrow)
reader = conn.execute_arrow(
    "SELECT PRODUCT_ID, PRODUCT_NAME, CATEGORY, UNIT_PRICE FROM PRODUCTS WHERE IS_ACTIVE = 1"
)
df = pl.from_arrow(reader)

# For parameterized queries, use the two-step pattern
cursor = conn.cursor()
cursor.execute(
    """SELECT p.PRODUCT_NAME, p.UNIT_PRICE, s.STOCK_QUANTITY
       FROM PRODUCTS p
       JOIN STOCK s ON p.PRODUCT_ID = s.PRODUCT_ID
       WHERE p.CATEGORY = ? AND s.STOCK_QUANTITY > ?""",
    ["Electronics", 10],
)
df = pl.from_arrow(cursor.fetch_arrow())

# Stream large datasets batch by batch
reader = conn.execute_arrow(
    "SELECT ORDER_ID, CUSTOMER_ID, ORDER_DATE, TOTAL_AMOUNT FROM SALES_ORDERS WHERE ORDER_DATE >= '2024-01-01'"
)
for batch in reader:
    process_batch(batch)  # process_batch is a placeholder for your own handling

conn.close()
```
### pandas integration
```python
import pyhdb_rs.pandas as hdb

df = hdb.read_hana(
    """SELECT ORDER_ID, CUSTOMER_NAME, PRODUCT_NAME, QUANTITY, NET_AMOUNT
       FROM SALES_ITEMS
       WHERE ORDER_STATUS = 'COMPLETED' AND ORDER_DATE >= ADD_MONTHS(CURRENT_DATE, -12)""",
    "hdbsql://USER:PASSWORD@HOST:39017",
)
print(df.head())
```
### Lazy evaluation with Polars
```python
import pyhdb_rs.polars as hdb
import polars as pl

# scan_hana() returns a LazyFrame - the query executes on .collect()
lf = hdb.scan_hana(
    "SELECT ORDER_ID, CUSTOMER_NAME, PRODUCT_CATEGORY, NET_AMOUNT, ORDER_DATE FROM SALES_ITEMS WHERE YEAR(ORDER_DATE) = 2025",
    "hdbsql://USER:PASSWORD@HOST:39017",
)
result = (
    lf.filter(pl.col("NET_AMOUNT") > 1000)
    .select(["CUSTOMER_NAME", "PRODUCT_CATEGORY", "NET_AMOUNT"])
    .collect()
)
```
> [!TIP]
> Use `scan_hana()` for lazy evaluation when you need to apply filters or transformations before materializing data.
## TLS/SSL Configuration
pyhdb-rs provides flexible TLS configuration via `TlsConfig` for secure connections.
### TLS Configuration Methods
`TlsConfig` provides five factory methods for different certificate sources:
#### 1. From Directory (recommended for production)
Load all `.pem`, `.crt`, and `.cer` files from a directory:
```python
from pyhdb_rs import TlsConfig, ConnectionBuilder

tls = TlsConfig.from_directory("/etc/hana/certs")

conn = (ConnectionBuilder()
    .host("hana.example.com")
    .credentials("SYSTEM", "password")
    .tls(tls)
    .build())
```
> [!TIP]
> This is the recommended approach for production deployments. Place all CA certificates in a single directory.
#### 2. From Environment Variable
Load certificate from an environment variable:
```python
import os

from pyhdb_rs import TlsConfig, ConnectionBuilder

# Set the certificate in the environment
os.environ["HANA_CA_CERT"] = """-----BEGIN CERTIFICATE-----
MIIDdzCCAl+gAwIBAgIEAgAAuTANBgkqhkiG9w0BAQUFADBaMQswCQYDVQQGEwJJ
...
-----END CERTIFICATE-----"""

tls = TlsConfig.from_environment("HANA_CA_CERT")

conn = (ConnectionBuilder()
    .host("hana.example.com")
    .credentials("SYSTEM", "password")
    .tls(tls)
    .build())
```
> [!NOTE]
> Useful for containerized deployments where certificates are injected via environment variables.
#### 3. From Certificate String
Provide PEM-encoded certificate directly:
```python
from pyhdb_rs import TlsConfig, ConnectionBuilder

with open("/path/to/ca-bundle.pem") as f:
    cert_pem = f.read()

tls = TlsConfig.from_certificate(cert_pem)

conn = (ConnectionBuilder()
    .host("hana.example.com")
    .credentials("SYSTEM", "password")
    .tls(tls)
    .build())
```
#### 4. System Root Certificates
Use Mozilla's root certificates (bundled):
```python
from pyhdb_rs import TlsConfig, ConnectionBuilder

tls = TlsConfig.with_system_roots()

conn = (ConnectionBuilder()
    .host("hana.example.com")
    .credentials("SYSTEM", "password")
    .tls(tls)
    .build())
```
> [!TIP]
> Best choice when your HANA server uses a certificate signed by a well-known CA (e.g., Let's Encrypt, DigiCert).
#### 5. Insecure (development only)
Skip certificate verification:
```python
from pyhdb_rs import TlsConfig, ConnectionBuilder

tls = TlsConfig.insecure()

conn = (ConnectionBuilder()
    .host("hana-dev.internal")
    .credentials("SYSTEM", "password")
    .tls(tls)
    .build())
```
> [!CAUTION]
> `TlsConfig.insecure()` disables server certificate verification completely. **NEVER use in production.** This makes your connection vulnerable to man-in-the-middle attacks.
### URL Scheme for TLS
The `hdbsqls://` scheme automatically enables TLS with system roots:
```python
from pyhdb_rs import ConnectionBuilder, TlsConfig

# Equivalent to using TlsConfig.with_system_roots()
conn = ConnectionBuilder.from_url("hdbsqls://user:pass@host:30015").build()

# Override with a custom TLS config
conn = (ConnectionBuilder.from_url("hdbsqls://user:pass@host:30015")
    .tls(TlsConfig.from_directory("/custom/certs"))
    .build())
```
## Cursor Holdability (Transaction Control)
Control result set behavior across transaction boundaries with `CursorHoldability`. This determines whether cursors remain open after `commit()` or `rollback()` operations.
```python
from pyhdb_rs import ConnectionBuilder, CursorHoldability

conn = (ConnectionBuilder()
    .host("hana.example.com")
    .credentials("SYSTEM", "password")
    .cursor_holdability(CursorHoldability.CommitAndRollback)
    .build())
conn.set_autocommit(False)

with conn.cursor() as cur:
    cur.execute("SELECT * FROM large_table")
    rows = cur.fetchmany(1000)

    # Process the first batch
    process_batch(rows)
    conn.commit()  # Cursor remains open with CommitAndRollback

    # Continue reading from the same result set
    more_rows = cur.fetchmany(1000)
```
### Holdability Variants
| Variant | Behavior |
|---------|----------|
| `CursorHoldability.None` | Cursor closed on commit **and** rollback (default) |
| `CursorHoldability.Commit` | Cursor held across commits, closed on rollback |
| `CursorHoldability.Rollback` | Cursor held across rollbacks, closed on commit |
| `CursorHoldability.CommitAndRollback` | Cursor held across both operations |
> [!NOTE]
> Use `CommitAndRollback` when you need to iterate over large result sets while performing intermediate commits to free locks or manage transaction size.
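One way to apply this is to page through a large result set with `fetchmany()` and commit between pages. A sketch building on the configuration above (`process_batch` stands in for your own handling):

```python
conn.set_autocommit(False)

with conn.cursor() as cur:
    cur.execute("SELECT * FROM large_table")
    while True:
        rows = cur.fetchmany(1000)
        if not rows:
            break
        process_batch(rows)
        # With CommitAndRollback the cursor survives this commit,
        # so the next fetchmany() continues the same result set
        conn.commit()
```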
## High Availability & Scale-Out Deployments
Configure network groups for HANA HA and Scale-Out deployments to control connection routing.
### Network Group Configuration
```python
from pyhdb_rs import ConnectionBuilder

conn = (ConnectionBuilder()
    .host("hana-ha-cluster.example.com")
    .port(30015)
    .credentials("SYSTEM", "password")
    .network_group("ha-primary")
    .build())
```
> [!IMPORTANT]
> Network groups are essential for proper routing in multi-node HANA environments. They determine which network interface the driver uses when multiple options are available.
### Use Cases
**1. High Availability Clusters**
Direct connections to specific nodes in an HA setup:
```python
from pyhdb_rs import ConnectionBuilder

# Connect to the primary node network
conn_primary = (ConnectionBuilder()
    .host("hana-ha.example.com")
    .credentials("SYSTEM", "password")
    .network_group("internal")
    .build())

# Connect to the secondary node network
conn_secondary = (ConnectionBuilder()
    .host("hana-ha.example.com")
    .credentials("SYSTEM", "password")
    .network_group("external")
    .build())
```
**2. Scale-Out Systems**
Route to specific network groups in scale-out configurations:
```python
from pyhdb_rs import ConnectionBuilder

# Connect via the data network
conn = (ConnectionBuilder()
    .host("hana-scaleout.example.com")
    .credentials("SYSTEM", "password")
    .network_group("data-network")
    .build())
```
### Async Connection Pools
Combine network groups with connection pooling for production deployments:
```python
import asyncio

import polars as pl
from pyhdb_rs import TlsConfig
from pyhdb_rs.aio import ConnectionPoolBuilder

pool = (ConnectionPoolBuilder()
    .url("hdbsql://user:pass@host:30015")
    .network_group("production")
    .max_size(20)
    .tls(TlsConfig.with_system_roots())
    .build())

async def main():
    async with pool.acquire() as conn:
        reader = await conn.execute_arrow("SELECT * FROM large_table")
        df = pl.from_arrow(reader)

asyncio.run(main())
```
## Async/Await Support
pyhdb-rs supports async/await operations for non-blocking database access.
> [!NOTE]
> Async support requires the `async` extra: `uv pip install pyhdb_rs[async]`
> [!WARNING]
> **Async API Memory Behavior**: The async `execute_arrow()` loads ALL rows into
> memory before streaming batches. For large datasets (>100K rows), use the sync
> API for true streaming with O(batch_size) memory usage.
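For such workloads, the sync streaming pattern keeps memory bounded by the batch size. A sketch (`process_batch` is a placeholder for your own handling):

```python
import pyhdb_rs

conn = pyhdb_rs.connect("hdbsql://USER:PASSWORD@HOST:30015")

# The sync reader yields batches as they arrive from the server,
# so only one batch needs to be resident in memory at a time
reader = conn.execute_arrow("SELECT * FROM TRANSACTION_HISTORY")
for batch in reader:
    process_batch(batch)

conn.close()
```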
### Basic async usage
```python
import asyncio

import polars as pl
from pyhdb_rs.aio import connect

async def main():
    async with await connect("hdbsql://USER:PASSWORD@HOST:30015") as conn:
        reader = await conn.execute_arrow(
            """SELECT PRODUCT_NAME, SUM(QUANTITY) AS TOTAL_SOLD, SUM(NET_AMOUNT) AS REVENUE
               FROM SALES_ITEMS
               WHERE ORDER_DATE >= '2025-01-01'
               GROUP BY PRODUCT_NAME
               ORDER BY REVENUE DESC
               LIMIT 10"""
        )
        df = pl.from_arrow(reader)
        print(df)

asyncio.run(main())
```
## Connection pooling
### Using ConnectionPoolBuilder (recommended)
```python
import asyncio

import polars as pl
from pyhdb_rs import TlsConfig
from pyhdb_rs.aio import ConnectionPoolBuilder

async def main():
    # Builder pattern for pools
    pool = (ConnectionPoolBuilder()
        .url("hdbsql://USER:PASSWORD@HOST:30015")
        .max_size(10)
        .tls(TlsConfig.with_system_roots())
        .network_group("production")
        .build())

    async with pool.acquire() as conn:
        reader = await conn.execute_arrow(
            """SELECT CUSTOMER_ID, COUNT(ORDER_ID) AS ORDER_COUNT, SUM(TOTAL_AMOUNT) AS TOTAL_SPENT
               FROM SALES_ORDERS
               WHERE ORDER_DATE >= '2025-01-01' AND ORDER_STATUS = 'COMPLETED'
               GROUP BY CUSTOMER_ID
               HAVING SUM(TOTAL_AMOUNT) > 10000"""
        )
        df = pl.from_arrow(reader)
        print(df)

    status = pool.status
    print(f"Pool size: {status.size}, available: {status.available}")

asyncio.run(main())
```
### Using create_pool (legacy)
```python
import asyncio

from pyhdb_rs.aio import create_pool

async def main():
    pool = create_pool(
        "hdbsql://USER:PASSWORD@HOST:30015",
        max_size=10,
        connection_timeout=30,
    )
    async with pool.acquire() as conn:
        # Use the connection
        pass

asyncio.run(main())
```
### Concurrent queries
```python
import asyncio

import polars as pl
from pyhdb_rs.aio import create_pool

async def fetch_sales_by_region(pool, region: str):
    async with pool.acquire() as conn:
        # execute_arrow() takes no bind parameters, so region is interpolated
        # directly; only do this with trusted, hard-coded values
        reader = await conn.execute_arrow(
            f"""SELECT PRODUCT_CATEGORY, SUM(NET_AMOUNT) AS REVENUE
                FROM SALES_ITEMS
                WHERE REGION = '{region}' AND FISCAL_YEAR = 2025
                GROUP BY PRODUCT_CATEGORY
                ORDER BY REVENUE DESC"""
        )
        return pl.from_arrow(reader)

async def main():
    pool = create_pool("hdbsql://USER:PASSWORD@HOST:30015", max_size=5)

    # Run queries for different regions concurrently
    results = await asyncio.gather(
        fetch_sales_by_region(pool, "EMEA"),
        fetch_sales_by_region(pool, "AMERICAS"),
        fetch_sales_by_region(pool, "APAC"),
    )
    emea_df, americas_df, apac_df = results
    print(f"EMEA: {len(emea_df)} categories, AMERICAS: {len(americas_df)} categories")

asyncio.run(main())
```
## Migration Guide: v0.2.x → v0.3.0
### Breaking Changes
#### 1. Removed `statement_cache_size` parameter
The `statement_cache_size` parameter has been removed from the async `connect()` function. Statement cache size is now fixed at 100 (the default).
**Before (v0.2.x):**
```python
from pyhdb_rs.aio import connect
conn = await connect("hdbsql://user:pass@host:30015", statement_cache_size=100)
```
**After (v0.3.0):**
```python
from pyhdb_rs.aio import connect
# statement_cache_size is always 100
conn = await connect("hdbsql://user:pass@host:30015")
```
> [!NOTE]
> This change only affects async connections. Sync connections never had this parameter.
### New Features
#### Builder API (Recommended)
While the old connection methods still work, the new builder API provides more flexibility:
**Old style (still supported):**
```python
import pyhdb_rs
conn = pyhdb_rs.connect("hdbsql://user:pass@host:30015")
```
**New style (recommended):**
```python
from pyhdb_rs import ConnectionBuilder, TlsConfig

conn = (ConnectionBuilder()
    .host("host")
    .port(30015)
    .credentials("user", "pass")
    .tls(TlsConfig.with_system_roots())
    .build())
```
**Benefits of the builder pattern:**
- Type-safe configuration
- More discoverable API
- Better IDE autocomplete
- Fine-grained control over TLS, cursor holdability, network groups
#### TlsConfig
v0.3.0 introduces `TlsConfig` for flexible TLS configuration:
```python
from pyhdb_rs import ConnectionBuilder, TlsConfig

# Multiple ways to configure TLS
tls = TlsConfig.from_directory("/etc/hana/certs")
tls = TlsConfig.from_environment("HANA_CA_CERT")
tls = TlsConfig.from_certificate(cert_pem)
tls = TlsConfig.with_system_roots()
tls = TlsConfig.insecure()  # Development only!

conn = (ConnectionBuilder()
    .host("hana.example.com")
    .credentials("SYSTEM", "password")
    .tls(tls)
    .build())
```
#### CursorHoldability
Control cursor behavior across transactions:
```python
from pyhdb_rs import ConnectionBuilder, CursorHoldability

conn = (ConnectionBuilder()
    .host("hana.example.com")
    .credentials("SYSTEM", "password")
    .cursor_holdability(CursorHoldability.CommitAndRollback)
    .build())
```
#### Network Groups
For HA and Scale-Out deployments:
```python
from pyhdb_rs import ConnectionBuilder

conn = (ConnectionBuilder()
    .host("hana-ha.example.com")
    .credentials("SYSTEM", "password")
    .network_group("production")
    .build())
```
#### ConnectionPoolBuilder
The async pool API now has a builder:
**Old style (still supported):**
```python
from pyhdb_rs.aio import create_pool
pool = create_pool("hdbsql://user:pass@host:30015", max_size=10)
```
**New style (recommended):**
```python
from pyhdb_rs import TlsConfig
from pyhdb_rs.aio import ConnectionPoolBuilder

pool = (ConnectionPoolBuilder()
    .url("hdbsql://user:pass@host:30015")
    .max_size(10)
    .tls(TlsConfig.with_system_roots())
    .network_group("production")
    .build())
```
### Upgrade Checklist
- [ ] Remove `statement_cache_size` from async `connect()` calls
- [ ] Consider migrating to `ConnectionBuilder` for better configuration
- [ ] Use `TlsConfig` for explicit TLS configuration
- [ ] Add `network_group` if using HANA HA/Scale-Out
- [ ] Use `ConnectionPoolBuilder` for new pool configurations
## API Patterns & Best Practices
### Arrow RecordBatchReader
`execute_arrow()` returns a `RecordBatchReader` that implements the Arrow PyCapsule Interface (`__arrow_c_stream__`):
```python
import polars as pl
import pyarrow as pa

# conn is an open pyhdb_rs connection (see Quick start)

# Pattern 1: Direct conversion to Polars (recommended)
reader = conn.execute_arrow(
    "SELECT CUSTOMER_ID, CUSTOMER_NAME, TOTAL_ORDERS FROM CUSTOMER_SUMMARY WHERE ACTIVE_FLAG = 1"
)
df = pl.from_arrow(reader)  # Zero-copy via PyCapsule

# Pattern 2: Convert to a PyArrow Table first
reader = conn.execute_arrow(
    "SELECT ORDER_ID, ORDER_DATE, TOTAL_AMOUNT FROM SALES_ORDERS WHERE ORDER_DATE >= '2025-01-01'"
)
pa_reader = pa.RecordBatchReader.from_stream(reader)
table = pa_reader.read_all()

# Pattern 3: Stream large datasets
reader = conn.execute_arrow(
    "SELECT TRANSACTION_ID, CUSTOMER_ID, AMOUNT, TRANSACTION_DATE FROM TRANSACTION_HISTORY WHERE YEAR(TRANSACTION_DATE) = 2025",
    batch_size=10000,
)
for batch in reader:
    process_batch(batch)  # Each batch is an Arrow RecordBatch
```
> [!NOTE]
> The reader is consumed after use (single-pass iterator). You cannot read from it twice.
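If you do need multiple passes, drain the stream into an in-memory table once and reuse that instead. A sketch, reusing `conn` from the examples above:

```python
import polars as pl
import pyarrow as pa

reader = conn.execute_arrow("SELECT CUSTOMER_ID, TOTAL_ORDERS FROM CUSTOMER_SUMMARY")

# Consume the single-pass stream exactly once...
table = pa.RecordBatchReader.from_stream(reader).read_all()

# ...then read the materialized table as often as needed
df = pl.from_arrow(table)
row_count = table.num_rows
```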
### Parameterized Queries with Arrow
`execute_arrow()` does NOT support query parameters. For parameterized queries, use the two-step pattern:
```python
# Two-step: execute() then fetch_arrow()
cursor = conn.cursor()
cursor.execute(
    """SELECT o.ORDER_ID, o.ORDER_DATE, c.CUSTOMER_NAME, o.TOTAL_AMOUNT
       FROM SALES_ORDERS o
       JOIN CUSTOMERS c ON o.CUSTOMER_ID = c.CUSTOMER_ID
       WHERE o.ORDER_STATUS = ? AND o.TOTAL_AMOUNT > ? AND o.ORDER_DATE >= ?""",
    ["COMPLETED", 5000, "2025-01-01"],
)
df = pl.from_arrow(cursor.fetch_arrow())
```
### Connection Validation
Check if a connection is still valid before use:
```python
import pyhdb_rs
from pyhdb_rs.aio import connect

# Sync API
conn = pyhdb_rs.connect("hdbsql://USER:PASSWORD@HOST:30015")
if not conn.is_valid():
    conn = pyhdb_rs.connect("hdbsql://USER:PASSWORD@HOST:30015")  # Reconnect

# Async API (inside a coroutine)
async with await connect("hdbsql://USER:PASSWORD@HOST:30015") as conn:
    if not await conn.is_valid():
        # Handle the invalid connection
        pass
```
The `is_valid(check_connection=True)` method:
- When `check_connection=True` (default): executes `SELECT 1 FROM DUMMY` to verify the connection is alive
- When `check_connection=False`: only checks internal state (no network round-trip)
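On hot paths you can combine the two modes: do the cheap local check first and pay for the round-trip only when it passes. A sketch, reusing `conn` from above:

```python
# Cheap local check first (no network round-trip)...
if conn.is_valid(check_connection=False):
    # ...then confirm with a server round-trip only when needed
    healthy = conn.is_valid(check_connection=True)
```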
### Write Methods
Write DataFrames back to HANA:
```python
import pandas as pd
import polars as pl

import pyhdb_rs.pandas as hdb_pd
import pyhdb_rs.polars as hdb_pl

uri = "hdbsql://USER:PASSWORD@HOST:30015"

# Polars
df = pl.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
hdb_pl.write_hana(df, "my_table", uri, if_table_exists="replace")

# pandas
df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
hdb_pd.to_hana(df, "my_table", uri, if_exists="append")
```
> [!NOTE]
> Naming difference is intentional: `write_hana()` follows Polars conventions, `to_hana()` follows pandas conventions.
## Error Handling
pyhdb-rs provides detailed error messages that include HANA server information for better diagnostics:
```python
import pyhdb_rs

try:
    conn = pyhdb_rs.connect("hdbsql://user:pass@host:30015")
    cursor = conn.cursor()
    cursor.execute("SELECT CUSTOMER_NAME, BALANCE FROM ACCOUNTS WHERE ACCOUNT_TYPE = ?", ["PREMIUM"])
except pyhdb_rs.ProgrammingError as e:
    # The error message includes:
    # - Error code: [259] (HANA error number)
    # - Message: invalid table name
    # - Severity: Error
    # - SQLSTATE: 42000 (SQL standard code)
    # Example: "[259] invalid table name: NONEXISTENT_TABLE (severity: Error), SQLSTATE: 42000"
    print(f"SQL Error: {e}")
except pyhdb_rs.DatabaseError as e:
    print(f"Database error: {e}")
except pyhdb_rs.InterfaceError as e:
    print(f"Connection error: {e}")
```
**Exception hierarchy** (DB-API 2.0 compliant):
- `pyhdb_rs.Error` — Base exception
  - `pyhdb_rs.InterfaceError` — Connection or driver issues
  - `pyhdb_rs.DatabaseError` — Database server errors
    - `pyhdb_rs.ProgrammingError` — SQL syntax, missing table, wrong column
    - `pyhdb_rs.IntegrityError` — Constraint violations, duplicate keys
    - `pyhdb_rs.DataError` — Type conversion, value overflow
    - `pyhdb_rs.OperationalError` — Connection lost, timeout, server unavailable
    - `pyhdb_rs.NotSupportedError` — Unsupported operation
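Because `OperationalError` covers transient failures such as lost connections and timeouts, batch jobs often wrap connection setup in a retry loop. A hypothetical sketch (the attempt count and backoff are illustrative, not part of the library):

```python
import time

import pyhdb_rs

def connect_with_retry(url: str, attempts: int = 3, backoff: float = 2.0):
    """Retry transient connection failures with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return pyhdb_rs.connect(url)
        except pyhdb_rs.OperationalError:
            if attempt == attempts:
                raise
            time.sleep(backoff ** attempt)

conn = connect_with_retry("hdbsql://USER:PASSWORD@HOST:30015")
```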
## Connection URL Format Reference
```
hdbsql://[USER[:PASSWORD]@]HOST[:PORT][/DATABASE][?OPTIONS]
```
Examples:
- `hdbsql://user:pass@localhost:30015`
- `hdbsql://user:pass@hana.example.com:39017/HDB`
- `hdbsql://user:pass@host:30015?encrypt=true`
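Credentials containing URL-reserved characters (`@`, `:`, `/`, `?`) need to be percent-encoded before being embedded in the URL, as in any URL scheme; this assumes the driver follows standard URL decoding. A sketch using the standard library:

```python
from urllib.parse import quote

import pyhdb_rs

user = "SYSTEM"
password = "p@ss:w/rd"  # contains URL-reserved characters

url = f"hdbsql://{quote(user, safe='')}:{quote(password, safe='')}@hana.example.com:30015"
conn = pyhdb_rs.connect(url)
```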
## Type mapping
| HANA Type | Python Type | Arrow Type |
|-----------|-------------|------------|
| TINYINT, SMALLINT, INT | `int` | Int8, Int16, Int32 |
| BIGINT | `int` | Int64 |
| DECIMAL | `decimal.Decimal` | Decimal128 |
| REAL, DOUBLE | `float` | Float32, Float64 |
| VARCHAR, NVARCHAR | `str` | Utf8 |
| CLOB, NCLOB | `str` | LargeUtf8 |
| BLOB | `bytes` | LargeBinary |
| DATE | `datetime.date` | Date32 |
| TIME | `datetime.time` | Time64 |
| TIMESTAMP | `datetime.datetime` | Timestamp |
| BOOLEAN | `bool` | Boolean |
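To check how a particular result set maps, you can inspect the Arrow schema of the stream before consuming it. A sketch, reusing `conn` from earlier examples:

```python
import pyarrow as pa

reader = pa.RecordBatchReader.from_stream(
    conn.execute_arrow("SELECT * FROM PRODUCTS")
)

# Prints one field per column with its Arrow type,
# e.g. a DECIMAL column appears as decimal128
print(reader.schema)
```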
## Performance
pyhdb-rs is designed for high-performance data access:
- **Zero-copy Arrow**: Data flows directly from HANA to Polars/pandas without intermediate copies
- **Rust core**: All heavy lifting happens in compiled Rust code
- **Connection pooling**: Async pool with configurable size for high-concurrency workloads
- **Batch processing**: Efficient handling of large result sets via streaming
- **Optimized conversions**: Direct BigInt arithmetic for decimals, builder reuse at batch boundaries
- **Type caching**: Thread-local Python type references minimize FFI overhead
Benchmarks show 2x+ performance improvement over hdbcli for bulk reads.
> [!TIP]
> For maximum performance, use `execute_arrow()` with your Arrow-compatible library (Polars, PyArrow, pandas) for zero-copy data transfer.
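A minimal way to sanity-check these numbers on your own data (the table and connection string are placeholders):

```python
import time

import polars as pl
import pyhdb_rs

conn = pyhdb_rs.connect("hdbsql://USER:PASSWORD@HOST:30015")
query = "SELECT * FROM SALES_ITEMS"

# Row-based path: a Python object is created for every value
start = time.perf_counter()
with conn.cursor() as cursor:
    cursor.execute(query)
    rows = cursor.fetchall()
print(f"fetchall:      {time.perf_counter() - start:.2f}s ({len(rows)} rows)")

# Arrow path: columnar batches handed to Polars without per-value conversion
start = time.perf_counter()
df = pl.from_arrow(conn.execute_arrow(query))
print(f"execute_arrow: {time.perf_counter() - start:.2f}s ({df.height} rows)")
```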
## Arrow Ecosystem Integration
Data is exported in [Apache Arrow](https://arrow.apache.org/) format, enabling zero-copy interoperability with:
- **DataFrames** — Polars, pandas, Vaex, Dask
- **Query engines** — DataFusion, DuckDB, ClickHouse
- **ML/AI** — Ray, Hugging Face Datasets, PyTorch
- **Data lakes** — Delta Lake, Apache Iceberg, Lance
- **Serialization** — Parquet, Arrow IPC (Feather)
For Rust integration examples (DataFusion, DuckDB, Parquet export), see [`hdbconnect-arrow`](crates/hdbconnect-arrow/README.md).
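For example, a result set can be streamed straight into a Parquet file through PyArrow, without ever building Python row objects. A sketch, assuming `conn` from the earlier examples:

```python
import pyarrow as pa
import pyarrow.parquet as pq

reader = pa.RecordBatchReader.from_stream(
    conn.execute_arrow("SELECT * FROM SALES_ORDERS")
)

# Write batches to Parquet as they arrive, without materializing
# the full result set in memory
with pq.ParquetWriter("sales_orders.parquet", reader.schema) as writer:
    for batch in reader:
        writer.write_batch(batch)
```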
## MSRV policy
> [!NOTE]
> Minimum Supported Rust Version: **1.88**. MSRV increases are minor version bumps.
## Examples
Interactive Jupyter notebooks are available in [`examples/notebooks/`](examples/notebooks/):
- **01_quickstart** — Basic connection and DataFrame integration
- **02_polars_analytics** — Advanced Polars analytics with LazyFrames
- **03_streaming_large_data** — Memory-efficient large dataset processing
- **04_performance_comparison** — Benchmarks vs hdbcli
## Repository
- [GitHub](https://github.com/bug-ops/pyhdb-rs)
- [Issue Tracker](https://github.com/bug-ops/pyhdb-rs/issues)
- [Changelog](CHANGELOG.md)
- [API Documentation (Rust)](https://docs.rs/hdbconnect-arrow)
## Contributing
Contributions are welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
## License
Licensed under either of:
- Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE))
- MIT license ([LICENSE-MIT](LICENSE-MIT))
at your option.