# pyhdb-rs

High-performance Python driver for SAP HANA with native Arrow support.

[![CI](https://github.com/bug-ops/pyhdb-rs/actions/workflows/ci.yml/badge.svg)](https://github.com/bug-ops/pyhdb-rs/actions/workflows/ci.yml)
[![Security](https://github.com/bug-ops/pyhdb-rs/actions/workflows/security.yml/badge.svg)](https://github.com/bug-ops/pyhdb-rs/actions/workflows/security.yml)
[![codecov](https://codecov.io/gh/bug-ops/pyhdb-rs/graph/badge.svg?token=75RR61N6FI)](https://codecov.io/gh/bug-ops/pyhdb-rs)
[![Crates.io](https://img.shields.io/crates/v/hdbconnect-arrow.svg)](https://crates.io/crates/hdbconnect-arrow)
[![docs.rs](https://img.shields.io/docsrs/hdbconnect-arrow)](https://docs.rs/hdbconnect-arrow)
[![PyPI](https://img.shields.io/pypi/v/pyhdb_rs.svg)](https://pypi.org/project/pyhdb_rs/)
[![Python](https://img.shields.io/pypi/pyversions/pyhdb_rs)](https://pypi.org/project/pyhdb_rs)
[![MSRV](https://img.shields.io/badge/MSRV-1.88-blue)](https://github.com/bug-ops/pyhdb-rs)
[![License](https://img.shields.io/badge/license-Apache--2.0%20OR%20MIT-blue.svg)](LICENSE-APACHE)

## Features

- Full DB-API 2.0 (PEP 249) compliance
- Zero-copy Arrow data transfer via PyCapsule Interface
- Native Polars/pandas integration
- Async/await support with connection pooling
- Built with Rust and PyO3 for maximum performance

## Installation

```bash
uv pip install pyhdb_rs
```

With optional dependencies:

```bash
uv pip install pyhdb_rs[polars] # Polars integration
uv pip install pyhdb_rs[pandas] # pandas + PyArrow
uv pip install pyhdb_rs[async] # Async support
uv pip install pyhdb_rs[all] # All integrations
```

> [!IMPORTANT]
> Requires Python 3.12 or later.

### Platform support

| Platform | Architectures |
|----------|---------------|
| Linux (glibc) | x86_64, aarch64 |
| Linux (musl) | x86_64, aarch64 |
| macOS | x86_64, aarch64 |
| Windows | x86_64 |

### From source

```bash
git clone https://github.com/bug-ops/pyhdb-rs.git
cd pyhdb-rs/python

uv venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate

uv pip install maturin
maturin develop --release
```

## Quick start

### DB-API 2.0 usage

```python
import pyhdb_rs

conn = pyhdb_rs.connect("hdbsql://USER:PASSWORD@HOST:30015")

with conn.cursor() as cursor:
    cursor.execute("SELECT * FROM CUSTOMERS WHERE IS_ACTIVE = ?", [True])

    rows = cursor.fetchall()
    for row in rows:
        print(row)

    cursor.execute("SELECT CUSTOMER_NAME, EMAIL_ADDRESS FROM CUSTOMERS")
    for name, email in cursor:
        print(f"{name}: {email}")

conn.close()
```

## Builder API

pyhdb-rs provides a builder pattern for flexible connection configuration.

### Basic connection

```python
from pyhdb_rs import ConnectionBuilder

conn = (ConnectionBuilder()
    .host("hana.example.com")
    .port(30015)
    .credentials("SYSTEM", "password")
    .database("SYSTEMDB")
    .build())

with conn.cursor() as cursor:
    cursor.execute("SELECT * FROM DUMMY")
    print(cursor.fetchone())

conn.close()
```

### With TLS configuration

```python
from pyhdb_rs import ConnectionBuilder, TlsConfig

# System root certificates
tls = TlsConfig.with_system_roots()

conn = (ConnectionBuilder()
    .host("hana.example.com")
    .credentials("SYSTEM", "password")
    .tls(tls)
    .build())
```

> [!TIP]
> Use `TlsConfig.from_directory("/path/to/certs")` for custom CA certificates.

### From URL with overrides

```python
from pyhdb_rs import ConnectionBuilder, TlsConfig

# Start with URL, override specific settings
conn = (ConnectionBuilder.from_url("hdbsql://user:pass@host:30015")
    .tls(TlsConfig.with_system_roots())
    .autocommit(True)
    .build())
```

### Async connections

```python
import asyncio
from pyhdb_rs.aio import AsyncConnectionBuilder
from pyhdb_rs import TlsConfig

async def main():
    conn = await (AsyncConnectionBuilder()
        .host("hana.example.com")
        .credentials("SYSTEM", "password")
        .tls(TlsConfig.with_system_roots())
        .autocommit(True)
        .build())

    async with conn:
        cursor = conn.cursor()
        await cursor.execute("SELECT * FROM DUMMY")
        print(await cursor.fetchone())

asyncio.run(main())
```

### Polars integration

```python
import pyhdb_rs.polars as hdb

df = hdb.read_hana(
    """
    SELECT
        PRODUCT_CATEGORY,
        FISCAL_YEAR,
        SUM(NET_AMOUNT) AS TOTAL_REVENUE,
        COUNT(DISTINCT ORDER_ID) AS ORDER_COUNT,
        AVG(QUANTITY) AS AVG_QUANTITY
    FROM SALES_ITEMS
    WHERE FISCAL_YEAR BETWEEN 2024 AND 2026
      AND SALES_REGION IN ('EMEA', 'AMERICAS')
    GROUP BY PRODUCT_CATEGORY, FISCAL_YEAR
    ORDER BY TOTAL_REVENUE DESC
    """,
    "hdbsql://USER:PASSWORD@HOST:39017"
)

print(df.head())
```

> [!TIP]
> Use `execute_arrow()` with Polars for best performance. Data flows directly from HANA to Polars without intermediate copies.

Or using the connection object:

```python
import pyhdb_rs
import polars as pl

conn = pyhdb_rs.connect("hdbsql://USER:PASSWORD@HOST:30015")

# Get a Polars DataFrame (zero-copy via Arrow)
reader = conn.execute_arrow(
    "SELECT PRODUCT_ID, PRODUCT_NAME, CATEGORY, UNIT_PRICE FROM PRODUCTS WHERE IS_ACTIVE = 1"
)
df = pl.from_arrow(reader)

# For parameterized queries, use the two-step pattern
cursor = conn.cursor()
cursor.execute(
    """SELECT p.PRODUCT_NAME, p.UNIT_PRICE, s.STOCK_QUANTITY
       FROM PRODUCTS p
       JOIN STOCK s ON p.PRODUCT_ID = s.PRODUCT_ID
       WHERE p.CATEGORY = ? AND s.STOCK_QUANTITY > ?""",
    ["Electronics", 10]
)
df = pl.from_arrow(cursor.fetch_arrow())

# Stream large datasets batch by batch
reader = conn.execute_arrow(
    "SELECT ORDER_ID, CUSTOMER_ID, ORDER_DATE, TOTAL_AMOUNT FROM SALES_ORDERS WHERE ORDER_DATE >= '2024-01-01'"
)
for batch in reader:
    process_batch(batch)

conn.close()
```

### pandas integration

```python
import pyhdb_rs.pandas as hdb

df = hdb.read_hana(
    """SELECT ORDER_ID, CUSTOMER_NAME, PRODUCT_NAME, QUANTITY, NET_AMOUNT
       FROM SALES_ITEMS
       WHERE ORDER_STATUS = 'COMPLETED' AND ORDER_DATE >= ADD_MONTHS(CURRENT_DATE, -12)""",
    "hdbsql://USER:PASSWORD@HOST:39017"
)

print(df.head())
```

### Lazy evaluation with Polars

```python
import pyhdb_rs.polars as hdb
import polars as pl

# scan_hana() returns a LazyFrame - query executes on .collect()
lf = hdb.scan_hana(
    "SELECT ORDER_ID, CUSTOMER_NAME, PRODUCT_CATEGORY, NET_AMOUNT, ORDER_DATE FROM SALES_ITEMS WHERE YEAR(ORDER_DATE) = 2025",
    "hdbsql://USER:PASSWORD@HOST:39017"
)
result = (
    lf.filter(pl.col("NET_AMOUNT") > 1000)
    .select(["CUSTOMER_NAME", "PRODUCT_CATEGORY", "NET_AMOUNT"])
    .collect()
)
```

> [!TIP]
> Use `scan_hana()` for lazy evaluation when you need to apply filters or transformations before materializing data.

## TLS/SSL configuration

pyhdb-rs provides flexible TLS configuration via `TlsConfig` for secure connections.

### TLS Configuration Methods

`TlsConfig` provides five factory methods for different certificate sources:

#### 1. From Directory (recommended for production)

Load all `.pem`, `.crt`, and `.cer` files from a directory:

```python
from pyhdb_rs import TlsConfig, ConnectionBuilder

tls = TlsConfig.from_directory("/etc/hana/certs")

conn = (ConnectionBuilder()
    .host("hana.example.com")
    .credentials("SYSTEM", "password")
    .tls(tls)
    .build())
```

> [!TIP]
> This is the recommended approach for production deployments. Place all CA certificates in a single directory.

#### 2. From Environment Variable

Load certificate from an environment variable:

```python
import os
from pyhdb_rs import TlsConfig, ConnectionBuilder

# Set certificate in environment
os.environ["HANA_CA_CERT"] = """-----BEGIN CERTIFICATE-----
MIIDdzCCAl+gAwIBAgIEAgAAuTANBgkqhkiG9w0BAQUFADBaMQswCQYDVQQGEwJJ
...
-----END CERTIFICATE-----"""

tls = TlsConfig.from_environment("HANA_CA_CERT")

conn = (ConnectionBuilder()
    .host("hana.example.com")
    .credentials("SYSTEM", "password")
    .tls(tls)
    .build())
```

> [!NOTE]
> Useful for containerized deployments where certificates are injected via environment variables.

#### 3. From Certificate String

Provide PEM-encoded certificate directly:

```python
from pyhdb_rs import TlsConfig, ConnectionBuilder

with open("/path/to/ca-bundle.pem") as f:
    cert_pem = f.read()

tls = TlsConfig.from_certificate(cert_pem)

conn = (ConnectionBuilder()
    .host("hana.example.com")
    .credentials("SYSTEM", "password")
    .tls(tls)
    .build())
```

#### 4. System Root Certificates

Use Mozilla's root certificates (bundled):

```python
from pyhdb_rs import TlsConfig, ConnectionBuilder

tls = TlsConfig.with_system_roots()

conn = (ConnectionBuilder()
    .host("hana.example.com")
    .credentials("SYSTEM", "password")
    .tls(tls)
    .build())
```

> [!TIP]
> Best choice when your HANA server uses a certificate signed by a well-known CA (e.g., Let's Encrypt, DigiCert).

#### 5. Insecure (development only)

Skip certificate verification:

```python
from pyhdb_rs import TlsConfig, ConnectionBuilder

tls = TlsConfig.insecure()

conn = (ConnectionBuilder()
    .host("hana-dev.internal")
    .credentials("SYSTEM", "password")
    .tls(tls)
    .build())
```

> [!CAUTION]
> `TlsConfig.insecure()` disables server certificate verification completely. **NEVER use in production.** This makes your connection vulnerable to man-in-the-middle attacks.

### URL Scheme for TLS

The `hdbsqls://` scheme automatically enables TLS with system roots:

```python
from pyhdb_rs import ConnectionBuilder, TlsConfig

# Equivalent to using TlsConfig.with_system_roots()
conn = ConnectionBuilder.from_url("hdbsqls://user:pass@host:30015").build()

# Override with a custom TLS config
conn = (ConnectionBuilder.from_url("hdbsqls://user:pass@host:30015")
    .tls(TlsConfig.from_directory("/custom/certs"))
    .build())
```

## Cursor holdability

Control result set behavior across transaction boundaries with `CursorHoldability`. This determines whether cursors remain open after `commit()` or `rollback()` operations.

```python
from pyhdb_rs import ConnectionBuilder, CursorHoldability

conn = (ConnectionBuilder()
    .host("hana.example.com")
    .credentials("SYSTEM", "password")
    .cursor_holdability(CursorHoldability.CommitAndRollback)
    .build())

conn.set_autocommit(False)
with conn.cursor() as cur:
    cur.execute("SELECT * FROM large_table")
    rows = cur.fetchmany(1000)

    # Process the first batch
    process_batch(rows)

    conn.commit()  # Cursor remains open with CommitAndRollback

    # Continue reading from the same result set
    more_rows = cur.fetchmany(1000)
```

### Holdability Variants

| Variant | Behavior |
|---------|----------|
| `CursorHoldability.None` | Cursor closed on commit **and** rollback (default) |
| `CursorHoldability.Commit` | Cursor held across commits, closed on rollback |
| `CursorHoldability.Rollback` | Cursor held across rollbacks, closed on commit |
| `CursorHoldability.CommitAndRollback` | Cursor held across both operations |

> [!NOTE]
> Use `CommitAndRollback` when you need to iterate over large result sets while performing intermediate commits to free locks or manage transaction size.
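
A sketch of that pattern, reusing the held-cursor connection configured above (`process_batch` is a hypothetical per-batch handler):

```python
conn.set_autocommit(False)
with conn.cursor() as cur:
    cur.execute("SELECT * FROM large_table")
    while True:
        rows = cur.fetchmany(10_000)
        if not rows:
            break
        process_batch(rows)  # hypothetical per-batch handler
        conn.commit()        # frees locks; cursor stays open with CommitAndRollback
```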

## High availability & scale-out deployments

Configure network groups for HANA HA and Scale-Out deployments to control connection routing.

### Network Group Configuration

```python
from pyhdb_rs import ConnectionBuilder

conn = (ConnectionBuilder()
    .host("hana-ha-cluster.example.com")
    .port(30015)
    .credentials("SYSTEM", "password")
    .network_group("ha-primary")
    .build())
```

> [!IMPORTANT]
> Network groups are essential for proper routing in multi-node HANA environments. They determine which network interface the driver uses when multiple options are available.

### Use Cases

**1. High Availability Clusters**

Direct connections to specific nodes in an HA setup:

```python
from pyhdb_rs import ConnectionBuilder

# Connect via the primary node network
conn_primary = (ConnectionBuilder()
    .host("hana-ha.example.com")
    .credentials("SYSTEM", "password")
    .network_group("internal")
    .build())

# Connect via the secondary node network
conn_secondary = (ConnectionBuilder()
    .host("hana-ha.example.com")
    .credentials("SYSTEM", "password")
    .network_group("external")
    .build())
```

**2. Scale-Out Systems**

Route to specific network groups in scale-out configurations:

```python
from pyhdb_rs import ConnectionBuilder

# Connect via data network
conn = (ConnectionBuilder()
    .host("hana-scaleout.example.com")
    .credentials("SYSTEM", "password")
    .network_group("data-network")
    .build())
```

### Async Connection Pools

Combine network groups with connection pooling for production deployments:

```python
import polars as pl
from pyhdb_rs import TlsConfig
from pyhdb_rs.aio import ConnectionPoolBuilder

pool = (ConnectionPoolBuilder()
    .url("hdbsql://user:pass@host:30015")
    .network_group("production")
    .max_size(20)
    .tls(TlsConfig.with_system_roots())
    .build())

async with pool.acquire() as conn:
    reader = await conn.execute_arrow("SELECT * FROM large_table")
    df = pl.from_arrow(reader)
```

## Async/await support

pyhdb-rs supports async/await operations for non-blocking database access.

> [!NOTE]
> Async support requires the `async` extra: `uv pip install pyhdb_rs[async]`

> [!WARNING]
> **Async API Memory Behavior**: The async `execute_arrow()` loads ALL rows into
> memory before streaming batches. For large datasets (>100K rows), use the sync
> API for true streaming with O(batch_size) memory usage.
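
For comparison, a minimal sketch of the sync streaming path, where only one batch is resident at a time (`process_batch` is a hypothetical per-batch handler):

```python
import pyhdb_rs

conn = pyhdb_rs.connect("hdbsql://USER:PASSWORD@HOST:30015")
reader = conn.execute_arrow("SELECT * FROM TRANSACTION_HISTORY")
for batch in reader:      # streams with O(batch_size) memory
    process_batch(batch)  # hypothetical per-batch handler
conn.close()
```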

### Basic async usage

```python
import asyncio
import polars as pl
from pyhdb_rs.aio import connect

async def main():
    async with await connect("hdbsql://USER:PASSWORD@HOST:30015") as conn:
        reader = await conn.execute_arrow(
            """SELECT PRODUCT_NAME, SUM(QUANTITY) AS TOTAL_SOLD, SUM(NET_AMOUNT) AS REVENUE
               FROM SALES_ITEMS
               WHERE ORDER_DATE >= '2025-01-01'
               GROUP BY PRODUCT_NAME
               ORDER BY REVENUE DESC
               LIMIT 10"""
        )
        df = pl.from_arrow(reader)
        print(df)

asyncio.run(main())
```

## Connection pooling

### Using ConnectionPoolBuilder (recommended)

```python
import asyncio
import polars as pl
from pyhdb_rs.aio import ConnectionPoolBuilder
from pyhdb_rs import TlsConfig

async def main():
    # Builder pattern for pools
    pool = (ConnectionPoolBuilder()
        .url("hdbsql://USER:PASSWORD@HOST:30015")
        .max_size(10)
        .tls(TlsConfig.with_system_roots())
        .network_group("production")
        .build())

    async with pool.acquire() as conn:
        reader = await conn.execute_arrow(
            """SELECT CUSTOMER_ID, COUNT(ORDER_ID) AS ORDER_COUNT, SUM(TOTAL_AMOUNT) AS TOTAL_SPENT
               FROM SALES_ORDERS
               WHERE ORDER_DATE >= '2025-01-01' AND ORDER_STATUS = 'COMPLETED'
               GROUP BY CUSTOMER_ID
               HAVING SUM(TOTAL_AMOUNT) > 10000"""
        )
        df = pl.from_arrow(reader)
        print(df)

    status = pool.status
    print(f"Pool size: {status.size}, available: {status.available}")

asyncio.run(main())
```

### Using create_pool (legacy)

```python
import asyncio
from pyhdb_rs.aio import create_pool

async def main():
    pool = create_pool(
        "hdbsql://USER:PASSWORD@HOST:30015",
        max_size=10,
        connection_timeout=30,
    )

    async with pool.acquire() as conn:
        # Use the connection
        pass

asyncio.run(main())
```

### Concurrent queries

```python
import asyncio
import polars as pl
from pyhdb_rs.aio import create_pool

async def fetch_sales_by_region(pool, region: str):
    async with pool.acquire() as conn:
        reader = await conn.execute_arrow(
            f"""SELECT PRODUCT_CATEGORY, SUM(NET_AMOUNT) AS REVENUE
                FROM SALES_ITEMS
                WHERE REGION = '{region}' AND FISCAL_YEAR = 2025
                GROUP BY PRODUCT_CATEGORY
                ORDER BY REVENUE DESC"""
        )
        return pl.from_arrow(reader)

async def main():
    pool = create_pool("hdbsql://USER:PASSWORD@HOST:30015", max_size=5)

    # Run queries concurrently for different regions
    results = await asyncio.gather(
        fetch_sales_by_region(pool, "EMEA"),
        fetch_sales_by_region(pool, "AMERICAS"),
        fetch_sales_by_region(pool, "APAC"),
    )

    emea_df, americas_df, apac_df = results
    print(f"EMEA: {len(emea_df)} categories, AMERICAS: {len(americas_df)} categories")

asyncio.run(main())
```

## Migration guide: v0.2.x → v0.3.0

### Breaking Changes

#### 1. Removed `statement_cache_size` parameter

The `statement_cache_size` parameter has been removed from the async `connect()` function. Statement cache size is now fixed at 100 (the default).

**Before (v0.2.x):**
```python
from pyhdb_rs.aio import connect

conn = await connect("hdbsql://user:pass@host:30015", statement_cache_size=100)
```

**After (v0.3.0):**
```python
from pyhdb_rs.aio import connect

# statement_cache_size is always 100
conn = await connect("hdbsql://user:pass@host:30015")
```

> [!NOTE]
> This change only affects async connections. Sync connections never had this parameter.

### New Features

#### Builder API (Recommended)

While the old connection methods still work, the new builder API provides more flexibility:

**Old style (still supported):**
```python
import pyhdb_rs

conn = pyhdb_rs.connect("hdbsql://user:pass@host:30015")
```

**New style (recommended):**
```python
from pyhdb_rs import ConnectionBuilder, TlsConfig

conn = (ConnectionBuilder()
    .host("host")
    .port(30015)
    .credentials("user", "pass")
    .tls(TlsConfig.with_system_roots())
    .build())
```

**Benefits of the builder pattern:**
- Type-safe configuration
- More discoverable API
- Better IDE autocomplete
- Fine-grained control over TLS, cursor holdability, network groups

#### TlsConfig

v0.3.0 introduces `TlsConfig` for flexible TLS configuration:

```python
from pyhdb_rs import ConnectionBuilder, TlsConfig

# Multiple ways to configure TLS
tls = TlsConfig.from_directory("/etc/hana/certs")
tls = TlsConfig.from_environment("HANA_CA_CERT")
tls = TlsConfig.from_certificate(cert_pem)
tls = TlsConfig.with_system_roots()
tls = TlsConfig.insecure() # Development only!

conn = (ConnectionBuilder()
    .host("hana.example.com")
    .credentials("SYSTEM", "password")
    .tls(tls)
    .build())
```

#### CursorHoldability

Control cursor behavior across transactions:

```python
from pyhdb_rs import ConnectionBuilder, CursorHoldability

conn = (ConnectionBuilder()
    .host("hana.example.com")
    .credentials("SYSTEM", "password")
    .cursor_holdability(CursorHoldability.CommitAndRollback)
    .build())
```

#### Network Groups

For HA and Scale-Out deployments:

```python
from pyhdb_rs import ConnectionBuilder

conn = (ConnectionBuilder()
    .host("hana-ha.example.com")
    .credentials("SYSTEM", "password")
    .network_group("production")
    .build())
```

#### ConnectionPoolBuilder

The async pool API now has a builder:

**Old style (still supported):**
```python
from pyhdb_rs.aio import create_pool

pool = create_pool("hdbsql://user:pass@host:30015", max_size=10)
```

**New style (recommended):**
```python
from pyhdb_rs.aio import ConnectionPoolBuilder
from pyhdb_rs import TlsConfig

pool = (ConnectionPoolBuilder()
    .url("hdbsql://user:pass@host:30015")
    .max_size(10)
    .tls(TlsConfig.with_system_roots())
    .network_group("production")
    .build())
```

### Upgrade Checklist

- [ ] Remove `statement_cache_size` from async `connect()` calls
- [ ] Consider migrating to `ConnectionBuilder` for better configuration
- [ ] Use `TlsConfig` for explicit TLS configuration
- [ ] Add `network_group` if using HANA HA/Scale-Out
- [ ] Use `ConnectionPoolBuilder` for new pool configurations

## API patterns & best practices

### Arrow RecordBatchReader

`execute_arrow()` returns a `RecordBatchReader` that implements the Arrow PyCapsule Interface (`__arrow_c_stream__`):

```python
import polars as pl
import pyarrow as pa

# Pattern 1: Direct conversion to Polars (recommended)
reader = conn.execute_arrow(
    "SELECT CUSTOMER_ID, CUSTOMER_NAME, TOTAL_ORDERS FROM CUSTOMER_SUMMARY WHERE ACTIVE_FLAG = 1"
)
df = pl.from_arrow(reader)  # Zero-copy via PyCapsule

# Pattern 2: Convert to a PyArrow Table first
reader = conn.execute_arrow(
    "SELECT ORDER_ID, ORDER_DATE, TOTAL_AMOUNT FROM SALES_ORDERS WHERE ORDER_DATE >= '2025-01-01'"
)
pa_reader = pa.RecordBatchReader.from_stream(reader)
table = pa_reader.read_all()

# Pattern 3: Stream large datasets
reader = conn.execute_arrow(
    "SELECT TRANSACTION_ID, CUSTOMER_ID, AMOUNT, TRANSACTION_DATE FROM TRANSACTION_HISTORY WHERE YEAR(TRANSACTION_DATE) = 2025",
    batch_size=10000,
)
for batch in reader:
    process_batch(batch)  # Each batch is an Arrow RecordBatch
```

> [!NOTE]
> The reader is consumed after use (single-pass iterator). You cannot read from it twice.
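
If you need more than one pass over the data, materialize the stream first. A minimal sketch using PyArrow (assuming an open `conn`):

```python
import polars as pl
import pyarrow as pa

reader = conn.execute_arrow("SELECT * FROM PRODUCTS")
table = pa.RecordBatchReader.from_stream(reader).read_all()  # materialize once

df = pl.from_arrow(table)  # the Table can be reused...
pdf = table.to_pandas()    # ...as many times as needed
```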

### Parameterized Queries with Arrow

`execute_arrow()` does NOT support query parameters. For parameterized queries, use the two-step pattern:

```python
# Two-step: execute() then fetch_arrow()
cursor = conn.cursor()
cursor.execute(
    """SELECT o.ORDER_ID, o.ORDER_DATE, c.CUSTOMER_NAME, o.TOTAL_AMOUNT
       FROM SALES_ORDERS o
       JOIN CUSTOMERS c ON o.CUSTOMER_ID = c.CUSTOMER_ID
       WHERE o.ORDER_STATUS = ? AND o.TOTAL_AMOUNT > ? AND o.ORDER_DATE >= ?""",
    ["COMPLETED", 5000, "2025-01-01"]
)
df = pl.from_arrow(cursor.fetch_arrow())
```

### Connection Validation

Check if a connection is still valid before use:

```python
# Sync API
conn = pyhdb_rs.connect("hdbsql://USER:PASSWORD@HOST:30015")
if not conn.is_valid():
    conn = pyhdb_rs.connect("hdbsql://USER:PASSWORD@HOST:30015")  # Reconnect

# Async API
async with await connect("hdbsql://USER:PASSWORD@HOST:30015") as conn:
    if not await conn.is_valid():
        # Handle the invalid connection
        pass
```

The `is_valid(check_connection=True)` method:
- When `check_connection=True` (default): Executes `SELECT 1 FROM DUMMY` to verify connection is alive
- When `check_connection=False`: Only checks internal state (no network round-trip)
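
For frequent checks on a hot path, the state-only variant avoids the round-trip; a minimal sketch:

```python
# Cheap liveness check: internal state only, no network round-trip
if not conn.is_valid(check_connection=False):
    conn = pyhdb_rs.connect("hdbsql://USER:PASSWORD@HOST:30015")  # Reconnect
```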

### Write Methods

Write DataFrames back to HANA:

```python
import pandas as pd
import polars as pl
import pyhdb_rs.pandas
import pyhdb_rs.polars

uri = "hdbsql://USER:PASSWORD@HOST:30015"

# Polars
df_pl = pl.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
pyhdb_rs.polars.write_hana(df_pl, "my_table", uri, if_table_exists="replace")

# pandas
df_pd = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
pyhdb_rs.pandas.to_hana(df_pd, "my_table", uri, if_exists="append")
```

> [!NOTE]
> Naming difference is intentional: `write_hana()` follows Polars conventions, `to_hana()` follows pandas conventions.

## Error handling

pyhdb-rs provides detailed error messages that include HANA server information for better diagnostics:

```python
import pyhdb_rs

try:
    conn = pyhdb_rs.connect("hdbsql://user:pass@host:30015")
    cursor = conn.cursor()
    cursor.execute("SELECT CUSTOMER_NAME, BALANCE FROM ACCOUNTS WHERE ACCOUNT_TYPE = ?", ["PREMIUM"])
except pyhdb_rs.ProgrammingError as e:
    # The error message includes:
    # - Error code: [259] (HANA error number)
    # - Message: invalid table name
    # - Severity: Error
    # - SQLSTATE: 42000 (SQL standard code)
    # Example: "[259] invalid table name: NONEXISTENT_TABLE (severity: Error), SQLSTATE: 42000"
    print(f"SQL Error: {e}")
except pyhdb_rs.DatabaseError as e:
    print(f"Database error: {e}")
except pyhdb_rs.InterfaceError as e:
    print(f"Connection error: {e}")
```

**Exception hierarchy** (DB-API 2.0 compliant):

- `pyhdb_rs.Error` — base exception
  - `pyhdb_rs.InterfaceError` — connection or driver issues
  - `pyhdb_rs.DatabaseError` — database server errors
    - `pyhdb_rs.ProgrammingError` — SQL syntax, missing table, wrong column
    - `pyhdb_rs.IntegrityError` — constraint violations, duplicate keys
    - `pyhdb_rs.DataError` — type conversion, value overflow
    - `pyhdb_rs.OperationalError` — connection lost, timeout, server unavailable
    - `pyhdb_rs.NotSupportedError` — unsupported operation

## Connection URL format

```
hdbsql://[USER[:PASSWORD]@]HOST[:PORT][/DATABASE][?OPTIONS]
```

Examples:
- `hdbsql://user:pass@localhost:30015`
- `hdbsql://user:pass@hana.example.com:39017/HDB`
- `hdbsql://user:pass@host:30015?encrypt=true`
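
Any of these forms can be passed to `connect()` or `ConnectionBuilder.from_url()`, for example:

```python
import pyhdb_rs

# Tenant database HDB on an explicit port
conn = pyhdb_rs.connect("hdbsql://user:pass@hana.example.com:39017/HDB")
```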

## Type mapping

| HANA Type | Python Type | Arrow Type |
|-----------|-------------|------------|
| TINYINT, SMALLINT, INT | `int` | Int8, Int16, Int32 |
| BIGINT | `int` | Int64 |
| DECIMAL | `decimal.Decimal` | Decimal128 |
| REAL, DOUBLE | `float` | Float32, Float64 |
| VARCHAR, NVARCHAR | `str` | Utf8 |
| CLOB, NCLOB | `str` | LargeUtf8 |
| BLOB | `bytes` | LargeBinary |
| DATE | `datetime.date` | Date32 |
| TIME | `datetime.time` | Time64 |
| TIMESTAMP | `datetime.datetime` | Timestamp |
| BOOLEAN | `bool` | Boolean |
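
A small sketch of how this mapping surfaces at the row level (assuming an open `conn`; the table and column types are illustrative):

```python
import datetime
import decimal

cursor = conn.cursor()
cursor.execute("SELECT ORDER_ID, TOTAL_AMOUNT, ORDER_DATE FROM SALES_ORDERS LIMIT 1")
order_id, amount, order_date = cursor.fetchone()

assert isinstance(order_id, int)              # BIGINT  -> int
assert isinstance(amount, decimal.Decimal)    # DECIMAL -> decimal.Decimal
assert isinstance(order_date, datetime.date)  # DATE    -> datetime.date
```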

## Performance

pyhdb-rs is designed for high-performance data access:

- **Zero-copy Arrow**: Data flows directly from HANA to Polars/pandas without intermediate copies
- **Rust core**: All heavy lifting happens in compiled Rust code
- **Connection pooling**: Async pool with configurable size for high-concurrency workloads
- **Batch processing**: Efficient handling of large result sets via streaming
- **Optimized conversions**: Direct BigInt arithmetic for decimals, builder reuse at batch boundaries
- **Type caching**: Thread-local Python type references minimize FFI overhead

Benchmarks show a 2x+ performance improvement over hdbcli for bulk reads.

> [!TIP]
> For maximum performance, use `execute_arrow()` with your Arrow-compatible library (Polars, PyArrow, pandas) for zero-copy data transfer.
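
A rough way to compare the two paths on your own data, assuming an open `conn` and the `SALES_ITEMS` table from the examples above (a sketch; actual numbers depend on your schema and network):

```python
import time

import polars as pl

query = "SELECT * FROM SALES_ITEMS"

t0 = time.perf_counter()
with conn.cursor() as cur:  # row-based DB-API path
    cur.execute(query)
    rows = cur.fetchall()
t1 = time.perf_counter()

df = pl.from_arrow(conn.execute_arrow(query))  # zero-copy Arrow path
t2 = time.perf_counter()

print(f"fetchall: {t1 - t0:.2f}s  execute_arrow: {t2 - t1:.2f}s")
```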

## Arrow ecosystem integration

Data is exported in [Apache Arrow](https://arrow.apache.org/) format, enabling zero-copy interoperability with:

- **DataFrames** — Polars, pandas, Vaex, Dask
- **Query engines** — DataFusion, DuckDB, ClickHouse
- **ML/AI** — Ray, Hugging Face Datasets, PyTorch
- **Data lakes** — Delta Lake, Apache Iceberg, Lance
- **Serialization** — Parquet, Arrow IPC (Feather)
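
As one example, a sketch of Parquet export from Python (assuming `pyarrow` is installed and `conn` is open):

```python
import pyarrow as pa
import pyarrow.parquet as pq

reader = conn.execute_arrow("SELECT * FROM SALES_ORDERS")
table = pa.RecordBatchReader.from_stream(reader).read_all()
pq.write_table(table, "sales_orders.parquet")  # hand off to any Parquet consumer
```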

For Rust integration examples (DataFusion, DuckDB, Parquet export), see [`hdbconnect-arrow`](crates/hdbconnect-arrow/README.md).

## MSRV policy

> [!NOTE]
> Minimum Supported Rust Version: **1.88**. MSRV increases are minor version bumps.

## Examples

Interactive Jupyter notebooks are available in [`examples/notebooks/`](examples/notebooks/):

- **01_quickstart** — Basic connection and DataFrame integration
- **02_polars_analytics** — Advanced Polars analytics with LazyFrames
- **03_streaming_large_data** — Memory-efficient large dataset processing
- **04_performance_comparison** — Benchmarks vs hdbcli

## Repository

- [GitHub](https://github.com/bug-ops/pyhdb-rs)
- [Issue Tracker](https://github.com/bug-ops/pyhdb-rs/issues)
- [Changelog](CHANGELOG.md)
- [API Documentation (Rust)](https://docs.rs/hdbconnect-arrow)

## Contributing

Contributions are welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

## License

Licensed under either of:

- Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE))
- MIT license ([LICENSE-MIT](LICENSE-MIT))

at your option.