
# Zerobus SDKs

Monorepo for Databricks Zerobus Ingest SDKs.

## Disclaimer

[Public Preview](https://docs.databricks.com/release-notes/release-types.html): These SDKs are supported for production use cases and are available to all customers. Databricks is actively working on stabilizing the Zerobus Ingest SDKs. Minor version updates may include backwards-incompatible changes.

We are keen to hear feedback from you. Please [file issues](https://github.com/databricks/zerobus-sdk/issues), and we will address them.

## What is Zerobus?

Zerobus is a high-throughput streaming service for direct data ingestion into Databricks Delta tables, optimized for real-time data pipelines and high-volume workloads.

## SDKs

| Language | Directory | Package |
|----------|-----------|---------|
| Rust | [`rust/`](rust/) | [`databricks-zerobus-ingest-sdk`](https://crates.io/crates/databricks-zerobus-ingest-sdk) |
| Python | [`python/`](python/) | [`databricks-zerobus-ingest-sdk`](https://pypi.org/project/databricks-zerobus-ingest-sdk/) |
| Go | `go/` | *coming soon* |
| TypeScript | [`typescript/`](typescript/) | [`@databricks/zerobus-ingest-sdk`](https://www.npmjs.com/package/@databricks/zerobus-ingest-sdk) |
| Java | [`java/`](java/) | [`com.databricks:zerobus-ingest-sdk`](https://central.sonatype.com/artifact/com.databricks/zerobus-ingest-sdk) |

## Prerequisites

Before using any SDK, you need the following:

### 1. Workspace URL and Workspace ID

After logging into your Databricks workspace, look at the browser URL:

```
https://<workspace-instance>.cloud.databricks.com/o=<workspace-id>
```

- **Workspace URL**: The part before `/o=` (e.g., `https://dbc-a1b2c3d4-e5f6.cloud.databricks.com`)
- **Workspace ID**: The part after `/o=` (e.g., `1234567890123456`)

> **Note:** The examples above show AWS endpoints (`.cloud.databricks.com`). For Azure deployments, the workspace URL will be `https://<workspace-instance>.azuredatabricks.net`.
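As a quick illustration, the two values can be separated programmatically by splitting on the `/o=` marker described above (a sketch; the example URL is hypothetical):

```python
def split_workspace_url(browser_url: str) -> tuple[str, str]:
    """Split a Databricks browser URL into (workspace URL, workspace ID).

    The browser URL has the form <workspace-url>/o=<workspace-id>,
    so everything before "/o=" is the workspace URL and everything
    after it is the workspace ID.
    """
    workspace_url, _, workspace_id = browser_url.partition("/o=")
    return workspace_url, workspace_id

url = "https://dbc-a1b2c3d4-e5f6.cloud.databricks.com/o=1234567890123456"
workspace_url, workspace_id = split_workspace_url(url)
```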

### 2. Create a Delta Table

Create a table using Databricks SQL:

```sql
CREATE TABLE <catalog_name>.default.<table_name> (
  device_name STRING,
  temp INT,
  humidity BIGINT
)
USING DELTA;
```

Replace `<catalog_name>` with your catalog name (e.g., `main`) and `<table_name>` with a name for the table.

### 3. Create a Service Principal

1. Navigate to **Settings > Identity and Access** in your Databricks workspace
2. Click **Service principals** and create a new service principal
3. Generate a new secret for the service principal and save it securely
4. Grant the following permissions:
   - `USE CATALOG` on the catalog (e.g., `main`)
   - `USE SCHEMA` on the schema (e.g., `default`)
   - `MODIFY` and `SELECT` on the table

Grant permissions using SQL:

```sql
-- Grant catalog permission
GRANT USE CATALOG ON CATALOG <catalog_name> TO `<client_id>`;

-- Grant schema permission
GRANT USE SCHEMA ON SCHEMA <catalog_name>.default TO `<client_id>`;

-- Grant table permissions
GRANT SELECT, MODIFY ON TABLE <catalog_name>.default.<table_name> TO `<client_id>`;
```

The service principal's **Application ID** is your OAuth **Client ID**, and the generated secret is your **Client Secret**.
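The Client ID and Client Secret are exchanged for a short-lived OAuth token. The sketch below only *builds* such a request following Databricks' documented OAuth machine-to-machine flow (the `/oidc/v1/token` endpoint and `all-apis` scope); each SDK handles this exchange for you, so treat this as illustration rather than required setup:

```python
import base64
from urllib.parse import urlencode

def build_token_request(workspace_url: str, client_id: str, client_secret: str):
    """Build the parts of an OAuth client-credentials token request.

    The client ID and secret are sent as HTTP Basic auth; the form body
    requests the client_credentials grant with the all-apis scope.
    """
    endpoint = f"{workspace_url.rstrip('/')}/oidc/v1/token"
    body = urlencode({"grant_type": "client_credentials", "scope": "all-apis"})
    creds = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    headers = {
        "Authorization": f"Basic {creds}",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return endpoint, headers, body
```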

## Serialization Formats

All SDKs support two serialization formats:

- **JSON** - Simple, schema-free ingestion. Pass a JSON string or native object (dict, map, etc.) and the SDK serializes it. No compilation step required. Good for getting started or dynamic schemas.
- **Protocol Buffers** - Strongly-typed, schema-validated ingestion. More efficient over the wire. Recommended for production workloads.
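For the example table from the prerequisites, a JSON record is simply the column names and values. A minimal Python sketch (the actual ingestion call differs per SDK; see each SDK's README):

```python
import json

# A record matching the example table:
#   device_name STRING, temp INT, humidity BIGINT
record = {"device_name": "sensor-01", "temp": 21, "humidity": 63}

# Depending on the SDK, either the native object (dict, map, etc.)
# or its JSON string serialization is passed in.
payload = json.dumps(record)
```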

### Protocol Buffers

Use `proto2` syntax with `optional` fields to correctly represent nullable Delta table columns.

#### Delta → Protobuf Type Mappings

| Delta Type | Proto2 Type |
|-----------|-------------|
| TINYINT, BYTE, INT, SMALLINT, SHORT | int32 |
| BIGINT, LONG | int64 |
| FLOAT | float |
| DOUBLE | double |
| STRING, VARCHAR | string |
| BOOLEAN | bool |
| BINARY | bytes |
| DATE | int32 |
| TIMESTAMP, TIMESTAMP_NTZ | int64 |
| ARRAY\<type\> | repeated type |
| MAP\<key, value\> | map\<key, value\> |
| STRUCT\<fields\> | nested message |
| VARIANT | string (JSON string) |
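Applying these mappings to the example table from the prerequisites gives a schema like the following (a sketch; the message and field names are illustrative):

```protobuf
syntax = "proto2";

message SensorReading {
  optional string device_name = 1;  // STRING
  optional int32 temp = 2;          // INT
  optional int64 humidity = 3;      // BIGINT
}
```

Note the `optional` modifier on every field, which lets each protobuf field represent a nullable Delta column.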

### Schema Generation

Instead of writing `.proto` files by hand, each SDK ships a tool to generate protobuf schemas directly from an existing Unity Catalog table. See the individual SDK READMEs for language-specific usage.

## HTTP Proxy Support

All SDKs support HTTP CONNECT proxies via environment variables, following gRPC core conventions. The first variable found (in order) is used:

| Proxy | No-proxy |
|-------|----------|
| `grpc_proxy` / `GRPC_PROXY` | `no_grpc_proxy` / `NO_GRPC_PROXY` |
| `https_proxy` / `HTTPS_PROXY` | `no_proxy` / `NO_PROXY` |
| `http_proxy` / `HTTP_PROXY` | |

The `no_proxy` value is a comma-separated list of hostnames (suffix-matched) or `*` to bypass the proxy entirely.

```bash
export https_proxy=http://my-proxy:8080
export no_proxy=localhost,127.0.0.1
```

The SDK establishes a plaintext HTTP CONNECT tunnel through the proxy, then performs a TLS handshake end-to-end with the Databricks server. The proxy never sees decrypted traffic.
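The tunnel setup can be illustrated with the plaintext preamble the client sends to the proxy. Only this CONNECT exchange is visible to the proxy; once the proxy answers `200`, the TLS handshake runs over the same socket directly to the server (a sketch for illustration, not the SDKs' code):

```python
def build_connect_preamble(target_host: str, target_port: int = 443) -> bytes:
    """Build the plaintext HTTP CONNECT request sent to the proxy.

    After the proxy replies with a 2xx status, the client performs the
    TLS handshake over the tunneled connection, so the proxy relays
    only ciphertext from then on.
    """
    return (
        f"CONNECT {target_host}:{target_port} HTTP/1.1\r\n"
        f"Host: {target_host}:{target_port}\r\n"
        "\r\n"
    ).encode("ascii")
```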

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md). Each SDK also has its own contributing guide with language-specific setup instructions.

## License

This project is licensed under the Databricks License. See [LICENSE](LICENSE) for the full text.