https://github.com/splitgraph/lakehouse-loader
CLI utility to load data into Delta Lake and other lakehouse formats
- Host: GitHub
- URL: https://github.com/splitgraph/lakehouse-loader
- Owner: splitgraph
- Created: 2024-05-22T07:33:11.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-12-16T08:46:53.000Z (about 1 year ago)
- Last Synced: 2025-05-08T22:43:33.366Z (8 months ago)
- Language: Rust
- Size: 210 KB
- Stars: 4
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Lakehouse Loader
Load data from Parquet and Postgres to Delta Lake
## Features
- Supports S3 and file output
- Supports larger-than-memory source data
## Usage
Download the binary from the [Releases page](./releases).
To load data from Postgres to Delta Lake:
```bash
export PGPASSWORD="my_password"
./lakehouse-loader pg-to-delta postgres://test-user@localhost:5432/test-db -q "SELECT * FROM some_table" s3://my-bucket/path/to/table
```
To load data from Parquet to Delta Lake:
```bash
./lakehouse-loader parquet-to-delta some_file.parquet s3://my-bucket/path/to/table
```
Standard AWS environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_PROFILE`, `AWS_ENDPOINT`, etc.) are supported. Use a `file://` URL instead of `s3://` to write the Delta table to the local filesystem.
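For example, the Parquet command above can target a local directory instead of S3 (the output path here is purely illustrative):

```bash
./lakehouse-loader parquet-to-delta some_file.parquet file:///tmp/some-table
```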
## Limitations
- Supported data types: bool, char, int2, int4, int8, float4, float8, timestamp, timestamptz, date, text, bytea. If your query returns columns of other types, cast them to `text` or another supported type in the query itself.
- Appending to existing tables is not supported; the loader only writes new Delta tables (pass `-o` to overwrite an existing one)
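Putting both limitations together, a sketch of a load that casts an unsupported column to `text` and overwrites a previously written table (the `metadata` column and table names are hypothetical):

```bash
export PGPASSWORD="my_password"
./lakehouse-loader pg-to-delta postgres://test-user@localhost:5432/test-db \
  -q "SELECT id, metadata::text AS metadata FROM some_table" \
  -o s3://my-bucket/path/to/table
```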