Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/flynn3103/loadhouse-toolkit
Loading data into the Lakehouse using JSON configuration and utilities for ETL tasks.
- Host: GitHub
- URL: https://github.com/flynn3103/loadhouse-toolkit
- Owner: flynn3103
- Created: 2024-12-31T07:15:09.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2025-02-13T16:10:06.000Z (6 days ago)
- Last Synced: 2025-02-13T17:24:29.275Z (6 days ago)
- Topics: delta-lake, spark
- Language: Python
- Homepage:
- Size: 84 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Loadhouse
A powerful ETL (Extract, Transform, Load) tool designed for data lakehouse architectures with JSON-based configuration.
## Overview
Loadhouse is a flexible data processing tool that simplifies ETL operations through JSON configuration. It supports a range of data sources and uses Apache Spark to perform data transformations.
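To illustrate what a JSON-driven pipeline definition can look like, here is a minimal sketch that parses a configuration and fails fast when a required section is missing. The section names (`input_specs`, `transform_specs`, `output_specs`), the spec fields, and the `load_config` helper are hypothetical and only serve as an example; Loadhouse's actual configuration schema may differ.

```python
import json
from dataclasses import dataclass, field

# Hypothetical config layout, not Loadhouse's confirmed schema.
REQUIRED_SECTIONS = ("input_specs", "output_specs")


@dataclass
class PipelineConfig:
    input_specs: list
    output_specs: list
    transform_specs: list = field(default_factory=list)  # transforms are optional


def load_config(text: str) -> PipelineConfig:
    """Parse a JSON string into a PipelineConfig, failing fast on missing sections."""
    raw = json.loads(text)
    missing = [key for key in REQUIRED_SECTIONS if key not in raw]
    if missing:
        raise ValueError(f"config is missing required sections: {missing}")
    return PipelineConfig(
        input_specs=raw["input_specs"],
        output_specs=raw["output_specs"],
        transform_specs=raw.get("transform_specs", []),
    )


# Example pipeline: read a CSV, keep large sales, write to a Delta table.
EXAMPLE = """
{
  "input_specs": [{"spec_id": "sales", "data_format": "csv", "location": "/data/sales.csv"}],
  "transform_specs": [{"spec_id": "big_sales", "input_id": "sales", "filter": "amount > 100"}],
  "output_specs": [{"spec_id": "big_sales", "data_format": "delta", "location": "/lake/big_sales"}]
}
"""

config = load_config(EXAMPLE)
print(len(config.input_specs), len(config.transform_specs), len(config.output_specs))
```

Validating the configuration before any Spark job starts means a typo in a section name surfaces immediately, rather than after a long read has already run.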
## Features
- **Configurable Data Sources**
  - File-based (CSV, Delta, etc.)
  - JDBC connections
  - SQL queries
  - DataFrame operations
- **Data Transformations**
  - Expression filtering
  - Custom transformations
  - Data quality validation
- **Multiple Output Formats**
  - Delta Lake
  - File formats (CSV, Parquet, etc.)
  - Console output for debugging
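The transformation and output features above can be sketched as three small steps: an expression filter, a data quality check, and console output. This is a hypothetical illustration that uses plain Python dictionaries in place of Spark DataFrames so it runs without a Spark session; the helper names (`apply_filter`, `check_not_null`, `write_console`) and the filter-spec format are assumptions, not Loadhouse's API. In PySpark, the filter step would typically be a call such as `df.filter("amount > 100")`.

```python
import operator
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]

# Comparison operators that a hypothetical filter spec may name.
OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def apply_filter(rows: Iterable[Row], spec: Dict[str, Any]) -> List[Row]:
    """Expression filtering: keep rows where `row[column] <op> value` holds."""
    compare = OPS[spec["op"]]
    return [r for r in rows if compare(r[spec["column"]], spec["value"])]


def check_not_null(rows: List[Row], column: str) -> List[Row]:
    """Data quality check: raise if any row has a missing or None value in `column`."""
    bad = [r for r in rows if r.get(column) is None]
    if bad:
        raise ValueError(f"{len(bad)} row(s) failed not-null check on '{column}'")
    return rows


def write_console(rows: List[Row]) -> None:
    """Console output for debugging: print one row per line."""
    for r in rows:
        print(r)


sales = [
    {"id": 1, "amount": 250, "region": "EU"},
    {"id": 2, "amount": 80, "region": "US"},
    {"id": 3, "amount": 400, "region": None},
]

big = apply_filter(sales, {"column": "amount", "op": ">", "value": 100})
write_console(check_not_null(big, "id"))
```

Calling `check_not_null(big, "region")` on the filtered rows would raise a `ValueError`, because row 3 has no region; in a real pipeline, that is the point where a failing quality check would stop the load before anything is written.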