https://github.com/flynn3103/loadhouse-toolkit

Loading data into the Lakehouse using JSON configuration and utilities for ETL tasks.
Topics: delta-lake, spark

# Loadhouse

An ETL (Extract, Transform, Load) tool for data lakehouse architectures, configured through JSON.

## Overview

Loadhouse is a data processing tool that drives ETL operations from JSON configuration. It reads from several kinds of data sources, including files, JDBC connections, and SQL queries, and runs its transformations on Apache Spark.

## Features

- **Configurable Data Sources**
  - File-based (CSV, Delta, etc.)
  - JDBC connections
  - SQL queries
  - DataFrame operations

- **Data Transformations**
  - Expression filtering
  - Custom transformations
  - Data quality validation

- **Multiple Output Formats**
  - Delta Lake
  - File formats (CSV, Parquet, etc.)
  - Console output for debugging
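
To give a rough sense of how a JSON configuration can drive the source, transformation, validation, and output stages listed above, here is a minimal sketch in plain Python. It does not use Loadhouse or Spark, and the configuration schema is invented for illustration: the keys `source`, `transforms`, `validations`, and `sink`, the `inline` source format, and every function name are assumptions, not Loadhouse's actual format or API.

```python
import json
import operator

# Hypothetical configuration, invented for this sketch.
# It is NOT Loadhouse's real schema.
CONFIG_JSON = """
{
  "source": {
    "format": "inline",
    "rows": [
      {"id": 1, "amount": 50},
      {"id": 2, "amount": 150},
      {"id": 3, "amount": 300}
    ]
  },
  "transforms": [
    {"type": "filter", "column": "amount", "op": ">", "value": 100}
  ],
  "validations": [
    {"type": "not_null", "column": "id"}
  ],
  "sink": {"format": "console"}
}
"""

# Comparison operators allowed in a filter step.
OPS = {">": operator.gt, "<": operator.lt, "==": operator.eq}


def read_source(spec):
    """Return rows from the configured source (only 'inline' is supported here)."""
    if spec["format"] == "inline":
        return list(spec["rows"])
    raise ValueError(f"unsupported source format: {spec['format']}")


def apply_transform(rows, spec):
    """Apply one transformation step; only a simple column filter is sketched."""
    if spec["type"] == "filter":
        compare = OPS[spec["op"]]
        return [r for r in rows if compare(r[spec["column"]], spec["value"])]
    raise ValueError(f"unsupported transform: {spec['type']}")


def validate(rows, spec):
    """Raise if a data quality rule is violated; only 'not_null' is sketched."""
    if spec["type"] == "not_null":
        bad = [r for r in rows if r.get(spec["column"]) is None]
        if bad:
            raise ValueError(f"{len(bad)} row(s) have null {spec['column']}")
        return
    raise ValueError(f"unsupported validation: {spec['type']}")


def write_sink(rows, spec):
    """Write rows to the configured sink (only console output is supported here)."""
    if spec["format"] == "console":
        for row in rows:
            print(json.dumps(row))
        return
    raise ValueError(f"unsupported sink format: {spec['format']}")


def run_pipeline(config_text):
    """Parse the JSON config and run source -> transforms -> validations -> sink."""
    config = json.loads(config_text)
    rows = read_source(config["source"])
    for step in config.get("transforms", []):
        rows = apply_transform(rows, step)
    for rule in config.get("validations", []):
        validate(rows, rule)
    write_sink(rows, config["sink"])
    return rows


result = run_pipeline(CONFIG_JSON)
```

In this sketch each stage dispatches on a `format` or `type` field, so a new source, transformation, or output needs one more branch plus a config entry, while the pipeline runner stays unchanged. A Spark-based tool would presumably operate on DataFrames rather than lists of dictionaries.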