https://github.com/fabricks-framework/fabricks



# Welcome to Fabricks 🏗️🧱
## Framework for Databricks

[![PyPI version](https://badge.fury.io/py/fabricks.svg)](https://pypi.org/project/fabricks/)

Fabricks is a Python framework for building a Lakehouse in **Databricks**. It simplifies building and maintaining data pipelines by providing a standardized approach to defining and managing data processing workflows. Fabricks is battle-tested and used in production environments running thousands of jobs. 💪🚀

Fabricks currently targets Azure **Databricks** and relies on Azure Blob Storage, Azure Table Storage, and Azure Queue Storage. Porting it to AWS or Google Cloud should not be a significant challenge. ☁️🔄

Although Fabricks is primarily designed to run on **Databricks**, code written for Fabricks is highly portable. You'll predominantly write SQL `SELECT` statements, so you don't have to write DDL, DML, or `MERGE` queries by hand. In the future, we may add support for other platforms such as DuckDB or open-source Spark. 📊
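To make the idea concrete, here is a minimal sketch of how a framework could wrap a plain `SELECT` in a Delta Lake `MERGE` statement. It is not Fabricks' actual implementation or API: the function `build_merge_statement`, its parameters, and the table names are hypothetical and exist only for illustration.

```python
from typing import List


def build_merge_statement(
    target: str, select_sql: str, keys: List[str], columns: List[str]
) -> str:
    """Wrap a SELECT in an upsert-style MERGE statement (illustrative only)."""
    if not keys:
        raise ValueError("at least one key column is required")

    # Rows match when all key columns are equal.
    on_clause = " AND ".join(f"t.{k} = s.{k}" for k in keys)
    # Only non-key columns are updated on a match.
    updates = ", ".join(f"t.{c} = s.{c}" for c in columns if c not in keys)
    col_list = ", ".join(columns)
    values = ", ".join(f"s.{c}" for c in columns)

    clauses = [
        f"MERGE INTO {target} AS t",
        f"USING (\n{select_sql.strip()}\n) AS s",
        f"ON {on_clause}",
    ]
    if updates:  # skip the UPDATE clause when every column is a key
        clauses.append(f"WHEN MATCHED THEN UPDATE SET {updates}")
    clauses.append(f"WHEN NOT MATCHED THEN INSERT ({col_list}) VALUES ({values})")
    return "\n".join(clauses)


# Hypothetical table and column names, used only for this example.
merge_sql = build_merge_statement(
    target="gold.dim_customer",
    select_sql="SELECT id, name FROM silver.customer",
    keys=["id"],
    columns=["id", "name"],
)
print(merge_sql)
```

The point of the sketch is the division of labor: the author supplies only the query and the key columns, and the boilerplate around it is generated.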

## Use Cases 🛠️
- Data Ingestion using Python Notebooks, Jupyter-style
- ETL using SQL queries (should cover 99% of cases) or Notebooks
- Data Distribution using Python Notebooks

No need for magic here. It's all your Data Lakehouse/Data Warehouse code in one place. Simple and great! ✨ You don't need expensive Delta Live Tables, ETL tools, or dbt. You just write SQL queries and let Fabricks do the magic 🧙‍♂️.

## About this repo 🕵️‍♂️
We're just getting started with open-sourcing Fabricks! There are many areas where we want to improve:
- Implement testing in GitHub Actions 🧪👨‍💻
- Decouple Spark dependencies where possible ⚡🔓
- Migrate YAML parsing to Pydantic 📄🔄
- Enhance documentation with more examples and best practices 📚💡
- Develop a comprehensive getting started guide 🚀📘
- Create a contribution guide for the open-source community 🤝

## More Information ℹ️
See [Fabricks Documentation](https://fabricks-framework.github.io/fabricks/)

### Release Notes

For the latest releases and detailed changelogs, please visit the [Fabricks Releases page on GitHub](https://github.com/fabricks-framework/fabricks/releases).

### Runtime Requirements

[✔] `Fabricks 4.0.0` was successfully tested on Databricks Runtime `16.4 LTS`.

[✔] `Fabricks 4.0.10` was successfully tested on Databricks Runtime `17.3 LTS`.
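If you want to guard a job against untested combinations, the two pairs listed above can be encoded in a small lookup. This is only a sketch: `TESTED_COMBINATIONS` and `is_tested_combination` are hypothetical names that are not part of Fabricks, and the comparison assumes runtime strings start with `major.minor`.

```python
import re

# Fabricks version -> Databricks Runtime (major.minor) it was tested on,
# taken from the list above.
TESTED_COMBINATIONS = {
    "4.0.0": "16.4",
    "4.0.10": "17.3",
}


def is_tested_combination(fabricks_version: str, runtime_version: str) -> bool:
    """Return True if this exact Fabricks version was tested on this runtime."""
    tested_runtime = TESTED_COMBINATIONS.get(fabricks_version)
    if tested_runtime is None:
        return False
    # Compare only the leading "major.minor", so "16.4 LTS" matches "16.4".
    match = re.match(r"(\d+\.\d+)", runtime_version.strip())
    return match is not None and match.group(1) == tested_runtime


print(is_tested_combination("4.0.0", "16.4 LTS"))
print(is_tested_combination("4.0.0", "17.3 LTS"))
```

A version missing from the table is not necessarily incompatible. It simply has not been listed as tested here.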

## Related Projects 🔗
- We use [odbc2deltalake](https://github.com/bmsuisse/odbc2deltalake) for extensive SQL Server data ingestion in a pre_run notebook. 🔌🏊‍♂️