https://github.com/fabricks-framework/fabricks
- Host: GitHub
- URL: https://github.com/fabricks-framework/fabricks
- Owner: fabricks-framework
- License: mit
- Created: 2024-07-24T13:08:04.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2026-03-03T10:23:17.000Z (26 days ago)
- Last Synced: 2026-03-03T11:13:56.162Z (26 days ago)
- Topics: config-driven-etl, databricks, datawarehouse, delta-lake, etl, etl-framework, framework, lakehouse, pyspark, sql
- Language: Python
- Homepage:
- Size: 3.97 MB
- Stars: 8
- Watchers: 3
- Forks: 0
- Open Issues: 10
Metadata Files:
- Readme: README.md
- License: LICENSE
# Welcome to Fabricks
## Framework for Databricks
[Fabricks on PyPI](https://pypi.org/project/fabricks/)
Fabricks is a Python framework for building a Lakehouse in **Databricks**. It simplifies building and maintaining data pipelines by providing a standardized approach to defining and managing data processing workflows. Fabricks is battle-tested: it is used in production environments running thousands of jobs.
Currently, Fabricks targets Azure **Databricks** and runs on Azure, using Azure Blob Storage, Azure Table Storage, and Azure Queue Storage. Porting it to AWS or Google Cloud should not be a significant challenge.
Although Fabricks is primarily designed to run on **Databricks**, code written with Fabricks is highly portable. You'll predominantly write SQL `SELECT` statements, so you don't need to write DDL, DML, or `MERGE` queries by hand. In the future, we may add support for other platforms such as DuckDB or open-source Spark.
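To illustrate what writing only a `SELECT` can save you, here is a minimal sketch of how a framework could wrap a query in a Delta Lake `MERGE` statement. This is not Fabricks' actual implementation or API: the function `build_merge_sql`, its parameters, and the table names are hypothetical and chosen purely for illustration.

```python
def build_merge_sql(target: str, select_sql: str, keys: list[str]) -> str:
    """Wrap a SELECT query in a Delta Lake MERGE statement (illustrative only)."""
    if not keys:
        raise ValueError("at least one key column is required")
    # Join condition: match target and source rows on every key column.
    on_clause = " AND ".join(f"t.{k} = s.{k}" for k in keys)
    return (
        f"MERGE INTO {target} AS t\n"
        f"USING (\n{select_sql.strip()}\n) AS s\n"
        f"ON {on_clause}\n"
        "WHEN MATCHED THEN UPDATE SET *\n"
        "WHEN NOT MATCHED THEN INSERT *"
    )


# Hypothetical query and table names.
query = "SELECT customer_id, name, country FROM bronze.customers"
print(build_merge_sql("silver.customers", query, ["customer_id"]))
```

The point of the sketch is the division of labor: the `SELECT` carries the business logic, while the upsert boilerplate is generated from a small amount of configuration (here, the target table and its key columns).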
## Use Cases
- Data Ingestion using Python Notebooks, Jupyter-style
- ETL using SQL queries (should cover 99% of cases) or Notebooks
- Data Distribution using Python Notebooks
No need for magic here. It's all your Data Lakehouse/Data Warehouse code in one place. Simple and great! You don't need expensive Delta Live Tables, ETL tools, or dbt. It's basically just writing SQL queries and letting Fabricks do the rest.
## About this repo
We're just getting started with open-sourcing Fabricks! There are many areas where we want to improve:
- Implement testing in GitHub Actions
- Decouple Spark dependencies where possible
- Migrate YAML parsing to Pydantic
- Enhance documentation with more examples and best practices
- Develop a comprehensive getting started guide
- Create a contribution guide for the open-source community
## More Information
See [Fabricks Documentation](https://fabricks-framework.github.io/fabricks/)
### Release Notes
For the latest releases and detailed changelogs, please visit the [Fabricks Releases page on GitHub](https://github.com/fabricks-framework/fabricks/releases).
### Runtime Requirements
- `Fabricks 4.0.0` was successfully tested on Databricks Runtime `16.4 LTS`.
- `Fabricks 4.0.10` was successfully tested on Databricks Runtime `17.3 LTS`.
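If you want to guard a job against running on an untested runtime, a check along the following lines could work. It is only a sketch and not part of Fabricks: the helper `is_tested_combination` and the `TESTED_RUNTIMES` table are hypothetical, and it assumes the cluster exposes its runtime version in the `DATABRICKS_RUNTIME_VERSION` environment variable.

```python
import os
from typing import Optional

# Combinations reported as tested above: Fabricks version -> Databricks Runtime.
TESTED_RUNTIMES = {"4.0.0": "16.4", "4.0.10": "17.3"}


def is_tested_combination(fabricks_version: str, runtime_version: Optional[str]) -> bool:
    """Return True if this Fabricks/runtime pair is one of the tested combinations."""
    expected = TESTED_RUNTIMES.get(fabricks_version)
    if expected is None or not runtime_version:
        return False
    # Compare major.minor only, so a longer version string still matches "16.4".
    return runtime_version.split(".")[:2] == expected.split(".")


# Assumption: the cluster sets this environment variable to its runtime version.
runtime = os.environ.get("DATABRICKS_RUNTIME_VERSION")
if not is_tested_combination("4.0.10", runtime):
    print(f"Fabricks 4.0.10 has not been reported as tested on runtime {runtime!r}")
```

An untested combination is not necessarily broken, so printing a warning is likely a better default than failing the job.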
## Related Projects
- We use [odbc2deltalake](https://github.com/bmsuisse/odbc2deltalake) for extensive SQL Server data ingestion in a `pre_run` notebook.