https://github.com/henry-richard7/datacraft-framework
A framework that eliminates the dependency on Apache Spark by leveraging delta-rs for the creation and management of Delta Lake tables. This framework follows Medallion architecture.
- Host: GitHub
- URL: https://github.com/henry-richard7/datacraft-framework
- Owner: henry-richard7
- Created: 2025-05-26T17:16:54.000Z (4 months ago)
- Default Branch: master
- Last Pushed: 2025-06-19T04:49:35.000Z (4 months ago)
- Last Synced: 2025-06-19T05:25:12.449Z (4 months ago)
- Topics: delta-lake, ingestion, ingestion-framework, polars
- Language: HTML
- Homepage:
- Size: 271 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Datacraft Framework
A framework that eliminates the dependency on Apache Spark by leveraging delta-rs for the creation and management of Delta Lake tables. This framework follows Medallion architecture.
# Libraries
- **Polars**: For reading and writing Delta Lake tables.
- **SQLModel**: For storing and reading configurations from database tables.
- **Niquests**: For API requests.

# Framework Architecture
## 🟠 Bronze Layer:
Copies data from external sources to a common location.
The framework supports the following external sources out of the box:
- ✅ SFTP Extraction.
- ✅ API Extraction.
- ✅ JDBC/ODBC Extraction.
- ✅ Salesforce/Veeva Extraction.
- ✅ Cloud Object Storage (S3, ADLS, GCP, ...) Extraction.
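A minimal sketch of how per-source extractors could be dispatched into one landing location. The `EXTRACTORS` registry, the `local` source type, the config keys, and the directory layout are all illustrative assumptions, not the framework's actual API. A local-directory extractor stands in for the SFTP, API, or object-storage clients so that the example runs with only the standard library.

```python
import shutil
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical registry mapping a source type to an extractor function.
EXTRACTORS: Dict[str, Callable[[dict, Path], List[Path]]] = {}


def register(source_type: str):
    """Register an extractor under a source-type name (illustrative only)."""
    def decorator(func):
        EXTRACTORS[source_type] = func
        return func
    return decorator


@register("local")
def extract_local(config: dict, landing_dir: Path) -> List[Path]:
    """Stand-in extractor: copy matching files from a local directory."""
    copied = []
    for src in sorted(Path(config["path"]).glob(config.get("pattern", "*"))):
        if src.is_file():
            dest = landing_dir / src.name
            shutil.copy2(src, dest)
            copied.append(dest)
    return copied


def run_bronze(config: dict, landing_root: Path) -> List[Path]:
    """Dispatch to the extractor for config['source_type'] and land the files."""
    landing_dir = landing_root / config["dataset"]
    landing_dir.mkdir(parents=True, exist_ok=True)
    extractor = EXTRACTORS[config["source_type"]]
    return extractor(config, landing_dir)
```

With this shape, supporting another source would mean registering one more function under a new source-type name, while `run_bronze` stays unchanged.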
## ⚪ Silver Layer:
Performs data standardization and data quality management (DQM) checks.
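As a rough illustration of this layer, the sketch below standardizes a few rows with `trim`, `blank_conversion`, and padding, then runs a null check and a unique check on the cleaned values. It is written in plain Python over lists of dictionaries so that it runs without dependencies; the framework itself uses Polars, and the function signatures and return shapes here are assumptions rather than its real interface.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def trim(rows: List[Row], column: str) -> List[Row]:
    """Strip leading and trailing whitespace from a column."""
    return [{**r, column: r[column].strip() if r[column] is not None else None} for r in rows]


def blank_conversion(rows: List[Row], column: str) -> List[Row]:
    """Convert empty strings to None so null checks can catch them."""
    return [{**r, column: None if r[column] == "" else r[column]} for r in rows]


def padding(rows: List[Row], column: str, width: int, fill: str = "0") -> List[Row]:
    """Left-pad non-null values to a fixed width."""
    return [{**r, column: r[column].rjust(width, fill) if r[column] is not None else None} for r in rows]


def null_check(rows: List[Row], column: str) -> List[int]:
    """Return the indexes of rows where the column is null."""
    return [i for i, r in enumerate(rows) if r[column] is None]


def unique_check(rows: List[Row], column: str) -> List[int]:
    """Return the indexes of rows whose value already appeared in an earlier row."""
    seen, duplicates = set(), []
    for i, r in enumerate(rows):
        if r[column] in seen:
            duplicates.append(i)
        seen.add(r[column])
    return duplicates


rows = [
    {"id": " 7 ", "name": "  Ada "},
    {"id": "42", "name": ""},
    {"id": "7", "name": "Grace"},
]

# Standardize first, then run the quality checks on the cleaned values.
clean = trim(rows, "id")
clean = trim(clean, "name")
clean = blank_conversion(clean, "name")
clean = padding(clean, "id", width=3)

null_failures = null_check(clean, "name")       # [1]: the blank name became null
duplicate_failures = unique_check(clean, "id")  # [2]: "007" repeats row 0
```

Running standardization before the checks matters in this example: the duplicate `id` and the missing `name` only become visible once whitespace is trimmed and blanks are converted to nulls.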
**Data Standardization Functions:**
- ✅ padding.
- ✅ trim.
- ✅ blank_conversion.
- ✅ replace (regex).
- ✅ type_conversion (to lower or upper).
- ✅ sub_string.

**Data Quality Management Checks:**
- ✅ null checks.
- ✅ unique checks.
- ✅ decimal checks.
- ✅ integer checks.
- ✅ length checks.
- ✅ date checks.
- ✅ domain checks.
- ✅ custom checks.

## 🟡 Gold Layer:
Transformation logic is applied in this layer.
SCD Type 2 is also performed here: historical records are maintained in the Compute location, and only active records are maintained in the Publish location.
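The SCD Type 2 behaviour described above can be sketched in plain Python as follows. The tracking columns (`start_date`, `end_date`, `is_active`) and the function names are assumptions made for illustration, and the real framework operates on Delta Lake tables rather than lists of dictionaries. This sketch also leaves keys that are missing from the incoming batch active, which may differ from how the framework treats deletes.

```python
from datetime import date
from typing import Dict, List

TRACKING_COLUMNS = {"start_date", "end_date", "is_active"}  # assumed column names


def apply_scd2(history: List[Dict], incoming: List[Dict], key: str, load_date: date) -> List[Dict]:
    """Merge incoming rows into the history, closing versions whose values changed."""
    incoming_by_key = {row[key]: row for row in incoming}
    result, unchanged = [], set()
    for record in history:
        new = incoming_by_key.get(record[key])
        if record["is_active"] and new is not None:
            current = {k: v for k, v in record.items() if k not in TRACKING_COLUMNS}
            if current == new:
                unchanged.add(record[key])  # same values: keep the active version
            else:
                # Values changed: close the old version instead of overwriting it.
                record = {**record, "end_date": load_date, "is_active": False}
        result.append(record)
    for k, row in incoming_by_key.items():
        if k not in unchanged:
            # New key or changed values: open a new active version.
            result.append({**row, "start_date": load_date, "end_date": None, "is_active": True})
    return result


def publish_view(history: List[Dict]) -> List[Dict]:
    """Only active records go to the Publish location."""
    return [r for r in history if r["is_active"]]


history = [
    {"id": 1, "city": "Oslo", "start_date": date(2024, 1, 1), "end_date": None, "is_active": True},
    {"id": 2, "city": "Rome", "start_date": date(2024, 1, 1), "end_date": None, "is_active": True},
]
incoming = [{"id": 1, "city": "Bergen"}, {"id": 2, "city": "Rome"}, {"id": 3, "city": "Lima"}]

compute = apply_scd2(history, incoming, key="id", load_date=date(2025, 1, 1))
publish = publish_view(compute)
```

Here `compute` holds four records (the closed Oslo version plus three active ones), while `publish` holds only the three active records.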