# SQL Data Warehouse

A complete SQL-based data warehouse implementation featuring a medallion architecture (Bronze, Silver, Gold layers) for processing and analyzing customer and sales data from multiple source systems.

Repository: [nickklos10/sql-data-warehouse](https://github.com/nickklos10/sql-data-warehouse)

## 📋 Overview

This project demonstrates a modern data warehouse design pattern, using a medallion architecture to transform raw business data into analytics-ready datasets. The warehouse integrates data from two primary source systems:

- **CRM System**: Customer information, product catalog, and sales transactions
- **ERP System**: Customer demographics, location data, and product categorization

## 🏗️ Architecture

### Medallion Architecture Layers

![Data Architecture](data_architecture.png)

The data warehouse follows a medallion architecture pattern with three layers, each serving a specific purpose in the transformation pipeline:

#### 🥉 Bronze Layer
- **Purpose**: Raw data ingestion from source systems
- **Data Quality**: Loaded as-is from the source, with minimal transformation
- **Tables**:
  - `crm_cust_info` - Customer master data
  - `crm_prd_info` - Product catalog
  - `crm_sales_details` - Sales transactions
  - `erp_loc_a101` - Customer location data
  - `erp_cust_az12` - Customer demographics
  - `erp_px_cat_g1v2` - Product categories

#### 🥈 Silver Layer
- **Purpose**: Cleaned, validated, and enriched data
- **Data Quality**: Business rules applied, data types standardized
- **Features**:
  - Data validation and quality checks
  - Standardized column naming
  - Warehouse load timestamps (`dwh_create_date`)
  - Consistent data types and formats

#### 🥇 Gold Layer
- **Purpose**: Business-ready analytical views
- **Data Quality**: Dimensional model (facts and dimensions) optimized for analytics
- **Views**:
  - `dim_customers` - Customer dimension combining CRM and ERP data
  - `dim_products` - Product dimension with category hierarchies
  - `fact_sales` - Sales fact view linked to the customer and product dimensions

## 📁 Project Structure

```
sql-data-warehouse/
├── datasets/                  # Source data files
│   ├── source_crm/           # CRM system exports
│   │   ├── cust_info.csv     # Customer information
│   │   ├── prd_info.csv      # Product catalog
│   │   └── sales_details.csv # Sales transactions
│   └── source_erp/           # ERP system exports
│       ├── CUST_AZ12.csv     # Customer demographics
│       ├── LOC_A101.csv      # Location data
│       └── PX_CAT_G1V2.csv   # Product categories
├── scripts/                  # SQL deployment scripts
│   ├── bronze/               # Bronze layer DDL and ETL
│   │   ├── ddl_bronze.sql    # Table definitions
│   │   └── proc_load_bronze.sql # Data loading procedures
│   ├── silver/               # Silver layer DDL and ETL
│   │   ├── ddl_silver.sql    # Table definitions
│   │   └── proc_load_silver.sql # Data transformation procedures
│   ├── gold/                 # Gold layer DDL
│   │   └── ddl_gold.sql      # Dimensional views
│   ├── create_datawarehouse_db.sql # Database creation
│   └── create_schemas_in_datawarehouse.sql # Schema setup
├── tests/                    # Data quality validation
│   ├── quality_checks_silver.sql # Silver layer validations
│   └── quality_checks_gold.sql   # Gold layer validations
└── README.md                 # This file
```

## 🚀 Getting Started

### Prerequisites

- PostgreSQL 12 or later (the procedures are written in PL/pgSQL)
- `psql`, because the steps below use psql meta-commands such as `\i`, `\c`, and `\copy`; a GUI client such as pgAdmin or DBeaver is useful for ad hoc queries
- Local access to the CSV files in `datasets/`

### Database Setup

Run the following steps in `psql` from the repository root so that the relative script paths resolve.

1. **Create the database:**
   ```sql
   -- Run scripts/create_datawarehouse_db.sql
   DROP DATABASE IF EXISTS datawarehouse;
   CREATE DATABASE datawarehouse;
   ```
   `DROP DATABASE` permanently deletes any existing `datawarehouse` database and cannot run while you are connected to it, so run this step from another database (for example `postgres`). Then connect to the new database with `\c datawarehouse` before continuing.

2. **Create schemas:**
   ```sql
   -- Run scripts/create_schemas_in_datawarehouse.sql
   CREATE SCHEMA bronze;
   CREATE SCHEMA silver;
   CREATE SCHEMA gold;
   ```

3. **Deploy Bronze layer:**
   ```sql
   -- Run scripts/bronze/ddl_bronze.sql
   \i scripts/bronze/ddl_bronze.sql
   ```

4. **Deploy Silver layer:**
   ```sql
   -- Run scripts/silver/ddl_silver.sql
   \i scripts/silver/ddl_silver.sql
   ```

5. **Deploy Gold layer:**
   ```sql
   -- Run scripts/gold/ddl_gold.sql
   \i scripts/gold/ddl_gold.sql
   ```

### Data Loading
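The numbered steps below load the data one layer at a time, and their first option shows the copy command for `crm_cust_info` only. The following is a minimal psql sketch that loads all six source files into the Bronze layer in one transaction. It assumes (a) the CSV-to-table mapping follows the file names in `datasets/` (for example `CUST_AZ12.csv` into `bronze.erp_cust_az12`), (b) each CSV's column order matches the table definitions in `ddl_bronze.sql`, and (c) it is run from the repository root while connected to `datawarehouse`. The file name `load_bronze.psql` is hypothetical and this is not the repository's `proc_load_bronze.sql`.

```sql
-- load_bronze.psql (hypothetical file name): full reload of all six Bronze tables.
-- Assumes: run from the repository root, connected to datawarehouse, and each
-- CSV's column order matches the table definitions in scripts/bronze/ddl_bronze.sql.

-- Abort on the first error; the open transaction is then rolled back.
\set ON_ERROR_STOP on

BEGIN;

-- Full-reload semantics (an assumption): empty the tables so re-runs do not duplicate rows.
TRUNCATE bronze.crm_cust_info, bronze.crm_prd_info, bronze.crm_sales_details,
         bronze.erp_cust_az12, bronze.erp_loc_a101, bronze.erp_px_cat_g1v2;

-- CRM sources (each copy meta-command must stay on a single line)
\copy bronze.crm_cust_info FROM 'datasets/source_crm/cust_info.csv' WITH CSV HEADER
\copy bronze.crm_prd_info FROM 'datasets/source_crm/prd_info.csv' WITH CSV HEADER
\copy bronze.crm_sales_details FROM 'datasets/source_crm/sales_details.csv' WITH CSV HEADER

-- ERP sources (target table names inferred from the file names)
\copy bronze.erp_cust_az12 FROM 'datasets/source_erp/CUST_AZ12.csv' WITH CSV HEADER
\copy bronze.erp_loc_a101 FROM 'datasets/source_erp/LOC_A101.csv' WITH CSV HEADER
\copy bronze.erp_px_cat_g1v2 FROM 'datasets/source_erp/PX_CAT_G1V2.csv' WITH CSV HEADER

COMMIT;
```

Saved to a file, it can be run with `psql -d datawarehouse -f load_bronze.psql`. Wrapping the truncate and all six loads in one transaction means a failed file leaves the previous Bronze contents in place instead of a half-loaded layer.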
1. **Load raw data into Bronze tables:**
   ```sql
   -- Use your preferred method to load the CSV files:
   -- Option 1: psql's client-side copy command (repeat for each of the six Bronze tables)
   \copy bronze.crm_cust_info FROM 'datasets/source_crm/cust_info.csv' WITH CSV HEADER

   -- Option 2: Run the Bronze loading script
   \i scripts/bronze/proc_load_bronze.sql
   ```

2. **Transform data to Silver layer:**
   ```sql
   \i scripts/silver/proc_load_silver.sql
   ```
   If a `proc_load_*.sql` script only defines its loading procedure, execute that procedure afterwards with `CALL`, using the procedure name defined in the script.

3. **Verify data quality:**
   ```sql
   \i tests/quality_checks_silver.sql
   \i tests/quality_checks_gold.sql
   ```

## 🔍 Usage Examples

### Customer Analysis
```sql
-- Top 10 customers by total sales
SELECT
    CONCAT_WS(' ', c.first_name, c.last_name) AS customer_name,
    c.country,
    SUM(f.sales_amount) AS total_sales,
    COUNT(DISTINCT f.order_number) AS order_count  -- an order can span several fact rows
FROM gold.fact_sales f
JOIN gold.dim_customers c ON f.customer_key = c.customer_key
GROUP BY c.customer_key, c.first_name, c.last_name, c.country
ORDER BY total_sales DESC
LIMIT 10;
```

### Product Performance
```sql
-- Product sales by category
SELECT
    p.category,
    p.subcategory,
    SUM(f.sales_amount) AS category_sales,
    AVG(f.price) AS avg_price
FROM gold.fact_sales f
JOIN gold.dim_products p ON f.product_key = p.product_key
GROUP BY p.category, p.subcategory
ORDER BY category_sales DESC;
```

### Monthly Sales Trends
```sql
-- Monthly sales trend
SELECT
    DATE_TRUNC('month', f.order_date) AS month,
    SUM(f.sales_amount) AS monthly_sales,
    COUNT(DISTINCT f.customer_key) AS unique_customers,
    COUNT(DISTINCT f.order_number) AS total_orders
FROM gold.fact_sales f
WHERE f.order_date IS NOT NULL  -- skip rows without a usable order date
GROUP BY month
ORDER BY month;
```

## 🧪 Data Quality

The SQL scripts in `tests/` cover the following checks:

- **Key Integrity**: Primary key uniqueness and foreign key relationships
- **Data Completeness**: NULL value detection in critical fields
- **Data Consistency**: Cross-table validation and business rule enforcement
- **Data Format**: Standardized formats and trimmed whitespace
- **Logical Validation**: Date ranges and verification of calculated fields

Run the quality checks after each data load:
```sql
\i tests/quality_checks_silver.sql
\i tests/quality_checks_gold.sql
```

## 📊 Business Intelligence

The Gold layer provides analytics-ready data for:

- **Customer Segmentation**: Demographics, geography, purchase behavior
- **Product Analysis**: Category performance, pricing strategies
- **Sales Analytics**: Trends, seasonality, growth metrics
- **Operational Insights**: Order fulfillment, shipping performance

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Add your changes with appropriate tests
4. Ensure data quality checks pass
5. Submit a pull request

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🏷️ Tags

`#datawarehouse` `#sql` `#medallion-architecture` `#etl` `#analytics` `#postgresql` `#dataengineering` `#businessintelligence`

---
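The Data Quality section lists the kinds of checks that the scripts in `tests/` perform. As a rough illustration only (this is not the content of `tests/quality_checks_gold.sql`), the queries below apply several of those checks to the Gold views. They use only columns that appear in the usage examples (`customer_key`, `product_key`, `order_number`, `sales_amount`, `first_name`, `last_name`); each query returns the rows that violate a rule, so an empty result means the check passed.

```sql
-- Illustrative Gold-layer checks (a sketch, not the repository's test script).
-- Each query returns violating rows; no rows means the check passed.

-- Key integrity: surrogate keys must be unique in each dimension.
SELECT customer_key, COUNT(*) AS occurrences
FROM gold.dim_customers
GROUP BY customer_key
HAVING COUNT(*) > 1;

SELECT product_key, COUNT(*) AS occurrences
FROM gold.dim_products
GROUP BY product_key
HAVING COUNT(*) > 1;

-- Referential integrity: every fact row must match a customer and a product.
-- A NULL key in fact_sales is also reported, because the LEFT JOIN finds no match.
SELECT f.order_number, f.customer_key, f.product_key
FROM gold.fact_sales f
LEFT JOIN gold.dim_customers c ON c.customer_key = f.customer_key
LEFT JOIN gold.dim_products  p ON p.product_key  = f.product_key
WHERE c.customer_key IS NULL
   OR p.product_key IS NULL;

-- Completeness: critical fact fields must be populated.
SELECT order_number, sales_amount
FROM gold.fact_sales
WHERE order_number IS NULL
   OR sales_amount IS NULL;

-- Format: names should already have been trimmed in the Silver layer.
SELECT customer_key, first_name, last_name
FROM gold.dim_customers
WHERE first_name <> TRIM(first_name)
   OR last_name <> TRIM(last_name);
```

Wrapping any of these in `SELECT COUNT(*) FROM (...) AS q` turns it into a single number that is easy to monitor after every load.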