{"id":29916811,"url":"https://github.com/thomasshikalepo/sql-data-warehouse-project","last_synced_at":"2025-08-02T05:03:56.808Z","repository":{"id":300551999,"uuid":"1006449687","full_name":"ThomasShikalepo/sql-data-warehouse-project","owner":"ThomasShikalepo","description":"Building a modern data warehouse with SQL Server, including ETL processes, data modeling, and analytics","archived":false,"fork":false,"pushed_at":"2025-06-29T13:07:36.000Z","size":2140,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-07-11T08:59:03.527Z","etag":null,"topics":["data-analysis","data-cleaning","data-engineering","data-lakehouse","data-science","data-warehouse","data-warehousing","datascience","datawarehousing","etl-pipeline","medallion-architecture","sql","sql-query","sql-server"],"latest_commit_sha":null,"homepage":"","language":"TSQL","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ThomasShikalepo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-22T09:49:09.000Z","updated_at":"2025-07-03T14:24:59.000Z","dependencies_parsed_at":"2025-06-22T11:18:30.705Z","dependency_job_id":"41de8872-1bad-4649-9232-f3c75ad4f822","html_url":"https://github.com/ThomasShikalepo/sql-data-warehouse-project","commit_stats":null,"previous_names":["thomasdeon/sql-data-warehouse-project","thomasshikalepo/sql-data-warehouse-project"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ThomasShikalepo/sql-data-warehouse-project","repository_url":"htt
ps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ThomasShikalepo%2Fsql-data-warehouse-project","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ThomasShikalepo%2Fsql-data-warehouse-project/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ThomasShikalepo%2Fsql-data-warehouse-project/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ThomasShikalepo%2Fsql-data-warehouse-project/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ThomasShikalepo","download_url":"https://codeload.github.com/ThomasShikalepo/sql-data-warehouse-project/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ThomasShikalepo%2Fsql-data-warehouse-project/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":268337977,"owners_count":24234538,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-02T02:00:12.353Z","response_time":74,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-analysis","data-cleaning","data-engineering","data-lakehouse","data-science","data-warehouse","data-warehousing","datascience","datawarehousing","etl-pipeline","medallion-architecture","sql","sql-query","sql-server"],"created_at":"2025-08-02T05:01:36.374Z","updated_at":"2025-08-02T05:03:56.799Z","avatar_url":"https://github.com/ThomasShikalepo.png","lang
uage":"TSQL","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 📊 Data Warehouse and Analytics Project\n\nWelcome to the **Data Warehouse and Analytics Project** repository! 🚀  \nThis portfolio project showcases a complete end-to-end data warehousing and analytics solution—from raw data ingestion to business intelligence reporting. It follows **industry best practices** in data engineering and analytics.\n\n---\n\n## 🏗️ Data Architecture\n\nThis project follows the **Medallion Architecture**, structured into three layers:\n\n![data_architecture](https://github.com/user-attachments/assets/1f295203-e8ab-4b6c-9de9-9df0514cceab)\n\n- **Bronze Layer**: Stores raw data ingested *as-is* from source systems (CSV files) into a SQL Server database.  \n- **Silver Layer**: Processes and transforms data with cleansing, standardization, and normalization techniques.  \n- **Gold Layer**: Contains **business-ready**, analytics-optimized data modeled using a **star schema**.\n\n---\n\n## 📖 Project Overview\n\nThis project involves:\n\n- **Data Architecture**: Building a modern warehouse with Medallion Architecture (Bronze, Silver, Gold).\n- **ETL Pipelines**: Extracting, transforming, and loading data from ERP and CRM CSVs.\n- **Data Modeling**: Designing fact and dimension tables for optimized analytical queries.\n- **Analytics \u0026 Reporting**: Creating SQL-based reports and dashboards for actionable business insights.\n\n---\n\n## 🧰 Tools \u0026 Resources\n\nEverything is **100% free** and open-source!\n\n- 📂 **Datasets**: ERP and CRM CSV files  \n- 🧩 **SQL Server Express**: Lightweight SQL Server instance  \n- 🖥️ **SQL Server Management Studio (SSMS)**: GUI for SQL Server  \n- 🧠 **Draw.io**: For data modeling and architecture diagrams  \n- 💡 **Notion**: Project templates and documentation  \n- 💻 **GitHub**: For version control and collaboration\n\n---\n\n## 🚀 Project Requirements\n\n### 🧱 Part 1: Building the Data Warehouse (Engineering)\n\n**Goal**: Develop a 
modern data warehouse using SQL Server for unified, analytics-ready sales data.\n\n**Specifications**:\n- Import data from two sources (ERP and CRM, in CSV format).\n- Cleanse and resolve data quality issues.\n- Integrate data into a **single analytical model**.\n- Focus on the **most recent data** (no historization required).\n- Document the data model for stakeholders and analysts.\n\n---\n\n### 📊 Part 2: Business Intelligence \u0026 Reporting (Analysis)\n\n**Goal**: Use SQL to analyze data and generate business insights.\n\n**Insights Provided**:\n- Customer Behavior\n- Product Performance\n- Sales Trends\n\nThese insights help drive **data-driven decision-making**.\n\n📄 For full details, see [`docs/requirements.md`](docs/requirements.md)\n\n---\n\n## 📂 Repository Structure\n\n```bash\ndata-warehouse-project/\n│\n├── datasets/                      # Raw datasets used for the project (ERP and CRM data)\n│\n├── docs/                          # Project documentation and architecture details\n│   ├── etl.drawio                 # Draw.io file showing ETL techniques and flow\n│   ├── data_architecture.drawio   # Diagram of the overall data warehouse architecture\n│   ├── data_catalog.md            # Metadata and field descriptions of datasets\n│   ├── data_flow.drawio           # Visual data flow from source to destination\n│   ├── data_models.drawio         # Star schema and data model designs\n│   ├── naming-conventions.md      # Standards for naming tables, fields, and files\n│\n├── scripts/                       # SQL scripts for ETL and transformation\n│   ├── bronze/                    # Scripts for loading raw data (Bronze layer)\n│   ├── silver/                    # Scripts for data cleansing and transformation (Silver layer)\n│   ├── gold/                      # Scripts for building the analytical model (Gold layer)\n│\n├── tests/                         # Data quality checks and testing scripts\n│\n├── README.md                      # Project overview and 
setup instructions\n├── LICENSE                        # License file for this repository\n├── .gitignore                     # Git ignore rules for files and folders\n└── requirements.txt               # Required software/tools and setup dependencies\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthomasshikalepo%2Fsql-data-warehouse-project","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthomasshikalepo%2Fsql-data-warehouse-project","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthomasshikalepo%2Fsql-data-warehouse-project/lists"}