# 🏢 SQL Data Warehouse
This project demonstrates a comprehensive **SQL Data Warehousing and Analytics solution**, built from scratch using **Azure Data Studio** and **SQL Server Express**. It simulates an enterprise data pipeline that ingests raw data from source systems, applies structured transformations, and delivers business-ready data for reporting and analytics using the **Medallion Architecture**. It reflects industry best practices in **Data Engineering**, **ETL design**, and **SQL-based data modeling**.

---

## 🛠️ Tools & Technologies

This project uses the following tools and technologies to build, manage, and analyze the SQL data warehouse:

| Tool / Technology | Badge | Description |
|-------------------|-------|-------------|
| **SQL Server Express** | ![SQL Server](https://img.shields.io/badge/SQL%20Server-CC2927?style=for-the-badge&logo=microsoftsqlserver&logoColor=white) | Relational database engine that stores and manages the data warehouse. |
| **Azure Data Studio** | ![Azure Data Studio](https://img.shields.io/badge/Azure%20Data%20Studio-0078D4?style=for-the-badge&logo=visualstudiocode&logoColor=white) | SQL editor used for database development and management. |
| **T-SQL (Transact-SQL)** | ![T-SQL](https://img.shields.io/badge/T--SQL-CC2927?style=for-the-badge&logo=microsoft&logoColor=white) | Microsoft's SQL dialect, used to define transformations and to query and manipulate data. |
| **Git & GitHub** | ![GitHub](https://img.shields.io/badge/GitHub-181717?style=for-the-badge&logo=github&logoColor=white) | Version control and project repository for code and documentation. |
| **Star Schema Design** | ![Data Modeling](https://img.shields.io/badge/Data%20Modeling-4B8BBE?style=for-the-badge&logo=datagrip&logoColor=white) | Dimensional modeling technique optimized for analytical queries. |
| **Medallion Architecture** | ![Medallion](https://img.shields.io/badge/Medallion%20Architecture-0E76A8?style=for-the-badge&logo=data&logoColor=white) | Bronze, Silver, and Gold layers for raw, cleaned, and business-ready data. |
| **draw.io (diagrams.net)** | ![draw.io](https://img.shields.io/badge/draw.io-F08705?style=for-the-badge&logo=diagramsdotnet&logoColor=white) | Used to design the architecture and data flow diagrams. |
| **Notion** | ![Notion](https://img.shields.io/badge/Notion-000000?style=for-the-badge&logo=notion&logoColor=white) | Project planning and documentation hub for tracking milestones and tasks. |

> 🔍 *BI tools such as Power BI, Excel, or Tableau can optionally be connected to the Gold layer for reporting and business intelligence.*

---
## 🏗️ Data Architecture

The project follows a **Medallion Architecture** with three layers: **Bronze**, **Silver**, and **Gold**.

![Data Architecture](docs/data_architecture.png)

1. **✅ Bronze Layer**: Stores raw data as-is from the source systems. Data is ingested from CSV files into a SQL Server database.
2. **✅ Silver Layer**: Cleanses, standardizes, and normalizes the data to prepare it for analysis.
3. **✅ Gold Layer**: Holds business-ready data modeled as a star schema for reporting and analytics.

---
## 📖 Project Overview

This project involves:

1. **Data Architecture**: Designing a modern data warehouse using the Medallion Architecture (**Bronze**, **Silver**, and **Gold** layers).
2. **ETL Pipelines**: Extracting, transforming, and loading data from the source systems into the warehouse.
3. **Data Modeling**: Developing fact and dimension tables optimized for analytical queries.
<!-- 4. **Analytics & Reporting**: Creating SQL-based reports and dashboards for actionable insights. -->

---

## 🚀 Project Requirements

<!-- ### Building the Data Warehouse (Data Engineering) -->

#### Objective
Develop a modern data warehouse in SQL Server that consolidates sales data to enable analytical reporting and informed decision-making.

#### Specifications
- **✅ Data Sources**: Import data from two source systems (ERP and CRM), provided as CSV files.
- **✅ Data Quality**: Cleanse the data and resolve data quality issues before analysis.
- **✅ Integration**: Combine both sources into a single, user-friendly data model designed for analytical queries.
- **✅ Scope**: Use only the latest dataset; historization of data is not required.
- **✅ Documentation**: Document the data model clearly for both business stakeholders and analytics teams.
- **✅ Business Use Cases**:
  - Customer segmentation and behavior analysis
  - Product performance tracking
  - Sales trends

---

<!-- ### BI: Analytics & Reporting (Data Analysis)

#### Objective
Develop SQL-based analytics to deliver detailed insights into:
- **Customer Behavior**
- **Product Performance**
- **Sales Trends**

These insights empower stakeholders with key business metrics, enabling strategic decision-making.

For more details, refer to [docs/requirements.md](docs/requirements.md). -->

## 📂 Repository Structure
```
SQL-Data-Warehouse/
│
├── datasets/                           # Raw datasets used for the project (ERP and CRM data)
│
├── docs/                               # Project documentation and architecture details
│   ├── data_architecture.png           # Image file for the project's architecture
│   ├── data_catalog.md                 # Catalog of datasets, including field descriptions and metadata
│   ├── data_flow.png                   # Image file for the data flow diagram
│   ├── data_integration.png            # Image file for the data integration diagram
│   ├── data_model.png                  # Image file for the data model (star schema)
│   ├── naming-conventions.md           # Consistent naming guidelines for tables, columns, and files
│
├── scripts/                            # SQL scripts for ETL and transformations
│   ├── bronze/                         # Scripts for extracting and loading raw data
│   ├── silver/                         # Scripts for cleaning and transforming data
│   ├── gold/                           # Scripts for creating analytical models
│   ├── database_init.sql               # Script for creating the database and schemas
│
├── tests/                              # Test scripts and quality files
│
├── README.md                           # Project overview and instructions
├── LICENSE                             # License information for the repository
└── .gitignore                          # Files and directories to be ignored by Git
```
---

## ⚙️ How to Run This Project

1. **Clone the Repository**

   ```bash
   git clone https://github.com/gloryodeyemi/SQL-Data-Warehouse.git
   cd SQL-Data-Warehouse
   ```

2. **Set Up the SQL Server Environment**
   * Install SQL Server Express and Azure Data Studio (if not already installed).

3. **Run the ETL Scripts**
   * Run `scripts/database_init.sql` to create the database and schemas.
   * Load the ERP and CRM CSV files into the Bronze layer tables using the scripts in `scripts/bronze/`.
   * Clean and transform the data using the scripts in `scripts/silver/`.
   * Run the script in `scripts/gold/` to build the business-ready data for analytics and reporting.

   ![Data Flow](docs/data_flow.png)

4. **Explore the Data**
   * Query the star schema in the Gold layer for analytics and reporting.

   ![Data Model](docs/data_model.png)

---

## 🧪 Testing & Validation
* Data quality check scripts in the `tests/` folder verify:
  * Data consistency, accuracy, and standardization, by checking for:
    - Null or duplicate primary keys.
    - Unwanted spaces in string fields.
    - Non-standardized or inconsistent values.
    - Invalid date ranges and incorrect date ordering.
    - Inconsistencies between related fields.
  * Uniqueness of surrogate keys in dimension tables.
  * Referential integrity between fact and dimension tables.
  * Valid relationships in the data model for analytical queries.

---

## 🔮 Future Work
This project lays the foundation for a robust and scalable data warehouse. Future enhancements could include:

* 📊 SQL-Based Analytics

  Develop advanced SQL queries to extract business insights such as:
  * Customer segmentation
  * Sales trends
  * Product performance
  * Revenue by country

* 📈 Integration with BI Tools

  Connect the Gold layer to Business Intelligence tools such as:
  * Power BI
  * Tableau
  * Metabase

  These tools can provide interactive dashboards and self-service analytics for stakeholders.

* 🛠️ Automation & Scheduling

  Use SQL Server Agent (not included in SQL Server Express) or external orchestration tools (e.g., Airflow, Azure Data Factory) to automate the ETL pipelines and data refreshes.

* 🔐 Role-Based Access Control (RBAC)

  Enforce security policies and access levels based on user roles (data analyst, data engineer, etc.).

* 📦 Data Export APIs

  Build export mechanisms for downstream systems and data consumers.

---

## 🛡️ License

This project is licensed under the [MIT License](LICENSE). You are free to use, modify, and share this project, provided the copyright and license notice are included.

## 🌟 About Me

Hi there! I'm **Glory Odeyemi**, a Data Engineer & Analyst!

Let's stay in touch! Feel free to connect with me on the following platforms:

[![LinkedIn](https://img.shields.io/badge/LinkedIn-0077B5?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/glory-odeyemi/)
[![GitHub](https://img.shields.io/badge/GitHub-24292e?style=for-the-badge&logo=github&logoColor=white)](https://github.com/gloryodeyemi)
[![Portfolio](https://img.shields.io/badge/Portfolio-ffffff?style=for-the-badge&labelColor=FF0000&logo=google-chrome&logoColor=white)](https://gloryodeyemi.github.io/)
[![Medium](https://img.shields.io/badge/Medium-12100E?style=for-the-badge&logo=medium&logoColor=white)](https://glowcodes.medium.com/)
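---

The Bronze, Silver, and Gold flow and the data quality checks described above can be sketched end to end. The example below is a minimal, portable sketch and is not the repository's T-SQL. It uses Python's built-in `sqlite3` module as a stand-in for SQL Server. SQLite has no `bronze`/`silver`/`gold` schemas, so the sketch uses table-name prefixes instead, and window functions require SQLite 3.25 or later. All table names (`bronze_customers`, `silver_sales`, `gold_dim_customers`, ...), column names, sample rows, and country mappings are hypothetical.

```python
import sqlite3

# Hypothetical raw CSV rows as they might arrive from a CRM extract (all text).
RAW_CUSTOMERS = [
    ("1", "  Ada ", "DE", "2024-01-01"),       # older version of customer 1
    ("1", "Ada Lovelace", "DE", "2024-06-01"),  # latest version of customer 1
    ("2", "Alan", "US", "2024-03-01"),
    (None, "Ghost", "US", "2024-03-01"),        # missing business key
]
RAW_SALES = [("100", "1", "25.0"), ("101", "2", "40.5"), ("102", "3", "10.0")]


def build_warehouse(conn, customers, sales):
    cur = conn.cursor()
    # Bronze: land raw rows as-is, with no cleansing or typing.
    cur.execute("CREATE TABLE bronze_customers (cst_id TEXT, cst_name TEXT, cst_country TEXT, cst_updated TEXT)")
    cur.execute("CREATE TABLE bronze_sales (order_id TEXT, cst_id TEXT, amount TEXT)")
    cur.executemany("INSERT INTO bronze_customers VALUES (?, ?, ?, ?)", customers)
    cur.executemany("INSERT INTO bronze_sales VALUES (?, ?, ?)", sales)

    # Silver: drop null keys, keep only the latest record per customer,
    # trim strings, and standardize country codes (hypothetical mapping).
    cur.execute("""
        CREATE TABLE silver_customers AS
        SELECT CAST(cst_id AS INTEGER) AS cst_id,
               TRIM(cst_name) AS cst_name,
               CASE UPPER(TRIM(cst_country))
                    WHEN 'DE' THEN 'Germany'
                    WHEN 'US' THEN 'United States'
                    ELSE COALESCE(NULLIF(TRIM(cst_country), ''), 'n/a')
               END AS cst_country
        FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY cst_id ORDER BY cst_updated DESC) AS rn
              FROM bronze_customers
              WHERE cst_id IS NOT NULL) AS ranked
        WHERE rn = 1
    """)
    cur.execute("""
        CREATE TABLE silver_sales AS
        SELECT CAST(order_id AS INTEGER) AS order_id,
               CAST(cst_id AS INTEGER) AS cst_id,
               CAST(amount AS REAL) AS amount
        FROM bronze_sales
        WHERE order_id IS NOT NULL
    """)

    # Gold: star schema views, a dimension with a surrogate key plus a fact view.
    cur.execute("""
        CREATE VIEW gold_dim_customers AS
        SELECT ROW_NUMBER() OVER (ORDER BY cst_id) AS customer_key,
               cst_id AS customer_id, cst_name AS customer_name, cst_country AS country
        FROM silver_customers
    """)
    cur.execute("""
        CREATE VIEW gold_fact_sales AS
        SELECT s.order_id, d.customer_key, s.amount
        FROM silver_sales AS s
        LEFT JOIN gold_dim_customers AS d ON s.cst_id = d.customer_id
    """)
    conn.commit()


def run_quality_checks(conn):
    # Each query counts offending rows; 0 means the check passed.
    checks = {
        "null_or_duplicate_keys": """SELECT COUNT(*) FROM (SELECT cst_id FROM silver_customers
                                     GROUP BY cst_id HAVING COUNT(*) > 1 OR cst_id IS NULL)""",
        "unwanted_spaces": "SELECT COUNT(*) FROM silver_customers WHERE cst_name != TRIM(cst_name)",
        "duplicate_surrogate_keys": """SELECT COUNT(*) FROM (SELECT customer_key FROM gold_dim_customers
                                       GROUP BY customer_key HAVING COUNT(*) > 1)""",
        # Referential integrity: fact rows whose customer has no dimension row.
        "orphan_fact_rows": "SELECT COUNT(*) FROM gold_fact_sales WHERE customer_key IS NULL",
    }
    return {name: conn.execute(sql).fetchone()[0] for name, sql in checks.items()}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_warehouse(conn, RAW_CUSTOMERS, RAW_SALES)
    print(conn.execute("SELECT * FROM gold_dim_customers").fetchall())
    print(run_quality_checks(conn))
```

Each check returns a count of violating rows, so zero means it passed. In this sample data, order `102` references a customer that is missing from the customer extract, and the referential-integrity check flags it. The same "count violations, expect zero" pattern can be written as T-SQL queries against SQL Server schemas.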