https://github.com/madhurimarawat/data-warehousing

This repository contains practical examples of data warehousing concepts, including star schema and ETL processes, all implemented using MySQL.
Host: GitHub
URL: https://github.com/madhurimarawat/data-warehousing
Owner: madhurimarawat
License: mit
Created: 2025-01-23T09:07:09.000Z (9 months ago)
Default Branch: main
Last Pushed: 2025-03-10T07:39:35.000Z (8 months ago)
Last Synced: 2025-03-10T08:28:52.329Z (8 months ago)
Topics: data-aggregation, data-cleaning, data-cleaning-and-preprocessing, data-warehousing, detailed-documentation, etl, etl-pipeline, mysql, normalization, olap-cube, olap-data, olap-database, query-optimization, snowflake-schema, star-schema
Language: Jupyter Notebook
Homepage:
Size: 7.21 MB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Data-Warehousing
This repository contains practical examples of data warehousing concepts, including star schema and ETL processes, all implemented using MySQL.

---

## Tools and Technologies ⚙️💻

1. [MySQL](https://dev.mysql.com/doc/): An open-source relational database management system for managing and organizing structured data using SQL.
2. [Python](https://www.python.org/doc/): A high-level, interpreted programming language known for its readability and versatility. It supports multiple programming paradigms and is widely used for web development, data analysis, automation, and scientific computing.
3. [Pandas](https://pandas.pydata.org/docs/): An open-source data analysis and manipulation library for Python. It provides data structures like DataFrames and Series, enabling efficient handling and analysis of structured data.
4. [NumPy](https://numpy.org/doc/): A fundamental package for numerical computing in Python. It offers support for multi-dimensional arrays and matrices, along with a collection of mathematical functions for performing efficient operations on these data structures.
5. [MySQL Connector](https://dev.mysql.com/doc/connector-python/en/): A Python library that enables connecting to a MySQL database server. It allows developers to execute SQL queries, manage database connections, and interact with MySQL databases directly from Python applications.

---

## Directory Structure 📂

```
Data-Warehousing/
│
├── Experiment 1/
│ ├── Documentation/ 📝
│ │ ├── Explanation of methods and key observations from Experiment 1.
│
├── Experiment 2/
│ ├── Codes/ 💻
│ │ └── Contains the MySQL script for input and output in Experiment 2.
│ ├── Documentation/ 📝
│ │ ├── Detailed documentation explaining the methodology and analysis for Experiment 2.
│ ├── Output/ 📊
│ │ └── Contains the results and analysis of Experiment 2.
├── Experiment 3/
│ ├── Codes/ 💻
│ │ └── Contains the MySQL script for input and output in Experiment 3.
│ ├── Documentation/ 📝
│ │ ├── Detailed documentation explaining the methodology and analysis for Experiment 3.
│ ├── Output/ 📊
│ │ └── Contains the results and analysis of Experiment 3.
.....
```

### **Project Folder Structure**

- **Codes** 💻 (If applicable)
  Contains the source code files used for data processing and analysis in each experiment. These scripts are essential for executing tasks within the experiment. The following file is also included:
  - **MySQL Commands and Output (TXT)**: This text file contains the specific MySQL command-line operations used in the experiment, documenting both the input commands and their corresponding outputs. A detailed explanation of these commands and their results can be found in the **Documentation** folder, available in both **MD** and **PDF** formats.

- **Dataset** 📁 (If applicable)
  Stores datasets used in experiments, ensuring easy access and organization.
  - e.g., `data.csv`, `stream_data.json`

- **Output** 📊
  Stores results generated from experiments, including visualizations, processed data, logs, and analysis reports. Each experiment's output is stored separately with a relevant name.
  - e.g., `Experiment_X_Output` (where "X" refers to the relevant experiment number)

- **Documentation** 📝
  Contains detailed documentation for each experiment, covering methodology, analysis, and insights. Documentation is provided in both Markdown (`.md`) and PDF formats for easy reference.
  - `documentation.md` (Markdown version)
  - `documentation.pdf` (PDF version, converted from Markdown)

- **Commands File** 📋
  A text file stored in the **Codes** folder, documenting specific commands, steps, and MySQL output used in the experiment. This is especially useful for tracking command-line operations and database interactions.
  - `MySQL_Commands_Output.txt`

---

## Table Of Contents 📔 🔖 📑

### 1. [Introduction to Data Warehousing Concepts](Experiment%201)

This experiment introduces the fundamental concepts and architecture of data warehousing, including ETL processes, data modeling techniques, and OLAP functionalities.

### 2. [Creating Star Schema in Data Warehouse](Experiment%202)

This experiment focuses on designing and implementing a star schema data model for a specified business scenario, emphasizing the creation of fact and dimension tables.
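
As a rough illustration of the fact-and-dimension layout described above, the sketch below builds a tiny star schema. It is not the repository's script: the table and column names (`dim_product`, `dim_date`, `fact_sales`) are hypothetical, and Python's built-in `sqlite3` stands in for MySQL so the example runs without a database server.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables hold descriptive attributes.
    CREATE TABLE dim_product (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT NOT NULL,
        category     TEXT NOT NULL
    );
    CREATE TABLE dim_date (
        date_id   INTEGER PRIMARY KEY,
        full_date TEXT NOT NULL,
        year      INTEGER NOT NULL
    );
    -- The fact table holds measures plus one foreign key per dimension.
    CREATE TABLE fact_sales (
        sale_id    INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
        date_id    INTEGER NOT NULL REFERENCES dim_date(date_id),
        quantity   INTEGER NOT NULL,
        amount     REAL NOT NULL
    );
    INSERT INTO dim_product VALUES (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture');
    INSERT INTO dim_date VALUES (1, '2025-01-15', 2025), (2, '2025-02-10', 2025);
    INSERT INTO fact_sales VALUES
        (1, 1, 1, 2, 2000.0),
        (2, 2, 1, 1, 300.0),
        (3, 1, 2, 1, 1000.0);
""")

# A typical star-schema query: join the fact table to one dimension and aggregate.
star_rows = conn.execute("""
    SELECT p.category, SUM(f.amount) AS total_amount
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(star_rows)
```

Every dimension is one join away from the fact table, which keeps analytical queries short at the cost of some redundancy inside the dimensions.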

### 3. [Implementing Snowflake Schema in Data Warehouse](Experiment%203)

In this experiment, the Snowflake Schema was implemented to achieve a more
normalized data structure than the Star Schema.
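
To make the extra normalization concrete, here is a minimal sketch in which the product category is moved out of the product dimension into its own table. The names are hypothetical and `sqlite3` is used in place of MySQL purely so the example is runnable as-is.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Category is split out of the product dimension (snowflaking).
    CREATE TABLE dim_category (
        category_id   INTEGER PRIMARY KEY,
        category_name TEXT NOT NULL
    );
    CREATE TABLE dim_product (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT NOT NULL,
        category_id  INTEGER NOT NULL REFERENCES dim_category(category_id)
    );
    CREATE TABLE fact_sales (
        sale_id    INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
        amount     REAL NOT NULL
    );
    INSERT INTO dim_category VALUES (1, 'Electronics'), (2, 'Furniture');
    INSERT INTO dim_product VALUES (1, 'Laptop', 1), (2, 'Phone', 1), (3, 'Desk', 2);
    INSERT INTO fact_sales VALUES (1, 1, 1000.0), (2, 2, 500.0), (3, 3, 300.0);
""")

# Reaching the category now takes two joins instead of one.
snowflake_rows = conn.execute("""
    SELECT c.category_name, SUM(f.amount) AS total_amount
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_category AS c ON c.category_id = p.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
print(snowflake_rows)
```

The category name is stored once per category rather than once per product, which reduces redundancy but adds a join to each query that needs it.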

### 4. [Designing ETL Process for Data Warehousing](Experiment%204)

In this experiment, an ETL process was designed and implemented to migrate
data from operational databases to a data warehouse.
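
The extract, transform, and load stages can be sketched as three small functions. This is an illustrative outline under stated assumptions, not the experiment's code: the source rows, the `fact_orders` table, and the cleaning rules are hypothetical, and `sqlite3` stands in for the MySQL warehouse.

```python
import sqlite3

def extract(source_rows):
    """Extract: read raw rows from the (hypothetical) operational source."""
    return list(source_rows)

def transform(rows):
    """Transform: trim and title-case names, cast amounts to float."""
    return [
        (row["order_id"], row["customer"].strip().title(), float(row["amount"]))
        for row in rows
    ]

def load(conn, rows):
    """Load: write the transformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    conn.commit()

# Hypothetical operational data with inconsistent formatting.
source = [
    {"order_id": 1, "customer": "  alice ", "amount": "120.50"},
    {"order_id": 2, "customer": "BOB", "amount": "80"},
]

warehouse = sqlite3.connect(":memory:")  # stand-in for the MySQL warehouse
load(warehouse, transform(extract(source)))
loaded = warehouse.execute("SELECT * FROM fact_orders ORDER BY order_id").fetchall()
print(loaded)
```

Keeping the three stages as separate functions makes each one testable on its own and lets the source or target be swapped without touching the transformation rules.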

### 5. [OLAP Operations in Data Warehousing](Experiment%205)

In this experiment, OLAP operations such as **slicing, dicing, drill-down, drill-up, and pivoting** were applied to analyze predefined data in a data warehouse.
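
The operations named above can be illustrated on a small in-memory cube. The sketch below uses plain Python rather than MySQL, and the data and function names are hypothetical; each cell is a `(year, region, product, sales)` tuple.

```python
from collections import defaultdict

# Hypothetical cube cells: (year, region, product, sales)
cube = [
    (2024, "North", "Laptop", 100),
    (2024, "South", "Laptop", 150),
    (2024, "North", "Desk", 50),
    (2025, "North", "Laptop", 200),
    (2025, "South", "Desk", 70),
]

def slice_(cells, year):
    """Slice: fix one dimension to a single value."""
    return [c for c in cells if c[0] == year]

def dice(cells, years, regions):
    """Dice: restrict two or more dimensions to sets of values."""
    return [c for c in cells if c[0] in years and c[1] in regions]

def roll_up(cells, dim):
    """Drill-up (roll-up): aggregate sales along a single dimension."""
    totals = defaultdict(int)
    for c in cells:
        totals[c[dim]] += c[3]
    return dict(totals)

def drill_down(cells, coarse, fine):
    """Drill-down: break the totals for one dimension out by a finer one."""
    totals = defaultdict(int)
    for c in cells:
        totals[(c[coarse], c[fine])] += c[3]
    return dict(totals)

def pivot(cells, row_dim, col_dim):
    """Pivot: rotate two dimensions into a rows-by-columns table."""
    table = defaultdict(dict)
    for c in cells:
        row = table[c[row_dim]]
        row[c[col_dim]] = row.get(c[col_dim], 0) + c[3]
    return dict(table)

print(roll_up(cube, 0))        # sales per year
print(drill_down(cube, 0, 1))  # sales per (year, region)
print(pivot(cube, 1, 0))       # regions as rows, years as columns
```

In SQL the same ideas usually map to `WHERE` filters for slice and dice and to `GROUP BY` at coarser or finer levels for drill-up and drill-down.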

### 6. [Data Cleansing and Transformation](Experiment%206)

This experiment involved **cleaning and transforming raw data** before loading it into the data warehouse, ensuring **consistency, accuracy, and completeness**.
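
A minimal sketch of what such a cleaning step might look like is shown below. The record layout and the specific rules (drop rows without an ID, keep the first row per ID, normalise email and city, default a missing city to `"Unknown"`) are assumptions chosen for illustration, not the rules used in the experiment.

```python
def clean_records(records):
    """Return de-duplicated, normalised records; drop rows missing an ID."""
    seen = set()
    cleaned = []
    for rec in records:
        customer_id = rec.get("customer_id")
        if customer_id is None or customer_id in seen:
            continue  # completeness check and de-duplication
        seen.add(customer_id)
        # Consistency: trim whitespace and normalise case; empty email becomes None.
        email = (rec.get("email") or "").strip().lower() or None
        city = (rec.get("city") or "Unknown").strip().title()
        cleaned.append({"customer_id": customer_id, "email": email, "city": city})
    return cleaned

# Hypothetical raw input with typical quality problems.
raw = [
    {"customer_id": 1, "email": " Alice@Example.COM ", "city": "new delhi"},
    {"customer_id": 1, "email": "alice@example.com", "city": "New Delhi"},  # duplicate
    {"customer_id": None, "email": "ghost@example.com", "city": "Pune"},    # missing key
    {"customer_id": 2, "email": None, "city": None},                        # missing values
]
cleaned = clean_records(raw)
print(cleaned)
```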

### 7. [Query Optimization in Data Warehousing](Experiment%207)

SQL queries were **optimized for large-scale data warehouse applications** using techniques like **indexing, partitioning, and query tuning** to improve performance.
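
The effect of indexing can be seen by comparing a query plan before and after an index is created. The sketch below uses `sqlite3` and its `EXPLAIN QUERY PLAN` statement as a stand-in; MySQL uses `EXPLAIN` and reports plans in a different format. Partitioning is not shown because SQLite has no equivalent. The table and index names are hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, product_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO fact_sales (product_id, amount) VALUES (?, ?)",
    [(i % 50, float(i)) for i in range(1000)],
)

QUERY = "SELECT SUM(amount) FROM fact_sales WHERE product_id = ?"

def query_plan(connection):
    """Return the planner's description of how QUERY would be executed."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, (7,)).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)

plan_before = query_plan(conn)  # expected to be a full table scan
conn.execute("CREATE INDEX idx_sales_product ON fact_sales (product_id)")
plan_after = query_plan(conn)   # expected to search via the new index

print("before:", plan_before)
print("after: ", plan_after)
```

The exact plan text varies between SQLite versions, so the useful signal is the change from a scan to an index search rather than the literal wording.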

### 8. [Data Aggregation for Reporting](Experiment%208)

This experiment implemented **data aggregation techniques** to generate **summarized views of large datasets**, enhancing **reporting and analytical efficiency**.
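
One common way to produce such summarized views is a pre-aggregated summary table built with `GROUP BY`. The sketch below is an assumption-laden illustration: the `fact_sales` and `sales_summary` tables are hypothetical, and `sqlite3` is used in place of MySQL.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_sales (region TEXT, month TEXT, amount REAL);
    INSERT INTO fact_sales VALUES
        ('North', '2025-01', 100.0),
        ('North', '2025-01', 50.0),
        ('North', '2025-02', 70.0),
        ('South', '2025-01', 30.0);
    -- Pre-aggregated summary table that reports can read directly.
    CREATE TABLE sales_summary AS
        SELECT region, month, COUNT(*) AS order_count, SUM(amount) AS total_amount
        FROM fact_sales
        GROUP BY region, month;
""")

summary = conn.execute(
    "SELECT region, month, order_count, total_amount "
    "FROM sales_summary ORDER BY region, month"
).fetchall()
print(summary)
```

Reports then read the small summary table instead of scanning the detailed fact rows, at the cost of refreshing the summary when new facts arrive.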

### 9. [Designing and Implementing a Data Warehouse Report](Experiment%209)

This experiment involves generating business reports from a **MySQL data warehouse** using **SQL queries** and **Python** for data extraction and processing.

### 10. [Real-time Data Warehousing using Streaming Data](Experiment%2010)

A **real-time data pipeline** is implemented with **Python**, continuously ingesting streaming data into a **MySQL data warehouse** for immediate analysis.
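
A continuous-ingestion loop of this kind can be approximated with a generator and small batched inserts. In the sketch below, `event_stream` is a hypothetical stand-in for a live source such as a socket or message queue, the `fact_readings` table is invented for illustration, and `sqlite3` replaces MySQL so the example runs locally.

```python
import sqlite3
from itertools import islice

def event_stream():
    """Hypothetical stand-in for a live source; yields (id, sensor, value)."""
    for i in range(1, 8):
        yield (i, f"sensor-{i % 2}", 20.0 + i)

def ingest(conn, stream, batch_size=3):
    """Insert events in small batches so new rows become queryable quickly."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_readings "
        "(event_id INTEGER PRIMARY KEY, sensor TEXT, value REAL)"
    )
    stream = iter(stream)
    batches = 0
    while True:
        batch = list(islice(stream, batch_size))
        if not batch:
            break  # source exhausted; a real stream would keep waiting
        conn.executemany("INSERT INTO fact_readings VALUES (?, ?, ?)", batch)
        conn.commit()  # commit per batch so readers see fresh data
        batches += 1
    return batches

conn = sqlite3.connect(":memory:")
batches = ingest(conn, event_stream())
row_count = conn.execute("SELECT COUNT(*) FROM fact_readings").fetchone()[0]
print(batches, row_count)
```

Smaller batches lower the delay before data is visible; larger ones reduce per-commit overhead.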

### 11. [Implementing Slowly Changing Dimensions (SCD) in Data Warehousing](Experiment%2011)

This experiment uses **Python** to apply **Slowly Changing Dimensions (SCD)** techniques in a **MySQL data warehouse**, preserving historical data accuracy.
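
The source does not say which SCD type the experiment uses. As one possibility, the sketch below shows a Type 2 update, where a changed attribute expires the current row and adds a new version. The `dim_customer` layout and the `apply_scd2` helper are hypothetical, and `sqlite3` stands in for MySQL.

```python
import sqlite3

def apply_scd2(conn, customer_id, city, change_date):
    """Type 2 SCD: expire the current row and insert a new version on change."""
    current = conn.execute(
        "SELECT surrogate_key, city FROM dim_customer "
        "WHERE customer_id = ? AND is_current = 1",
        (customer_id,),
    ).fetchone()
    if current and current[1] == city:
        return  # attribute unchanged, nothing to record
    if current:
        conn.execute(
            "UPDATE dim_customer SET end_date = ?, is_current = 0 "
            "WHERE surrogate_key = ?",
            (change_date, current[0]),
        )
    conn.execute(
        "INSERT INTO dim_customer (customer_id, city, start_date, end_date, is_current) "
        "VALUES (?, ?, ?, NULL, 1)",
        (customer_id, city, change_date),
    )
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL warehouse
conn.execute("""
    CREATE TABLE dim_customer (
        surrogate_key INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_id   INTEGER NOT NULL,
        city          TEXT NOT NULL,
        start_date    TEXT NOT NULL,
        end_date      TEXT,
        is_current    INTEGER NOT NULL
    )
""")
apply_scd2(conn, 101, "Delhi", "2025-01-01")
apply_scd2(conn, 101, "Delhi", "2025-02-01")   # unchanged, no new row
apply_scd2(conn, 101, "Mumbai", "2025-03-01")  # changed, new version added

history = conn.execute(
    "SELECT city, start_date, end_date, is_current "
    "FROM dim_customer ORDER BY surrogate_key"
).fetchall()
print(history)
```

The surrogate key lets fact rows point at the exact version of the customer that was current when the fact occurred.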

---

## Thanks for Visiting 😄

- Drop a 🌟 if you find this repository useful.

- If you have any questions or suggestions, feel free to reach out to me.

📫 How to reach me: [![Linkedin Badge](https://img.shields.io/badge/-madhurima-blue?style=flat&logo=Linkedin&logoColor=white)](https://www.linkedin.com/in/madhurima-rawat/)

- **Contribute and Discuss:** Feel free to open issues 🐛, submit pull requests 🛠️, or start discussions 💬 to help improve this repository!
