An open API service indexing awesome lists of open source software.

https://github.com/madhurimarawat/data-warehousing

This repository contains practical examples of data warehousing concepts, including star schema and ETL processes, all implemented using MySQL.
https://github.com/madhurimarawat/data-warehousing

data-aggregation data-cleaning data-cleaning-and-preprocessing data-warehousing detailed-documentation etl etl-pipeline mysql normalization olap-cube olap-data olap-database query-optimization snowflake-schema star-schema

Last synced: 7 months ago
JSON representation

This repository contains practical examples of data warehousing concepts, including star schema and ETL processes, all implemented using MySQL.

Awesome Lists containing this project

README

          

# Data-Warehousing
This repository contains practical examples of data warehousing concepts, including star schema and ETL processes, all implemented using MySQL.




Repo Size



GitHub Stars



GitHub Forks



GitHub Issues


Closed Issues



Open Pull Requests


Closed Pull Requests



GitHub Discussions



GitHub Contributors



License



Last Commit



Repository Age



GitHub Watchers


Visitors

---

## Tools and Technologies βš™οΈπŸ’»

1. [MySQL](https://dev.mysql.com/doc/): Β  An open-source relational database management system for managing and organizing structured data using SQL.
2. [Python](https://www.python.org/doc/): Β  A high-level, interpreted programming language known for its readability and versatility. It supports multiple programming paradigms and is widely used for web development, data analysis, automation, and scientific computing.
3. [Pandas](https://pandas.pydata.org/docs/): Β  An open-source data analysis and manipulation library for Python. It provides data structures like DataFrames and Series, enabling efficient handling and analysis of structured data.
4. [NumPy](https://numpy.org/doc/): Β  A fundamental package for numerical computing in Python. It offers support for multi-dimensional arrays and matrices, along with a collection of mathematical functions for performing efficient operations on these data structures.
5. [MySQL Connector](https://dev.mysql.com/doc/connector-python/en/): Β  A Python library that enables connecting to a MySQL database server. It allows developers to execute SQL queries, manage database connections, and interact with MySQL databases directly from Python applications.

---

## Directory Structure πŸ“‚

```
Data-Warehousing/
β”‚
β”œβ”€β”€ Experiment 1/
β”‚ β”œβ”€β”€ Documentation/ πŸ“
| β”‚ β”œβ”€β”€ Explanation of methods and key observations from Experiment 1.
β”‚
β”œβ”€β”€ Experiment 2/
β”‚ β”œβ”€β”€ Codes/ πŸ’»
β”‚ β”‚ └── Contains the MySQL script for input and output in Experiment 2.
β”‚ β”œβ”€β”€ Documentation/ πŸ“
β”‚ β”‚ β”œβ”€β”€ Detailed documentation explaining the methodology and analysis for Experiment 2.
β”‚ β”œβ”€β”€ Output/ πŸ“Š
β”‚ β”‚ └── Contains the results and analysis of Experiment 2.
β”œβ”€β”€ Experiment 3/
β”‚ β”œβ”€β”€ Codes/ πŸ’»
β”‚ β”‚ └── Contains the MySQL script for input and output in Experiment 3.
β”‚ β”œβ”€β”€ Documentation/ πŸ“
β”‚ β”‚ β”œβ”€β”€ Detailed documentation explaining the methodology and analysis for Experiment 3.
β”‚ β”œβ”€β”€ Output/ πŸ“Š
β”‚ β”‚ └── Contains the results and analysis of Experiment 3.
.....
```

### **Project Folder Structure**

- **Codes** πŸ’» (If applicable)
Contains the source code files used for data processing and analysis in each experiment. These scripts are essential for executing tasks within the experiment. Additionally, the following files are included:
- **MySQL Commands and Output (TXT)**: This text file contains the specific MySQL command-line operations used in the experiment, documenting both the input commands and their corresponding outputs. A detailed explanation of these commands and their results can be found in the **Documentation** folder, available in both **MD** and **PDF** formats.

- **Dataset** πŸ“ (If applicable)
Stores datasets used in experiments, ensuring easy access and organization.
- e.g., `data.csv`, `stream_data.json`

- **Output** πŸ“Š
Stores results generated from experiments, including visualizations, processed data, logs, and analysis reports. Each experiment's output is stored separately with a relevant name.
- e.g., `Experiment_X_Output` (where "X" refers to the relevant experiment number)

- **Documentation** πŸ“
Contains detailed documentation for each experiment, covering methodology, analysis, and insights. Documentation is provided in both Markdown (`.md`) and PDF formats for easy reference.
- `documentation.md` (Markdown version)
- `documentation.pdf` (PDF version, converted from Markdown)

- **Commands File (πŸ“‹)**
A text file stored in the **Codes** folder, documenting specific commands, steps, and MySQL output used in the experiment. This is especially useful for tracking command-line operations and database interactions.
- `MySQL_Commands_Output.txt`

---

## Table Of Contents πŸ“” πŸ”– πŸ“‘

### 1. [Introduction to Data Warehousing Concepts](Experiment%201)

This experiment introduces the fundamental concepts and architecture of data warehousing, including ETL processes, data modeling techniques, and OLAP functionalities.

### 2. [Creating Star Schema in Data Warehouse](Experiment%202)

This experiment focuses on designing and implementing a star schema data model for a specified business scenario, emphasizing the creation of fact and dimension tables.

### 3. [Implementing Snowflake Schema in Data Warehouse](Experiment%203)

In this experiment, the Snowflake Schema was implemented to achieve a more
normalized data structure than the Star Schema.

### 4. [Designing ETL Process for Data Warehousing](Experiment%204)

In this experiment, an ETL process was designed and implemented to migrate
data from operational databases to a data warehouse.

### 5. [OLAP Operations in Data Warehousing](Experiment%205)

In this experiment, OLAP operations such as **slicing, dicing, drill-down, drill-up, and pivoting** were applied to analyze predefined data in a data warehouse.

### 6. [Data Cleansing and Transformation](Experiment%206)

This experiment involved **cleaning and transforming raw data** before loading it into the data warehouse, ensuring **consistency, accuracy, and completeness**.

### 7. [Query Optimization in Data Warehousing](Experiment%207)

SQL queries were **optimized for large-scale data warehouse applications** using techniques like **indexing, partitioning, and query tuning** to improve performance.

### 8. [Data Aggregation for Reporting](Experiment%208)

This experiment implemented **data aggregation techniques** to generate **summarized views of large datasets**, enhancing **reporting and analytical efficiency**.

### 9. [Designing and Implementing a Data Warehouse Report](Experiment%209)
This experiment involves generating business reports from a **MySQL data warehouse** using **SQL queries** and **Python** for data extraction and processing.

### 10. [Real-time Data Warehousing using Streaming Data](Experiment%2010)
A **real-time data pipeline** is implemented with **Python**, continuously ingesting streaming data into a **MySQL data warehouse** for immediate analysis.

### 11. [Implementing Slowly Changing Dimensions (SCD) in Data Warehousing](Experiment%2011)
This experiment applies **Slowly Changing Dimensions (SCD)** techniques in a **MySQL data warehouse**, developed using **Python** to maintain historical data accuracy.

---

## Thanks for Visiting πŸ˜„

- Drop a 🌟 if you find this repository useful.


- If you have any doubts or suggestions, feel free to reach me.


πŸ“« How to reach me: Β  [![Linkedin Badge](https://img.shields.io/badge/-madhurima-blue?style=flat&logo=Linkedin&logoColor=white)](https://www.linkedin.com/in/madhurima-rawat/) Β  Β 
Mail IllustrationπŸ“«


- **Contribute and Discuss:** Feel free to open issues πŸ›, submit pull requests πŸ› οΈ, or start discussions πŸ’¬ to help improve this repository!