Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/machinelearningzuu/data-engineering-projects
This repository is a curated collection of projects and tools that exemplify best practices in data engineering. It serves as a resource for data professionals seeking to enhance their data infrastructure, optimize data pipelines, and implement cutting-edge data processing techniques.
airflow bigquery data-engineering data-science data-visualization data-warehouse
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/machinelearningzuu/data-engineering-projects
- Owner: machinelearningzuu
- Created: 2024-05-10T06:42:46.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2024-05-12T09:36:57.000Z (9 months ago)
- Last Synced: 2024-05-12T10:26:20.230Z (9 months ago)
- Topics: airflow, bigquery, data-engineering, data-science, data-visualization, data-warehouse
- Language: Jupyter Notebook
- Homepage:
- Size: 9.25 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Data-Engineering-Projects
Welcome to **Data-Engineering-Projects**, a comprehensive repository dedicated to housing innovative and scalable data engineering solutions.
## Overview
This repository is a curated collection of projects and tools that exemplify best practices in data engineering. It serves as a resource for data professionals seeking to enhance their data infrastructure, optimize data pipelines, and implement cutting-edge data processing techniques.
## Projects
Each project within this repository is self-contained with its own set of instructions, documentation, and necessary scripts or code.
- **Project 1**: Retail Data Pipeline - Airflow
- **Project 2**: [Uber Data Pipeline - Mage](https://github.com/machinelearningzuu/Data-Engineering-Projects/tree/main/02-uber-data-pipeline)
## Technologies
The projects in this repository leverage a variety of technologies, including:
- Apache Spark
- Apache Airflow
- Amazon Redshift
- Google BigQuery
- Snowflake
- Docker
- Mage
## Highlights
### Fact Table vs Dimension Table
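As a minimal sketch of the distinction, the example below builds a tiny star schema in an in-memory SQLite database. The table names (`dim_product`, `fact_sales`), their columns, and the sample rows are hypothetical and are not taken from the projects in this repository. The dimension table holds descriptive attributes, the fact table holds measurable events keyed to it, and a typical query joins the two and aggregates the measures.

```python
import sqlite3

# Hypothetical star schema: all table names, columns, and rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension table: one row per product, descriptive attributes only.
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        category   TEXT NOT NULL
    );
    -- Fact table: one row per sale, numeric measures plus a key into the dimension.
    CREATE TABLE fact_sales (
        sale_id    INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
        quantity   INTEGER NOT NULL,
        amount     REAL NOT NULL
    );
    """
)
conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?, ?)",
    [(1, "Notebook", "Stationery"), (2, "Pen", "Stationery"), (3, "Mug", "Kitchen")],
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
    [(1, 1, 2, 10.0), (2, 2, 5, 7.5), (3, 3, 1, 8.0), (4, 1, 1, 5.0)],
)


def revenue_by_category(connection):
    """Join facts to the dimension and sum the sales amount per category."""
    rows = connection.execute(
        """
        SELECT d.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS d ON d.product_id = f.product_id
        GROUP BY d.category
        ORDER BY d.category
        """
    ).fetchall()
    return dict(rows)


totals = revenue_by_category(conn)
print(totals)  # {'Kitchen': 8.0, 'Stationery': 22.5}
```

The fact table grows with every event, while the dimension table changes only when a product's description changes, which is why the measures and the descriptive attributes are kept apart.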
![image](https://github.com/machinelearningzuu/Data-Engineering-Projects/assets/41842488/1d36b206-8edf-4144-b2b1-80af3ede7343)
### Data Pipeline Tree
![image](https://github.com/machinelearningzuu/Data-Engineering-Projects/blob/main/02-uber-data-pipeline/docs/pipeline-tree.png)
## Installation
Instructions on how to install and configure the necessary environment or dependencies for the projects.
```bash
# Example installation code
pip install -r requirements.txt
```
## Usage
Examples of how to use the projects or tools within this repository.
```bash
# Example usage code
python project_1/main.py
```
## Contributing
We welcome contributions from the data engineering community. Please read our CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests.
## License
This project is licensed under the MIT License - see the LICENSE.md file for details.
## Contact
For any inquiries or contributions, please open an issue or contact the repository maintainers.

Thank you for visiting Data-Engineering-Projects. We hope this repository empowers you to build robust and efficient data solutions.