https://github.com/machinelearningzuu/data-engineering-projects

This repository is a curated collection of projects and tools that exemplify best practices in data engineering. It serves as a resource for data professionals seeking to enhance their data infrastructure, optimize data pipelines, and implement cutting-edge data processing techniques.

airflow bigquery data-engineering data-science data-visualization data-warehouse

# Data-Engineering-Projects

Welcome to **Data-Engineering-Projects**, a repository of self-contained, scalable data engineering projects.

## Overview

This repository is a curated collection of projects and tools that exemplify best practices in data engineering. It serves as a resource for data professionals seeking to enhance their data infrastructure, optimize data pipelines, and implement cutting-edge data processing techniques.

## Projects

Each project within this repository is self-contained with its own set of instructions, documentation, and necessary scripts or code.

- **Project 1**: Retail Data Pipeline - Airflow
- **Project 2**: [Uber Data Pipeline - Mage](https://github.com/machinelearningzuu/Data-Engineering-Projects/tree/main/02-uber-data-pipeline)

## Technologies

The projects in this repository leverage a variety of technologies, including:

- Apache Spark
- Apache Airflow
- Amazon Redshift
- Google BigQuery
- Snowflake
- Docker
- Mage

## Highlights

### Fact Table vs Dimension Table
![image](https://github.com/machinelearningzuu/Data-Engineering-Projects/assets/41842488/1d36b206-8edf-4144-b2b1-80af3ede7343)
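
The distinction pictured above can also be sketched in code. The example below is a minimal illustration using Python's built-in `sqlite3` module, not the schema used by any project in this repository: the table names (`dim_product`, `fact_sales`), columns, and sample rows are all hypothetical. It assumes the usual star-schema split, where the dimension table holds descriptive attributes and the fact table holds numeric measures plus a foreign key to the dimension.

```python
import sqlite3

# In-memory database so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension table (hypothetical): one row per product, descriptive attributes.
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        category   TEXT NOT NULL
    );

    -- Fact table (hypothetical): one row per sale, numeric measures plus a
    -- foreign key pointing into the dimension table.
    CREATE TABLE fact_sales (
        sale_id    INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL REFERENCES dim_product (product_id),
        quantity   INTEGER NOT NULL,
        amount     REAL NOT NULL
    );
    """
)

# Made-up sample rows, for illustration only.
conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?, ?)",
    [(1, "Espresso", "Beverage"), (2, "Latte", "Beverage"), (3, "Bagel", "Food")],
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
    [(1, 1, 2, 6.0), (2, 2, 1, 4.5), (3, 3, 3, 7.5), (4, 1, 1, 3.0)],
)


def revenue_by_category(connection):
    """Sum a fact-table measure, grouped by a dimension-table attribute."""
    rows = connection.execute(
        """
        SELECT d.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS d ON d.product_id = f.product_id
        GROUP BY d.category
        ORDER BY d.category
        """
    )
    return rows.fetchall()


print(revenue_by_category(conn))  # [('Beverage', 13.5), ('Food', 7.5)]
```

The query joins the fact table to the dimension table to label the totals, which is the typical access pattern this kind of layout is meant to support.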

### Data Pipeline Tree
![image](https://github.com/machinelearningzuu/Data-Engineering-Projects/blob/main/02-uber-data-pipeline/docs/pipeline-tree.png)
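
A pipeline tree like the one pictured can be thought of as a set of blocks with dependencies between them. The sketch below is a rough illustration only, using the standard-library `graphlib` module (Python 3.9+). The block names are hypothetical and are not taken from the Uber pipeline in this repository, and orchestrators such as Mage or Airflow resolve dependencies with their own machinery rather than this code.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline tree: each key is a block, and each value is the set
# of blocks that must finish before that block can run.
PIPELINE = {
    "load_trips": set(),
    "clean_trips": {"load_trips"},
    "build_dimensions": {"clean_trips"},
    "build_fact_table": {"clean_trips", "build_dimensions"},
    "export_to_warehouse": {"build_fact_table"},
}


def execution_order(pipeline):
    """Return a valid run order; raises graphlib.CycleError if there is a cycle."""
    return list(TopologicalSorter(pipeline).static_order())


order = execution_order(PIPELINE)
print(order)
```

Every block appears in the resulting order only after all of the blocks it depends on, which is the property an orchestrator needs before it can schedule a run.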

## Installation

Install the Python dependencies listed in `requirements.txt` before running a project:

```bash
# Example installation code
pip install -r requirements.txt
```

## Usage
Run a project by invoking its entry-point script, for example:

```bash
# Example usage code
python project_1/main.py
```

## Contributing
We welcome contributions from the data engineering community. Please read our CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests.

## License
This project is licensed under the MIT License; see the LICENSE.md file for details.

## Contact
For any inquiries or contributions, please open an issue or contact the repository maintainers.

Thank you for visiting Data-Engineering-Projects. We hope this repository empowers you to build robust and efficient data solutions.