{"id":22689820,"url":"https://github.com/machinelearningzuu/data-engineering-projects","last_synced_at":"2026-04-30T00:04:24.734Z","repository":{"id":239143501,"uuid":"798626480","full_name":"machinelearningzuu/Data-Engineering-Projects","owner":"machinelearningzuu","description":"This repository is a curated collection of projects and tools that exemplify best practices in data engineering. It serves as a resource for data professionals seeking to enhance their data infrastructure, optimize data pipelines, and implement cutting-edge data processing techniques.","archived":false,"fork":false,"pushed_at":"2024-05-12T09:36:57.000Z","size":9704,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-04T17:17:23.227Z","etag":null,"topics":["airflow","bigquery","data-engineering","data-science","data-visualization","data-warehouse"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/machinelearningzuu.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-10T06:42:46.000Z","updated_at":"2024-05-12T09:37:00.000Z","dependencies_parsed_at":"2024-05-10T10:01:00.721Z","dependency_job_id":null,"html_url":"https://github.com/machinelearningzuu/Data-Engineering-Projects","commit_stats":null,"previous_names":["machinelearningzuu/data-engineering-projects"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/machinelearningzuu%2FDa
ta-Engineering-Projects","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/machinelearningzuu%2FData-Engineering-Projects/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/machinelearningzuu%2FData-Engineering-Projects/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/machinelearningzuu%2FData-Engineering-Projects/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/machinelearningzuu","download_url":"https://codeload.github.com/machinelearningzuu/Data-Engineering-Projects/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246208132,"owners_count":20740834,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["airflow","bigquery","data-engineering","data-science","data-visualization","data-warehouse"],"created_at":"2024-12-10T00:22:46.096Z","updated_at":"2026-04-30T00:04:19.703Z","avatar_url":"https://github.com/machinelearningzuu.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Data-Engineering-Projects\n\nWelcome to **Data-Engineering-Projects**, a comprehensive repository dedicated to housing innovative and scalable data engineering solutions.\n\n## Overview\n\nThis repository is a curated collection of projects and tools that exemplify best practices in data engineering. 
It serves as a resource for data professionals seeking to enhance their data infrastructure, optimize data pipelines, and implement cutting-edge data processing techniques.\n\n## Projects\n\nEach project within this repository is self-contained with its own set of instructions, documentation, and necessary scripts or code.\n\n- **Project 1**: Retail Data Pipeline - Airflow\n- **Project 2**: [Uber Data Pipeline - Mage](https://github.com/machinelearningzuu/Data-Engineering-Projects/tree/main/02-uber-data-pipeline)\n\n## Technologies\n\nThe projects in this repository leverage a variety of technologies, including:\n\n- Apache Spark\n- Apache Airflow\n- Amazon Redshift\n- Google BigQuery\n- Snowflake\n- Docker\n- Mage\n\n## Highlights\n\n### Fact Table vs Dimension Table\n   ![image](https://github.com/machinelearningzuu/Data-Engineering-Projects/assets/41842488/1d36b206-8edf-4144-b2b1-80af3ede7343)\n\n### Data Pipeline Tree\n   ![image](https://github.com/machinelearningzuu/Data-Engineering-Projects/blob/main/02-uber-data-pipeline/docs/pipeline-tree.png)\n\n## Installation\n\nInstructions on how to install and configure the necessary environment or dependencies for the projects.\n\n```bash\n# Example installation code\npip install -r requirements.txt\n```\n\n## Usage\nExamples of how to use the projects or tools within this repository.\n\n```bash\n# Example usage code\npython project_1/main.py\n```\n\n## Contributing\nWe welcome contributions from the data engineering community. Please read our CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests.\n\n## License\nThis project is licensed under the MIT License - see the LICENSE.md file for details.\n\n## Contact\nFor any inquiries or contributions, please open an issue or contact the repository maintainers.\n\nThank you for visiting Data-Engineering-Projects. 
We hope this repository empowers you to build robust and efficient data solutions.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmachinelearningzuu%2Fdata-engineering-projects","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmachinelearningzuu%2Fdata-engineering-projects","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmachinelearningzuu%2Fdata-engineering-projects/lists"}