Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ajschofield/etl-project
A solo continuation of the ETL pipeline project during the Data Engineering course at Northcoders
- Host: GitHub
- URL: https://github.com/ajschofield/etl-project
- Owner: ajschofield
- Created: 2024-09-03T14:46:34.000Z (5 months ago)
- Default Branch: stable
- Last Pushed: 2024-09-07T14:09:05.000Z (5 months ago)
- Last Synced: 2024-11-11T22:37:15.498Z (3 months ago)
- Language: Python
- Homepage: https://plane.ajschof.me/spaces/issues/7e21ffb65fb94dd989a107d045c3974b
- Size: 41 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# ETL-Project
[![Python](https://img.shields.io/badge/Python-FFD43B?style=for-the-badge&logo=python&logoColor=blue)](https://www.python.org/)
![Azure](https://img.shields.io/badge/azure-%230072C6.svg?style=for-the-badge&logo=microsoftazure&logoColor=white)
[![Terraform](https://img.shields.io/badge/Terraform-7B42BC?style=for-the-badge&logo=terraform&logoColor=white)](https://www.terraform.io/)
[![Postgresql](https://img.shields.io/badge/PostgreSQL-316192?style=for-the-badge&logo=postgresql&logoColor=white)](https://www.postgresql.org/)
[![GitHub Actions](https://img.shields.io/badge/GitHub_Actions-2088FF?style=for-the-badge&logo=github-actions&logoColor=white)](https://github.com/features/actions)

For the original ToteSys project, please see [here](https://github.com/ajschofield/de-project-bentley).
This project implements a robust, serverless data processing platform that extracts data from an operational database, archives it in a data lake, and transforms it to be loaded into an easily accessible OLAP data warehouse. It is designed to be reliable, scalable and fully automated.
This platform includes the following key functions:
- Extracts data from a PostgreSQL database at regular intervals
- Stores raw data in a data lake for archival purposes
- Transforms the data to conform to a star schema optimised for analytical queries
- Loads the transformed data into a cloud-based data warehouse
- Ensures data consistency, with a maximum delay of 30 minutes from source to warehouse

The original solution used Amazon Web Services, but this solo iteration will be using Azure, requiring a rewrite of the Terraform configuration and Python code.
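To make the transform step and the 30-minute freshness target listed above more concrete, here is a minimal sketch in plain Python. It is an illustration only, not the repository's actual code: the function names (`build_star_schema`, `is_fresh`), the column names (`sale_id`, `customer`, `amount`) and the single-dimension layout are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

# Freshness target taken from the list above; everything else here is hypothetical.
MAX_DELAY = timedelta(minutes=30)


def build_star_schema(sales_rows):
    """Split flat operational rows into one dimension table and one fact table."""
    customer_keys = {}  # natural key (customer name) -> surrogate key
    fact_sales = []
    for row in sales_rows:
        # Assign the next surrogate key the first time a customer is seen.
        customer_id = customer_keys.setdefault(row["customer"], len(customer_keys) + 1)
        fact_sales.append(
            {
                "sale_id": row["sale_id"],
                "customer_id": customer_id,
                "amount": row["amount"],
            }
        )
    dim_customer = [
        {"customer_id": key, "customer_name": name}
        for name, key in customer_keys.items()
    ]
    return dim_customer, fact_sales


def is_fresh(last_updated, loaded_at, max_delay=MAX_DELAY):
    """Return True if a row reached the warehouse within the allowed delay."""
    return loaded_at - last_updated <= max_delay


# Hypothetical sample rows standing in for data extracted from the source database.
sample_rows = [
    {"sale_id": 1, "customer": "Acme", "amount": 120.0},
    {"sale_id": 2, "customer": "Birch", "amount": 75.5},
    {"sale_id": 3, "customer": "Acme", "amount": 30.0},
]
dim_customer, fact_sales = build_star_schema(sample_rows)

source_time = datetime(2024, 9, 1, 12, 0, tzinfo=timezone.utc)
on_time = is_fresh(source_time, source_time + timedelta(minutes=29))
too_late = is_fresh(source_time, source_time + timedelta(minutes=31))
```

In a real pipeline the dimension and fact rows would be written to the warehouse rather than held in memory, and a check like `is_fresh` would more likely run as a monitoring query; the sketch only shows the shape of the logic.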
The deadline for completion is the **end of September**.
# Original Contributors
Below are the contributors to the original ToteSys project.
- Ellie Symonds
- Lianmei Manon-og
- Tolu Ajibade
- Joslin Rashleigh
- Anzelika Belotelova
- Alex Schofield