https://github.com/ajschofield/etl-project

A solo continuation of the ETL pipeline project during the Data Engineering course at Northcoders

# ETL-Project
[![Python](https://img.shields.io/badge/Python-FFD43B?style=for-the-badge&logo=python&logoColor=blue)](https://www.python.org/)
![Azure](https://img.shields.io/badge/azure-%230072C6.svg?style=for-the-badge&logo=microsoftazure&logoColor=white)
[![Terraform](https://img.shields.io/badge/Terraform-7B42BC?style=for-the-badge&logo=terraform&logoColor=white)](https://www.terraform.io/)
[![Postgresql](https://img.shields.io/badge/PostgreSQL-316192?style=for-the-badge&logo=postgresql&logoColor=white)](https://www.postgresql.org/)
[![GitHub Actions](https://img.shields.io/badge/GitHub_Actions-2088FF?style=for-the-badge&logo=github-actions&logoColor=white)](https://github.com/features/actions)

For the original ToteSys project, see [de-project-bentley](https://github.com/ajschofield/de-project-bentley).

This project implements a robust, serverless data processing platform that extracts data from an operational database, archives it in a data lake, and transforms it to be loaded into an easily accessible OLAP data warehouse. It is designed to be reliable, scalable and fully automated.

This platform includes the following key functions:
- Extracts data from a PostgreSQL database at regular intervals
- Stores raw data in a data lake for archival purposes
- Transforms the data to conform to a star schema optimised for analytical queries
- Loads the transformed data into a cloud-based data warehouse
- Ensures data consistency, with a maximum delay of 30 minutes from source to warehouse
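
The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's actual code: the table and column names (`sales_order`, `staff_id`, `last_updated` and so on), the data-lake path layout, and every function name here are hypothetical. Only the 30-minute limit comes from this README.

```python
import json
from datetime import datetime, timedelta, timezone

# Freshness target taken from the README: at most 30 minutes source-to-warehouse.
MAX_DELAY = timedelta(minutes=30)


def extract_changed(rows, last_run):
    """Return only the source rows updated after the previous run (incremental extract)."""
    return [row for row in rows if row["last_updated"] > last_run]


def archive_key(table, run_time):
    """Build a timestamped data-lake path for one raw batch (hypothetical layout)."""
    return f"{table}/{run_time:%Y/%m/%d/%H%M%S}.json"


def serialise_batch(rows):
    """Serialise a raw batch as JSON; datetimes are stored as strings."""
    return json.dumps(rows, default=str)


def to_star_schema(sales_rows):
    """Split flat sales rows into a staff dimension and a sales fact table."""
    dim_staff = {}
    fact_sales = []
    for row in sales_rows:
        # One dimension row per staff member, keyed by staff_id.
        dim_staff[row["staff_id"]] = {
            "staff_id": row["staff_id"],
            "staff_name": row["staff_name"],
        }
        # Fact rows keep measures plus the foreign key into the dimension.
        fact_sales.append(
            {
                "sales_id": row["sales_id"],
                "staff_id": row["staff_id"],
                "units_sold": row["units_sold"],
            }
        )
    return list(dim_staff.values()), fact_sales


def is_fresh(source_time, loaded_time):
    """Check that a row reached the warehouse within the allowed delay."""
    return loaded_time - source_time <= MAX_DELAY
```

In a scheduled run, `extract_changed` would be called with the timestamp of the previous successful run, the raw batch written to the data lake under `archive_key(...)`, and the output of `to_star_schema` loaded into the warehouse. Keeping each step a pure function makes it straightforward to unit test before wiring it to a database or cloud storage.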

The original solution ran on Amazon Web Services. This solo iteration targets Azure instead, which requires rewriting both the Terraform configuration and the Python code.

The deadline for completion is the **end of September**.

# Original Contributors
Below are the contributors to the original ToteSys project.




- Ellie Symonds (ellsymonds)
- Lianmei Manon-og (lian-manonog)
- Tolu Ajibade (T-Aji)
- Joslin Rashleigh (HastarTara)
- Anzelika Belotelova (bulve-ad)
- Alex Schofield (ajschofield)