Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ritesh-ojha/data-engineering
End to End Data Engineering Projects
airflow apache-kafka apache-spark aws-ec2 aws-glue aws-s3 data-engineering docker python
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/ritesh-ojha/data-engineering
- Owner: ritesh-ojha
- Created: 2024-03-18T13:32:43.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-05-13T17:46:48.000Z (8 months ago)
- Last Synced: 2024-10-19T03:22:09.621Z (3 months ago)
- Topics: airflow, apache-kafka, apache-spark, aws-ec2, aws-glue, aws-s3, data-engineering, docker, python
- Language: Python
- Homepage:
- Size: 32.5 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: Readme.md
README
# Data Engineering End to End Projects
This repository contains all my data engineering projects. In this repo, I will be exploring various data engineering tools and techniques.
![](thumbnail.jpg)
## Projects
1. [OpenWeather](/OpenWeather/)
   - A data pipeline implemented using Apache Airflow on Amazon Web Services (AWS) for processing OpenWeather data.
   - The pipeline extracts weather data from the OpenWeather API, transforms it, and loads it into a data warehouse for analysis and visualization.
2. [Podcast](/Podcast/)
   - I created a data pipeline using Airflow on Docker. The pipeline downloads podcast episodes.
   - I stored the results in a Postgres database that can be queried easily.
3. [Spotify](/Spotify/)
   - A data pipeline implemented on Amazon Web Services (AWS) for processing Spotify data.
   - The pipeline loads CSV files containing information about artists, tracks, and albums into an S3 bucket.
   - It then performs ETL (Extract, Transform, Load) using AWS Glue, stores the processed data as Parquet files, and finally queries and visualizes the data using Amazon Athena and Power BI.
4. [Smart City](/Smart-City/)
   - A data engineering project that simulates data generation using Python for Apache Kafka, processes the data with Apache Spark, and stores it in Amazon S3.
   - All services are orchestrated and run in Docker containers.

## Contact
If you have any queries, feel free to reach out to me at [email protected] or create an issue [here](https://github.com/ritesh-ojha/Data-Engineering/issues/new).