https://github.com/ritesh-ojha/data-engineering

End to End Data Engineering Projects

airflow apache-kafka apache-spark aws-ec2 aws-glue aws-s3 data-engineering docker python

Host: GitHub
URL: https://github.com/ritesh-ojha/data-engineering
Owner: ritesh-ojha
Created: 2024-03-18T13:32:43.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-05-13T17:46:48.000Z (about 1 year ago)
Last Synced: 2025-01-17T04:45:56.728Z (6 months ago)
Topics: airflow, apache-kafka, apache-spark, aws-ec2, aws-glue, aws-s3, data-engineering, docker, python
Language: Python
Homepage:
Size: 32.5 MB
Stars: 1
Watchers: 2
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: Readme.md

# Data Engineering End to End Projects

This repository contains my end-to-end data engineering projects, each of which explores different data engineering tools and techniques.

![](thumbnail.jpg)

## Projects

1. [OpenWeather](/OpenWeather/)
- A data pipeline implemented using Apache Airflow on Amazon Web Services (AWS) for processing OpenWeather data.
- The pipeline involves extracting weather data from the OpenWeather API, transforming it, and loading it into a data warehouse for analysis and visualization.

2. [Podcast](/Podcast/)
- A data pipeline built with Apache Airflow running on Docker that downloads podcast episodes.
- The results are stored in a Postgres database, where they can be easily queried.

3. [Spotify](/Spotify/)
- A data pipeline implemented on Amazon Web Services (AWS) for processing Spotify data.
- The pipeline loads CSV files containing information about artists, tracks, and albums into an S3 bucket.
- It then performs ETL (Extract, Transform, Load) with AWS Glue, stores the processed data as Parquet files, and finally queries and visualizes the data using Amazon Athena and Power BI.

4. [Smart City](/Smart-City/)
- A data engineering project that simulates data generation in Python, streams the data through Apache Kafka, processes it with Apache Spark, and stores it in Amazon S3.
- All services are orchestrated and run in Docker containers.
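
As an illustration of the "transform" step in the OpenWeather pipeline above, here is a minimal sketch that flattens one current-weather API response into a single table row. It assumes the JSON layout of OpenWeather's current weather endpoint with default units (Kelvin); the function names, output columns, and sample values are hypothetical, not the project's actual schema. In a pipeline like this one, such a function would typically run inside an Airflow task rather than standalone.

```python
from datetime import datetime, timedelta, timezone


def kelvin_to_celsius(kelvin: float) -> float:
    """Convert Kelvin (OpenWeather's default unit) to Celsius, rounded to 2 dp."""
    return round(kelvin - 273.15, 2)


def transform_weather(payload: dict) -> dict:
    """Flatten one current-weather JSON response into a single table row.

    Assumes the response layout of OpenWeather's current weather endpoint;
    the output column names are hypothetical.
    """
    main = payload["main"]
    # "timezone" is the city's shift from UTC in seconds; "dt" is a Unix timestamp.
    local_tz = timezone(timedelta(seconds=payload.get("timezone", 0)))
    observed_at = datetime.fromtimestamp(payload["dt"], tz=local_tz)
    return {
        "city": payload["name"],
        "description": payload["weather"][0]["description"],
        "temp_c": kelvin_to_celsius(main["temp"]),
        "feels_like_c": kelvin_to_celsius(main["feels_like"]),
        "humidity_pct": main["humidity"],
        "pressure_hpa": main["pressure"],
        "wind_speed_ms": payload["wind"]["speed"],
        "observed_at_local": observed_at.isoformat(),
    }


if __name__ == "__main__":
    sample = {  # hand-written sample, not real API output
        "name": "London",
        "dt": 1700000000,
        "timezone": 0,
        "main": {"temp": 283.15, "feels_like": 281.0, "humidity": 80, "pressure": 1012},
        "weather": [{"description": "light rain"}],
        "wind": {"speed": 4.1},
    }
    print(transform_weather(sample))
```

Keeping the transform as a pure function (dict in, dict out) makes it easy to test without calling the API or touching the warehouse.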
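
For the Podcast pipeline, the core step is reading the podcast feed and recording each episode. The sketch below parses an RSS 2.0 feed with the standard library and stores the episode metadata, using Python's built-in `sqlite3` as a stand-in for Postgres so it runs without a database server. The table layout, function names, and sample feed are hypothetical; downloading the audio files and wiring the steps into Airflow tasks are left out.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hand-written sample feed (hypothetical URLs).
SAMPLE_FEED = """<rss version="2.0"><channel>
  <item>
    <title>Episode 1</title>
    <link>https://example.com/ep1</link>
    <pubDate>Mon, 13 May 2024 10:00:00 +0000</pubDate>
    <enclosure url="https://example.com/ep1.mp3" type="audio/mpeg" length="1000"/>
  </item>
  <item>
    <title>Episode 2</title>
    <link>https://example.com/ep2</link>
    <pubDate>Mon, 20 May 2024 10:00:00 +0000</pubDate>
  </item>
</channel></rss>"""


def parse_episodes(feed_xml: str) -> list:
    """Extract episode metadata from an RSS 2.0 podcast feed."""
    root = ET.fromstring(feed_xml)
    episodes = []
    for item in root.iter("item"):
        # The audio file URL lives on the <enclosure> element, if present.
        enclosure = item.find("enclosure")
        episodes.append({
            "link": item.findtext("link"),
            "title": item.findtext("title"),
            "published": item.findtext("pubDate"),
            "audio_url": enclosure.get("url") if enclosure is not None else None,
        })
    return episodes


def store_episodes(conn: sqlite3.Connection, episodes: list) -> None:
    """Insert episodes, skipping any already stored (keyed on link)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS episodes ("
        "link TEXT PRIMARY KEY, title TEXT, published TEXT, audio_url TEXT)"
    )
    # INSERT OR IGNORE keeps reruns idempotent; Postgres would use ON CONFLICT DO NOTHING.
    conn.executemany(
        "INSERT OR IGNORE INTO episodes VALUES (:link, :title, :published, :audio_url)",
        episodes,
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store_episodes(conn, parse_episodes(SAMPLE_FEED))
    print(conn.execute("SELECT title, audio_url FROM episodes").fetchall())
```

Skipping episodes that are already stored means a scheduled run can reprocess the whole feed without creating duplicate rows.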
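
A typical transform for the Spotify data is joining the track, album, and artist CSVs into one denormalized table. The join logic can be illustrated in plain Python. This is a hypothetical sketch with assumed column names (`artist_id`, `album_id`, and so on), not the project's Glue job, and it stops before the Parquet write, which would need a library such as pyarrow.

```python
import csv
import io

# Hypothetical CSV layouts; the real project's column names may differ.
ARTISTS_CSV = "artist_id,artist_name\na1,Artist One\n"
ALBUMS_CSV = "album_id,album_name,artist_id\nal1,First Album,a1\n"
TRACKS_CSV = "track_id,track_name,album_id\nt1,Song A,al1\nt2,Song B,al404\n"


def read_csv(text: str) -> list:
    """Parse CSV text (with a header row) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def join_tracks(tracks: list, albums: list, artists: list) -> list:
    """Inner-join tracks -> albums -> artists using dictionary lookups."""
    albums_by_id = {row["album_id"]: row for row in albums}
    artists_by_id = {row["artist_id"]: row for row in artists}
    joined = []
    for track in tracks:
        album = albums_by_id.get(track["album_id"])
        artist = artists_by_id.get(album["artist_id"]) if album is not None else None
        if album is None or artist is None:
            continue  # drop rows without a match, like an inner join
        joined.append({
            "track_id": track["track_id"],
            "track_name": track["track_name"],
            "album_name": album["album_name"],
            "artist_name": artist["artist_name"],
        })
    return joined


if __name__ == "__main__":
    rows = join_tracks(read_csv(TRACKS_CSV), read_csv(ALBUMS_CSV), read_csv(ARTISTS_CSV))
    print(rows)
```

Here track `t2` references a missing album and is dropped, which mirrors inner-join semantics; a left join would keep it with empty album and artist fields instead.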
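
For the Smart City project, the Python side generates simulated readings for Kafka. Below is a minimal sketch of such a generator, assuming a single hypothetical vehicle-location stream; the event fields, starting coordinates, and 30-second interval are made up for illustration. It yields serialized key/value byte pairs that a Kafka producer client could send, so it runs without a broker.

```python
import json
import random
import uuid
from datetime import datetime, timedelta, timezone


def simulate_vehicle_events(device_id, start, count, seed=42):
    """Yield (key, value) byte pairs for simulated vehicle readings.

    The event fields and the starting coordinates are hypothetical.
    """
    rng = random.Random(seed)  # seeded so coordinates and speeds are reproducible
    lat, lon = 51.5074, -0.1278  # hypothetical start point (central London)
    for i in range(count):
        # Drift the vehicle a small random step on each tick.
        lat += rng.uniform(0.0, 0.001)
        lon += rng.uniform(0.0, 0.001)
        event = {
            "id": str(uuid.uuid4()),
            "device_id": device_id,
            "timestamp": (start + timedelta(seconds=30 * i)).isoformat(),
            "location": {"lat": round(lat, 6), "lon": round(lon, 6)},
            "speed_kmh": round(rng.uniform(10.0, 60.0), 1),
        }
        # Kafka messages are bytes; with key-based partitioning, keying by device
        # keeps one vehicle's events in the same partition, preserving their order.
        yield device_id.encode("utf-8"), json.dumps(event).encode("utf-8")


if __name__ == "__main__":
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    for key, value in simulate_vehicle_events("vehicle-1", start, 3):
        print(key, value)
```

Downstream, a Spark job would read these messages from the Kafka topic and write the processed results to S3, as described in the project summary.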

## Contact

If you have any questions, feel free to reach out to me at [email protected] or [create an issue](https://github.com/ritesh-ojha/Data-Engineering/issues/new).
