{"id":15208891,"url":"https://github.com/ritesh-ojha/data-engineering","last_synced_at":"2026-02-10T17:04:40.698Z","repository":{"id":229037023,"uuid":"773832211","full_name":"ritesh-ojha/Data-Engineering","owner":"ritesh-ojha","description":"End to End Data Engineering Projects","archived":false,"fork":false,"pushed_at":"2024-05-13T17:46:48.000Z","size":34084,"stargazers_count":1,"open_issues_count":0,"forks_count":2,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-01-17T04:45:56.728Z","etag":null,"topics":["airflow","apache-kafka","apache-spark","aws-ec2","aws-glue","aws-s3","data-engineering","docker","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ritesh-ojha.png","metadata":{"files":{"readme":"Readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-03-18T13:32:43.000Z","updated_at":"2025-01-06T18:39:12.000Z","dependencies_parsed_at":"2024-09-29T06:15:22.932Z","dependency_job_id":null,"html_url":"https://github.com/ritesh-ojha/Data-Engineering","commit_stats":{"total_commits":20,"total_committers":1,"mean_commits":20.0,"dds":0.0,"last_synced_commit":"6102245d9650dfee49c28f8c87b1b907b6593437"},"previous_names":["ritesh-ojha/data-engineering"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ritesh-ojha%2FData-Engineering","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ritesh-ojha%2FData-Engineering/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories
/ritesh-ojha%2FData-Engineering/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ritesh-ojha%2FData-Engineering/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ritesh-ojha","download_url":"https://codeload.github.com/ritesh-ojha/Data-Engineering/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":242276964,"owners_count":20101530,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["airflow","apache-kafka","apache-spark","aws-ec2","aws-glue","aws-s3","data-engineering","docker","python"],"created_at":"2024-09-28T07:03:17.083Z","updated_at":"2026-02-10T17:04:40.619Z","avatar_url":"https://github.com/ritesh-ojha.png","language":"Python","readme":"# Data Engineering End to End Projects\n\nThis repository contains all my data engineering projects. In this repo, I explore various data engineering tools and techniques.\n\n![](thumbnail.jpg)\n\n## Projects\n\n1. [OpenWeather](/OpenWeather/)\n   - A data pipeline implemented using Apache Airflow on Amazon Web Services (AWS) for processing OpenWeather data.\n   - The pipeline extracts weather data from the OpenWeather API, transforms it, and loads it into a data warehouse for analysis and visualization.\n\n2. [Podcast](/Podcast/)\n   - I created a data pipeline using Airflow on Docker. The pipeline downloads podcast episodes.\n   - I store the results in a Postgres database that can be queried easily.\n\n3. 
[Spotify](/Spotify/)\n   - A data pipeline implemented on Amazon Web Services (AWS) for processing Spotify data.\n   - The pipeline loads CSV files containing information about artists, tracks, and albums into an S3 bucket.\n   - It then performs ETL (Extract, Transform, Load) using AWS Glue, stores the processed data as Parquet files, and finally queries and visualizes the data using Amazon Athena and Power BI.\n\n4. [Smart City](/Smart-City/)\n   - A data engineering project that uses Python to simulate data generation for Apache Kafka, processes the data with Apache Spark, and stores it in Amazon S3.\n   - All services are orchestrated and run in Docker containers.\n\n## Contact\n\nIf you have any queries, feel free to reach out to me at riteshojha2002@gmail.com or create an issue [here](https://github.com/ritesh-ojha/Data-Engineering/issues/new).\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fritesh-ojha%2Fdata-engineering","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fritesh-ojha%2Fdata-engineering","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fritesh-ojha%2Fdata-engineering/lists"}