https://github.com/arush-codes/paris-olympic-de
data engineering project on paris olympics 2024
- Host: GitHub
- URL: https://github.com/arush-codes/paris-olympic-de
- Owner: aRUsh-codes
- License: apache-2.0
- Created: 2024-09-06T04:34:17.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2025-02-18T17:16:00.000Z (over 1 year ago)
- Last Synced: 2025-02-18T18:25:13.195Z (over 1 year ago)
- Topics: azure, data-analysis, data-engineering, microsoft-azure, olympics2024, pipeline
- Language: Jupyter Notebook
- Homepage: https://arush-codes.github.io/paris-olympic-de/
- Size: 3.64 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Paris Olympics Data Engineering Project in Microsoft Azure 🏅
## Overview 📝
This project demonstrates an end-to-end ETL process for analyzing Olympics data using Microsoft Azure services: Azure Data Factory, Azure Databricks, Azure Storage, and Azure Synapse Analytics. The dataset includes four tables: Athletes, Coaches, Teams, and Medals.
## Components Used ⚙️
1. Azure Data Factory 🏭
Orchestrated the ETL pipeline by extracting data from Azure Storage, transforming it in Databricks, and loading it into Azure Synapse Analytics.
2. Azure Databricks 🔥
Handled data transformation and analysis using PySpark. The cleaned and processed data was prepared for deeper insights, such as athlete performance trends and medal distributions.
3. Azure Storage 📦
Stored the raw Olympics data (CSV files) as the source for the pipeline.
4. Azure Synapse Analytics 📊
Served as the data warehouse, allowing complex SQL queries for team performance and medal analysis.
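The extract, transform, and load stages described above can be sketched in plain Python. This is only an illustration of the flow, not the notebook code from this repository: it uses the standard library instead of PySpark so it runs anywhere, the inline CSV stands in for a raw file in Azure Storage, and the column names `country` and `medal_type` are assumptions rather than the real dataset schema.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for a raw medals CSV in Azure Storage.
# The column names (country, medal_type) are assumptions, not the real schema.
RAW_MEDALS_CSV = """country,medal_type
France,Gold Medal
 france ,Silver Medal
Japan,Gold Medal
Japan,
Kenya,Bronze Medal
"""


def extract(csv_text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Transform: normalize country names, skip rows with no medal, count per country."""
    counts = Counter()
    for row in rows:
        country = (row.get("country") or "").strip().title()
        medal = (row.get("medal_type") or "").strip()
        if country and medal:
            counts[country] += 1
    return counts


def load(counts):
    """Load: shape the result as (country, total) rows, highest total first."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


medal_table = load(transform(extract(RAW_MEDALS_CSV)))
for country, total in medal_table:
    print(f"{country}: {total}")
```

In the actual pipeline, the transform step would be the PySpark equivalent (for example a `groupBy` on the country column followed by `count()`), and the load step would write the result to a Synapse table instead of returning a list.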
## Dataset 📚
The dataset includes:
> - **Athletes**
> - **Coaches**
> - **Teams**
> - **Medals**
It is available on Kaggle: https://www.kaggle.com/datasets/piterfm/paris-2024-olympic-summer-games/data
## Work 🎯
https://github.com/user-attachments/assets/7320647c-7230-4502-9a48-660f9afd45b1
# My Power BI Dashboard
This is a link to my live Power BI dashboard:
[Open the Power BI dashboard](https://app.powerbi.com/groups/me/reports/5b762f2d-4ce9-4210-814d-535e6fa55003/f358d71266e42344d5f5?experience=power-bi)
---