https://github.com/pipe199x/tokyo-azure-spark
Data analysis project using Azure, Apache Spark, and Python to process Tokyo Olympic data.
- Host: GitHub
- URL: https://github.com/pipe199x/tokyo-azure-spark
- Owner: Pipe199x
- Created: 2024-06-08T12:13:34.000Z
- Default Branch: main
- Last Pushed: 2024-06-08T16:32:18.000Z
- Last Synced: 2024-10-19T03:15:42.300Z
- Topics: apache-spark, azure, data-science, python
- Language: Jupyter Notebook
- Size: 146 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Tokyo-Azure-Spark
This project uses Azure, Apache Spark, and Python to process and analyze Tokyo Olympic data. The main technologies and components of the project are described below.
## Technologies Used
- **Azure**: Microsoft's cloud platform used for data storage and processing.
- **Apache Spark**: A unified analytics engine for processing large volumes of data.
- **Python**: The programming language used to write the data processing and analysis scripts.
## Project Description
The goal of this project is to process and analyze Olympic data to extract information about athletes, coaches, teams, and events. The data is stored in Azure and processed with Apache Spark so that large volumes of data can be handled efficiently.
### Project Structure
- **CSVs/**: Contains all the CSV files with Olympic data:
  - `Athletes.csv`
  - `Coaches.csv`
  - `EntriesGender.csv`
  - `Medals.csv`
  - `Teams.csv`
- **tokyo_olympic.ipynb**: The Jupyter notebook containing the data processing and analysis scripts.
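The README does not show how the notebook reads these files, so the following is only a minimal sketch under stated assumptions. It assumes the CSVs sit in an Azure Data Lake Storage Gen2 container and are read with PySpark's `spark.read.csv` through `abfss://` URIs. The helper names `abfss_path` and `load_tables`, and the container, storage account, and folder names, are hypothetical and not taken from the repository.

```python
# Minimal sketch (assumptions labeled): load the five CSVs listed under CSVs/
# into Spark DataFrames. Container, storage account, and folder names are
# hypothetical placeholders, not taken from the repository.

# Table names match the files listed under CSVs/ in this repository.
TABLES = ["Athletes", "Coaches", "EntriesGender", "Medals", "Teams"]


def abfss_path(container: str, account: str, folder: str, table: str) -> str:
    """Build an ADLS Gen2 (abfss://) URI pointing at one CSV file."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{folder}/{table}.csv"


def load_tables(spark, container: str, account: str, folder: str = "CSVs") -> dict:
    """Read each CSV into a DataFrame, keyed by table name.

    `spark` is an existing SparkSession; this sketch assumes it has already
    been granted access to the storage account (for example by the workspace).
    """
    frames = {}
    for table in TABLES:
        frames[table] = spark.read.csv(
            abfss_path(container, account, folder, table),
            header=True,       # first row of each file holds the column names
            inferSchema=True,  # extra pass over the data to guess column types
        )
    return frames
```

In a notebook where a `spark` session already exists (as on Azure Databricks or Azure Synapse), a call such as `tables = load_tables(spark, "<container>", "<storage-account>")` followed by `tables["Medals"].show()` would load the files and print the first rows of the medal table. `inferSchema=True` keeps the sketch short; for repeated runs, an explicit schema avoids the extra pass over the data.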