Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/pipe199x/tokyo-azure-spark

Data analysis project using Azure, Apache Spark, and Python to process Tokyo Olympic data.

apache-spark azure data-science python

Host: GitHub
URL: https://github.com/pipe199x/tokyo-azure-spark
Owner: Pipe199x
Created: 2024-06-08T12:13:34.000Z (5 months ago)
Default Branch: main
Last Pushed: 2024-06-08T16:32:18.000Z (5 months ago)
Last Synced: 2024-10-19T03:15:42.300Z (about 1 month ago)
Topics: apache-spark, azure, data-science, python
Language: Jupyter Notebook
Homepage:
Size: 146 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Tokyo-Azure-Spark

This project uses Azure, Apache Spark, and Python to process and analyze Tokyo Olympic data. The main components and technologies are listed below.

## Technologies Used
- **Azure**: Microsoft's cloud platform used for data storage and processing.
- **Apache Spark**: A unified analytics engine for processing large volumes of data.
- **Python**: The programming language used to write data processing and analysis scripts.

## Project Description
The goal of this project is to process and analyze Olympic data to extract information about athletes, coaches, teams, and events. The data is stored in Azure and processed with Apache Spark, which is designed to handle large datasets efficiently.

### Project Structure
- **CSVs/**: Contains all the CSV files with Olympic data.
  - `Athletes.csv`
  - `Coaches.csv`
  - `EntriesGender.csv`
  - `Medals.csv`
  - `Teams.csv`
- **tokyo_olympic.ipynb**: The Jupyter notebook containing the data processing and analysis scripts.
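
The layout above could be loaded into Spark roughly as follows. This is a minimal sketch, not the notebook's actual code: it assumes the CSVs sit in a local `CSVs/` directory and that each file has a header row. The helper names `csv_paths`, `load_tables`, and `main` are hypothetical. When the files live in Azure storage, the base path would be replaced by whatever location the cluster is configured to read from.

```python
from pathlib import Path

# File names as listed in the project structure above.
CSV_FILES = [
    "Athletes.csv",
    "Coaches.csv",
    "EntriesGender.csv",
    "Medals.csv",
    "Teams.csv",
]


def csv_paths(base_dir="CSVs"):
    """Map a lowercase table name (e.g. "athletes") to its CSV path."""
    base = Path(base_dir)
    return {Path(name).stem.lower(): (base / name).as_posix() for name in CSV_FILES}


def load_tables(spark, base_dir="CSVs"):
    """Read every CSV into a Spark DataFrame, keyed by table name.

    `spark` is an existing pyspark.sql.SparkSession. A header row is
    assumed (header=True); column types are inferred by Spark.
    """
    return {
        name: spark.read.csv(path, header=True, inferSchema=True)
        for name, path in csv_paths(base_dir).items()
    }


def main():
    # Imported here so the path helpers above work without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("tokyo-olympic-sketch").getOrCreate()
    try:
        for name, df in load_tables(spark).items():
            # Print the row count and inferred schema for a quick sanity check.
            print(name, df.count(), "rows")
            df.printSchema()
    finally:
        spark.stop()
```

Calling `main()` in an environment with PySpark installed prints the row count and inferred schema of each file. Keeping the tables in a dictionary keyed by name makes it easy to pick one out later, for example `tables["medals"]`.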