https://github.com/vipulbunny/olympics-analysis
A data analysis project exploring the Olympic Games from 1896 to 2016. It includes data cleaning, visualization, and insights on athletes, medals, and countries.
data-science eda machinelearning matplotlib ml olympics pandas python seaborn sports-analytics
- Host: GitHub
- URL: https://github.com/vipulbunny/olympics-analysis
- Owner: VIPULbunny
- Created: 2025-03-04T15:19:50.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-03-04T17:05:57.000Z (over 1 year ago)
- Last Synced: 2025-03-17T11:31:31.247Z (over 1 year ago)
- Topics: data-science, eda, machinelearning, matplotlib, ml, olympics, pandas, python, seaborn, sports-analytics
- Language: Jupyter Notebook
- Homepage:
- Size: 2.46 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# 🏅 Olympics Data Analysis (1896-2016)

## 📌 Project Overview
The **Olympic Games** have been the pinnacle of international sports since **1896**. This project explores the historical dataset of the Olympics, uncovering trends, athlete performance, and country-wise participation. Through data cleaning, visualization, and analysis, we gain insights into how the Games have evolved over time.
## 📂 Dataset Description
This analysis uses two primary datasets:
1. **athlete_events.csv** - Contains detailed records of Olympic athletes, including:
   - Name, Age, Gender
   - Sport, Event, Medal (if won)
   - Country (NOC), Year, Season (Summer/Winter)
2. **noc_regions.csv** - Maps National Olympic Committee (NOC) codes to country/region names, enabling region-level analysis.

Because **athlete_events.csv** is larger than 20 MB, it is not stored in this repository; its download link and a description are provided in **`Dataset_link.py`**.
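
A minimal loading sketch, assuming the two files follow the column layout of the widely used Kaggle release of this dataset (`Name`, `Sex`, `Age`, `NOC`, `Year`, `Season`, `Sport`, `Event`, `Medal` in `athlete_events.csv`; `NOC`, `region` in `noc_regions.csv`). The helper `load_olympics` is hypothetical, and small in-memory CSVs stand in for the real files so the snippet runs on its own.

```python
import io

import pandas as pd

# Expected columns (assumed from the common Kaggle release of these files).
ATHLETE_COLS = {"Name", "Sex", "Age", "NOC", "Year", "Season", "Sport", "Event", "Medal"}
REGION_COLS = {"NOC", "region"}


def load_olympics(athletes_src, regions_src):
    """Read both CSVs (path or file-like) and fail early on missing columns."""
    athletes = pd.read_csv(athletes_src)
    regions = pd.read_csv(regions_src)
    for label, frame, needed in (
        ("athlete_events.csv", athletes, ATHLETE_COLS),
        ("noc_regions.csv", regions, REGION_COLS),
    ):
        missing = needed - set(frame.columns)
        if missing:
            raise ValueError(f"{label} is missing columns: {sorted(missing)}")
    return athletes, regions


# Tiny in-memory stand-ins for the real files (illustration only).
athletes_csv = io.StringIO(
    "Name,Sex,Age,NOC,Year,Season,Sport,Event,Medal\n"
    "A,M,24,USA,2016,Summer,Swimming,Swimming Men's 100 metres Butterfly,Gold\n"
    "B,F,,GER,2014,Winter,Biathlon,Biathlon Women's 7.5 kilometres Sprint,\n"
)
regions_csv = io.StringIO("NOC,region,notes\nUSA,USA,\nGER,Germany,\n")

athletes, regions = load_olympics(athletes_csv, regions_csv)
print(athletes.shape, regions.shape)  # (2, 9) (2, 3)
```

With the real data downloaded via the link in `Dataset_link.py`, the same call would take file paths instead, e.g. `load_olympics("athlete_events.csv", "noc_regions.csv")`. Empty CSV cells (no medal, unknown age) arrive as `NaN`.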
## 🚀 Project Objectives
- Perform **Exploratory Data Analysis (EDA)** to understand athlete trends.
- Visualize country-wise **medal counts** and athlete participation.
- Analyze **gender representation** and its evolution in the Olympics.
- Identify the **most successful athletes and countries** over the years.
## 🔧 Installation
To run this project on your local machine:
1. **Clone the Repository:**
```sh
git clone https://github.com/VIPULbunny/olympics-analysis.git
```
2. **Navigate to the Project Directory:**
```sh
cd olympics-analysis
```
3. **Install Dependencies:**
```sh
pip install numpy pandas matplotlib seaborn jupyter
```
4. **Run the Jupyter Notebook:**
```sh
jupyter notebook
```
## 📊 Exploratory Data Analysis (EDA)
### 🏆 Key Insights from Data
- **Total Athletes Participated:** `{total_athletes}`
- **Total Olympic Games Editions:** `{total_games}`
- **Top 10 Countries by Athlete Count:**
  - USA, Germany, UK, France, China, etc.
- **Most Successful Athletes:**
  - Michael Phelps, Usain Bolt, etc.
- **Gender Representation Over Time:**
  - Increasing female participation in modern Olympics.
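
The summary figures above could be computed along these lines. This is a sketch, not the notebook's own code: it assumes a merged DataFrame with `Name`, `Sex`, `Year`, `Season`, `region` and `Medal` columns, approximates a unique athlete by `Name`, and treats each `(Year, Season)` pair as one edition. The helper `key_insights` is hypothetical; its `total_athletes` and `total_games` keys mirror the placeholders in the list.

```python
import pandas as pd


def key_insights(df: pd.DataFrame, top_n: int = 10) -> dict:
    """Headline EDA figures from a merged athlete/region frame (hypothetical helper)."""
    return {
        # Unique athletes, approximated by name (athletes sharing a name are merged).
        "total_athletes": df["Name"].nunique(),
        # One edition = one (Year, Season) pair, e.g. 2016 Summer.
        "total_games": len(df[["Year", "Season"]].drop_duplicates()),
        # Countries ranked by number of distinct athletes.
        "top_countries": df.groupby("region")["Name"].nunique().nlargest(top_n),
        # Athletes ranked by number of medal-winning entries.
        "top_athletes": (
            df[df["Medal"].isin(["Gold", "Silver", "Bronze"])]["Name"]
            .value_counts()
            .head(top_n)
        ),
        # Fraction of entries by female athletes in each year.
        "female_share": df["Sex"].eq("F").groupby(df["Year"]).mean(),
    }


# Toy data for illustration only; not real Olympic results.
df = pd.DataFrame({
    "Name": ["Ann", "Ann", "Bob", "Cai", "Dee"],
    "Sex": ["F", "F", "M", "M", "F"],
    "Year": [2012, 2016, 2016, 2016, 2014],
    "Season": ["Summer", "Summer", "Summer", "Summer", "Winter"],
    "region": ["USA", "USA", "USA", "China", "Germany"],
    "Medal": ["Gold", "Gold", None, "Silver", "Bronze"],
})

insights = key_insights(df)
print(f"Total Athletes Participated: {insights['total_athletes']}")  # 4
print(f"Total Olympic Games Editions: {insights['total_games']}")  # 3
```

If the dataset's `ID` column is available, `df["ID"].nunique()` avoids merging athletes who share a name.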
### 🔎 Data Cleaning & Preprocessing
- Merged `athlete_events.csv` with `noc_regions.csv` for accurate country mapping.
- Handled **missing values** in age, medal, and region data.
- Converted categorical features (`Sex`, `Medal`) into structured formats for better analysis.
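
One way these cleaning steps might look in pandas. The README does not say which imputation strategy was used, so the choices below are assumptions: a missing `Medal` becomes an explicit `"No Medal"` label, an NOC with no match in `noc_regions.csv` keeps its own code as the region, and a missing `Age` is filled with the median age for that sport. `clean_olympics` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def clean_olympics(athletes: pd.DataFrame, regions: pd.DataFrame) -> pd.DataFrame:
    """Merge, impute and encode (hypothetical helper; strategies are assumptions)."""
    # Left join keeps every athlete row, even when its NOC has no region entry.
    df = athletes.merge(regions[["NOC", "region"]], on="NOC", how="left")

    # NaN in Medal means the athlete did not medal in that event.
    df["Medal"] = df["Medal"].fillna("No Medal")
    # Unmapped NOC codes keep their code as the region name.
    df["region"] = df["region"].fillna(df["NOC"])
    # Impute Age with the median age of athletes in the same sport.
    df["Age"] = df["Age"].fillna(df.groupby("Sport")["Age"].transform("median"))

    # One-hot encode Sex and Medal into 0/1 columns (Sex_F, Sex_M, Medal_Gold, ...).
    dummies = pd.get_dummies(df[["Sex", "Medal"]], dtype=int)
    return pd.concat([df, dummies], axis=1)


# Toy inputs for illustration only ("XYZ" is a made-up NOC code).
athletes = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Sex": ["M", "F", "F"],
    "Age": [22.0, np.nan, 30.0],
    "NOC": ["USA", "GER", "XYZ"],
    "Sport": ["Swimming", "Swimming", "Judo"],
    "Medal": ["Gold", np.nan, np.nan],
})
regions = pd.DataFrame({"NOC": ["USA", "GER"], "region": ["USA", "Germany"]})

df = clean_olympics(athletes, regions)
print(df[["Name", "Age", "region", "Medal", "Medal_Gold"]])
```

A left join makes the handling of unmapped NOCs an explicit decision; an inner join would silently drop those athletes instead.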
## 📈 Data Visualizations
### 🎖️ Medal Distribution

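The plotting code is not reproduced in this README; a minimal matplotlib sketch of a medal-distribution chart could look like this. It assumes a `Medal` column holding `Gold`/`Silver`/`Bronze`, and ignores anything else (NaN or a `"No Medal"` label). Both function names are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering for scripts; omit this line in Jupyter
import matplotlib.pyplot as plt
import pandas as pd


def medal_distribution(df: pd.DataFrame) -> pd.Series:
    """Medal-winning rows per medal type, in podium order (missing types count as 0)."""
    order = ["Gold", "Silver", "Bronze"]
    return df["Medal"].value_counts().reindex(order, fill_value=0)


def plot_medal_distribution(df: pd.DataFrame):
    counts = medal_distribution(df)
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.bar(list(counts.index), counts.values, color=["gold", "silver", "#cd7f32"])
    ax.set_xlabel("Medal")
    ax.set_ylabel("Count")
    ax.set_title("Medal Distribution")
    return fig, ax


# Toy data for illustration only.
df = pd.DataFrame({"Medal": ["Gold", "Gold", None, "Bronze", "No Medal"]})
print(medal_distribution(df).tolist())  # [2, 0, 1]
fig, ax = plot_medal_distribution(df)
```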
### 🏅 Top 10 Countries by Medal Count

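A sketch for ranking countries by medals, assuming `region`, `Year`, `Season`, `Event` and `Medal` columns in the merged frame. Because the dataset has one row per athlete, a team medal would otherwise be counted once per player; the optional deduplication keeps one row per country, edition, event and medal. That is an approximation and a design choice, not something the README specifies, and the function names are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering for scripts; omit this line in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

MEDALS = ["Gold", "Silver", "Bronze"]


def top_countries_by_medals(
    df: pd.DataFrame, n: int = 10, dedupe_team_events: bool = True
) -> pd.Series:
    """Medal counts per region, largest first (hypothetical helper)."""
    medals = df[df["Medal"].isin(MEDALS)]
    if dedupe_team_events:
        # Count a team medal once per country instead of once per team member.
        medals = medals.drop_duplicates(subset=["region", "Year", "Season", "Event", "Medal"])
    return medals["region"].value_counts().head(n)


def plot_top_countries(df: pd.DataFrame, n: int = 10):
    top = top_countries_by_medals(df, n)
    fig, ax = plt.subplots(figsize=(7, 5))
    # Reverse so the largest bar sits at the top of the horizontal chart.
    ax.barh(list(top.index[::-1]), top.values[::-1])
    ax.set_xlabel("Medals")
    ax.set_title(f"Top {n} Countries by Medal Count")
    return fig, ax


# Toy data: a 3-player team gold plus one individual gold for USA.
df = pd.DataFrame({
    "region": ["USA", "USA", "USA", "USA", "China", "Germany"],
    "Year": [2016] * 6,
    "Season": ["Summer"] * 6,
    "Event": ["Basketball Men's Basketball"] * 3
    + ["Swimming Men's 100 metres Butterfly", "Diving Women's Platform", "Judo Men's Lightweight"],
    "Medal": ["Gold", "Gold", "Gold", "Gold", "Silver", None],
})
print(top_countries_by_medals(df))  # USA 2, China 1
print(top_countries_by_medals(df, dedupe_team_events=False))  # USA 4, China 1
fig, ax = plot_top_countries(df)
```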
### 📊 Gender Representation Over the Years

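A sketch of how gender representation over time might be tabulated and plotted, assuming `Name`, `Sex`, `Year` and `Season` columns. Summer and Winter Games are handled separately here because they differ greatly in size; each athlete is counted once per edition (by name, an approximation) rather than once per event entered. Function names are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering for scripts; omit this line in Jupyter
import matplotlib.pyplot as plt
import pandas as pd


def gender_by_year(df: pd.DataFrame, season: str = "Summer") -> pd.DataFrame:
    """Distinct athletes per year and sex for one season (hypothetical helper)."""
    subset = df[df["Season"] == season]
    # Count each athlete once per edition, however many events they entered.
    subset = subset.drop_duplicates(subset=["Name", "Year"])
    # Rows = Year, columns = Sex; years with no athletes of one sex get 0.
    return subset.groupby(["Year", "Sex"]).size().unstack(fill_value=0)


def plot_gender_by_year(df: pd.DataFrame, season: str = "Summer"):
    counts = gender_by_year(df, season)
    fig, ax = plt.subplots(figsize=(8, 4))
    counts.plot(ax=ax, marker="o")
    ax.set_xlabel("Year")
    ax.set_ylabel("Athletes")
    ax.set_title(f"Gender Representation, {season} Olympics")
    return fig, ax


# Toy data for illustration only.
df = pd.DataFrame({
    "Name": ["X", "Y", "A", "A", "B", "W"],
    "Sex": ["M", "M", "F", "F", "M", "F"],
    "Year": [1900, 1900, 2016, 2016, 2016, 2014],
    "Season": ["Summer"] * 5 + ["Winter"],
})
print(gender_by_year(df))
fig, ax = plot_gender_by_year(df)
```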
### 🌍 Count of Entries by Sex for Each Season

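A sketch of the per-season breakdown, assuming `Season` and `Sex` columns. `pd.crosstab` builds the count table and pandas' bar plot draws it as grouped bars; seaborn's `countplot(data=df, x="Season", hue="Sex")` would draw an equivalent chart straight from the raw rows. These are counts of rows (entries), not distinct athletes, and the function names are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering for scripts; omit this line in Jupyter
import matplotlib.pyplot as plt
import pandas as pd


def sex_by_season(df: pd.DataFrame) -> pd.DataFrame:
    """Row counts for every Season x Sex combination; absent pairs are 0."""
    return pd.crosstab(df["Season"], df["Sex"])


def plot_sex_by_season(df: pd.DataFrame):
    table = sex_by_season(df)
    fig, ax = plt.subplots(figsize=(6, 4))
    table.plot(kind="bar", ax=ax, rot=0)
    ax.set_ylabel("Entries")
    ax.set_title("Entries by Sex for Each Season")
    return fig, ax


# Toy data for illustration only.
df = pd.DataFrame({
    "Season": ["Summer", "Summer", "Summer", "Winter"],
    "Sex": ["M", "M", "F", "F"],
})
print(sex_by_season(df))
fig, ax = plot_sex_by_season(df)
```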
## 📜 License
This project is licensed under the **MIT License**.
## 🤝 Contributing
We welcome contributions! If you’d like to improve the analysis or add new insights:
1. **Fork the repository**.
2. **Create a feature branch**: `git checkout -b feature-branch`
3. **Commit your changes**: `git commit -m "Added new analysis"`
4. **Push to GitHub**: `git push origin feature-branch`
5. **Open a Pull Request** 🚀
## 📬 Contact
For queries or collaborations, reach out via email or open an issue on GitHub.
---
**⭐ If you find this project useful, please give it a star!** 🌟