https://github.com/k-l-16/marathon-data-analysis
cleaning, enriching, and analyzing marathon race data
geopy pandas python sql
- Host: GitHub
- URL: https://github.com/k-l-16/marathon-data-analysis
- Owner: K-L-16
- License: mit
- Created: 2025-06-14T03:27:36.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-07-03T00:54:54.000Z (12 months ago)
- Last Synced: 2025-07-03T01:32:26.338Z (12 months ago)
- Topics: geopy, pandas, python, sql
- Language: Python
- Homepage:
- Size: 11.7 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Marathon-Data-Analysis
Cleaning, enriching, and analyzing marathon race data.
---
- **Python**
  - `pandas` (data cleaning and transformation)
  - `geopy` (latitude and longitude retrieval based on city/state)
- **SQL**
  - Aggregation queries
  - Window functions (RANK)
  - CTEs (Common Table Expressions)
- **Tools**
  - DataGrip (SQL query management)
  - GitHub (version control)
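
As an illustration of the `geopy` item above, here is a minimal sketch of looking up coordinates from city and state. It is not taken from the repository: the column names `city` and `state`, the helper names `make_geocoder` and `add_coordinates`, and the choice of the Nominatim geocoder are all assumptions. Each unique place is geocoded once and the result is reused for repeated rows.

```python
import pandas as pd


def make_geocoder(user_agent="marathon-data-analysis-example"):
    """Build a rate-limited Nominatim geocode function (needs network access).

    Nominatim is an assumed choice; the project may use a different geocoder.
    """
    from geopy.extra.rate_limiter import RateLimiter
    from geopy.geocoders import Nominatim

    geolocator = Nominatim(user_agent=user_agent)
    # Nominatim's usage policy asks for at most one request per second.
    return RateLimiter(geolocator.geocode, min_delay_seconds=1)


def add_coordinates(df, geocode, city_col="city", state_col="state"):
    """Return a copy of df with latitude/longitude columns added.

    Column names are hypothetical. `geocode` is any callable that takes a
    "City, ST" string and returns an object with .latitude/.longitude, or None.
    """
    out = df.copy()
    places = out[city_col].str.strip() + ", " + out[state_col].str.strip()
    cache = {}
    for place in places.dropna().unique():
        location = geocode(place)  # one lookup per unique place
        cache[place] = (
            (location.latitude, location.longitude) if location else (None, None)
        )
    out["latitude"] = places.map(lambda p: cache.get(p, (None, None))[0])
    out["longitude"] = places.map(lambda p: cache.get(p, (None, None))[1])
    return out
```

With network access, `add_coordinates(df, make_geocoder())` would perform the real lookups. Passing the geocoder in as an argument also makes it easy to substitute a stub when testing offline.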
---
## Key Features
- Cleaned raw marathon race data (removed missing values, combined names, computed total minutes).
- Enriched data with geographic coordinates (latitude, longitude).
- Saved processed data to CSV for further analysis.
- Wrote SQL queries for:
  - Counting distinct states
  - Calculating average race times by gender
  - Finding age range by gender
  - Grouping average times by age bucket
  - Ranking top 3 finishers per gender
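
The cleaning steps listed above (dropping missing values, combining names, computing total minutes) could look roughly like the sketch below. The column names `first_name`, `last_name`, and `finish_time`, the `H:MM:SS` time format, and the function name `clean_results` are assumptions for illustration, not the repository's actual schema.

```python
import pandas as pd


def clean_results(raw):
    """Drop incomplete rows, build a full name, and convert finish time to minutes.

    Assumes hypothetical columns: first_name, last_name, finish_time ("H:MM:SS").
    """
    df = raw.dropna(subset=["first_name", "last_name", "finish_time"]).copy()
    df["full_name"] = df["first_name"].str.strip() + " " + df["last_name"].str.strip()
    # "H:MM:SS" strings parse as timedeltas; unparseable values become NaT.
    elapsed = pd.to_timedelta(df["finish_time"], errors="coerce")
    df["total_minutes"] = (elapsed.dt.total_seconds() / 60).round(2)
    # Rows whose time could not be parsed are treated as missing and dropped.
    df = df.dropna(subset=["total_minutes"])
    return df.drop(columns=["first_name", "last_name"]).reset_index(drop=True)
```

The cleaned frame can then be written out with `clean_results(raw).to_csv("marathon_clean.csv", index=False)`, where the file name is again only a placeholder.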
---
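
The SQL queries listed under Key Features can be sketched against an in-memory SQLite database driven from Python. Everything here is an assumption made for illustration: the `results` table, its columns, the age bucket boundaries, and the sample rows are invented, and SQLite stands in for whichever database the project queried through DataGrip. `RANK()` requires SQLite 3.25 or newer.

```python
import sqlite3

ROWS = [
    # Made-up sample rows: (full_name, gender, age, state, total_minutes)
    ("Ana Lee", "F", 29, "MA", 185.5),
    ("Bea Cho", "F", 41, "NY", 200.0),
    ("Cat Diaz", "F", 35, "MA", 192.0),
    ("Dee Fox", "F", 52, "CA", 240.0),
    ("Al Roy", "M", 33, "NY", 150.0),
    ("Ben Ortiz", "M", 45, "CA", 130.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (full_name TEXT, gender TEXT, age INTEGER,"
    " state TEXT, total_minutes REAL)"
)
conn.executemany("INSERT INTO results VALUES (?, ?, ?, ?, ?)", ROWS)

# Count distinct states.
distinct_states = conn.execute(
    "SELECT COUNT(DISTINCT state) FROM results"
).fetchone()[0]

# Average race time by gender.
avg_by_gender = dict(conn.execute(
    "SELECT gender, AVG(total_minutes) FROM results GROUP BY gender"
).fetchall())

# Age range (youngest, oldest) by gender.
age_range = conn.execute(
    "SELECT gender, MIN(age), MAX(age) FROM results GROUP BY gender ORDER BY gender"
).fetchall()

# Average time by age bucket (bucket boundaries are an arbitrary choice).
avg_by_bucket = conn.execute("""
    SELECT CASE WHEN age < 30 THEN 'under 30'
                WHEN age < 40 THEN '30-39'
                WHEN age < 50 THEN '40-49'
                ELSE '50+' END AS age_bucket,
           AVG(total_minutes)
    FROM results
    GROUP BY age_bucket
    ORDER BY MIN(age)
""").fetchall()

# Top 3 finishers per gender: a CTE ranks rows, the outer query filters.
top3 = conn.execute("""
    WITH ranked AS (
        SELECT full_name, gender, total_minutes,
               RANK() OVER (PARTITION BY gender ORDER BY total_minutes) AS place
        FROM results
    )
    SELECT gender, place, full_name
    FROM ranked
    WHERE place <= 3
    ORDER BY gender, place
""").fetchall()
```

The ranking is done inside a CTE because a window function's result cannot be filtered in the `WHERE` clause of the same `SELECT`. Note that `RANK()` gives tied finishers the same place, so a tie can return more than three rows per gender.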