https://github.com/nabilshadman/python-uk-weather-analytics
An end-to-end data science workflow of UK weather data
data-engineering data-science data-visualization machine-learning matplotlib numpy pandas scikit-learn
- Host: GitHub
- URL: https://github.com/nabilshadman/python-uk-weather-analytics
- Owner: nabilshadman
- License: mit
- Created: 2021-06-19T02:40:10.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2024-12-03T14:32:35.000Z (5 months ago)
- Last Synced: 2024-12-03T15:31:43.082Z (5 months ago)
- Topics: data-engineering, data-science, data-visualization, machine-learning, matplotlib, numpy, pandas, scikit-learn
- Language: Jupyter Notebook
- Homepage:
- Size: 3.27 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# UK Weather Analytics

## Overview
This [research](https://github.com/nabilshadman/python-uk-weather-analytics/blob/main/report/uk_weather_analytics_report.pdf) project analyzes United Kingdom (UK) weather patterns using machine learning approaches, combining both unsupervised and supervised learning algorithms. Our analysis:
- Identifies distinct regional weather station clusters based on natural variations in weather patterns
- Develops classification models to predict station regions with high accuracy
- Investigates potential correlations between weather conditions and happiness metrics in the UK

The complete analysis workflow is automated and reproducible, with all code and data publicly available.
## Tech Stack
- **Core Language:** Python 3.7+
- **Key Libraries:**
  - NumPy: Scientific computing and array operations
  - Pandas: Data manipulation and analysis
  - Matplotlib: Data visualization
  - scikit-learn: Machine learning algorithms
  - Jupyter: Interactive notebook environment
- **Scripting:** Bash

## Datasets
Our analysis leverages two primary public datasets:
1. **UK Historic Weather Data** (Met Office)
   - Comprehensive monthly weather station records
   - Geographical distribution shown in Figure 1 below
2. **UK Personal Well-being Estimates** (Office for National Statistics)
   - Annual Population Survey data (April 2014 - March 2015)
   - Includes geographical breakdown

**Figure 1:** Geographic distribution of UK weather stations

### Feature Descriptions
| Dataset | Feature | Definition |
|---------|----------|------------|
| Historic Station Data | tmax | Mean daily maximum temperature (°C) |
| | tmin | Mean daily minimum temperature (°C) |
| | af | Days of air frost |
| | rain | Total rainfall (mm) |
| | sun | Total sunshine duration (hours) |
| Personal Well-being | average rating | Mean happiness rating (0-10 scale) |

## Development Environment
### Prerequisites
- Anaconda Distribution (Recommended)
- Python 3.7 or higher
- Jupyter Notebook

### Setup Instructions
1. Install [Anaconda Distribution](https://www.anaconda.com/download)
2. Launch Anaconda Navigator
3. Start Jupyter Notebook via the "Launch" button
4. Navigate to the project directory

## Automated Workflow
Our end-to-end automated pipeline ensures reproducibility and efficient data processing. The workflow consists of five main stages:
### 1. Weather Data Acquisition
```bash
code/test_automation/download_weather_data.ipynb
```
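The notebook above is the reference implementation. As a rough, standard-library-only sketch of what "cleaned text files without metadata" could involve, the example below assumes the usual layout of Met Office historic station files: free-text metadata, a column header, a units row, then monthly records in which `---` marks missing values and `*` or `#` may trail a value. The `SAMPLE` text and the `clean_station_text` helper are hypothetical and are not taken from the repository.

```python
import re

# Hypothetical excerpt laid out like a Met Office historic station file.
SAMPLE = """\
Example Station
Location: 000000E 000000N, 10 metres amsl
Estimated data is marked with a * after the value.
   yyyy  mm   tmax    tmin      af    rain     sun
              degC    degC    days      mm   hours
   2020   1    8.1     2.3*      5    62.4    55.0#
   2020   2    9.0     ---       3    80.1     ---
   2020   3   11.2     3.4       2    30.0   120.5  Provisional
"""

COLUMNS = ["yyyy", "mm", "tmax", "tmin", "af", "rain", "sun"]


def clean_station_text(text):
    """Return data rows as dicts, dropping metadata lines and value markers."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a four-digit year; anything else is metadata.
        if len(fields) < len(COLUMNS) or not re.fullmatch(r"\d{4}", fields[0]):
            continue
        row = {}
        # zip() stops after the seven known columns, so a trailing
        # "Provisional" flag is ignored.
        for name, raw in zip(COLUMNS, fields):
            value = raw.rstrip("*#")  # strip estimate / sensor markers
            if name in ("yyyy", "mm"):
                row[name] = int(value)
            elif value == "---":
                row[name] = None  # missing measurement
            else:
                row[name] = float(value)
        rows.append(row)
    return rows


rows = clean_station_text(SAMPLE)
```

Calling `clean_station_text` on a downloaded file's contents would yield one dictionary per month, which can then be written back out as a header-free text file or loaded into a DataFrame.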
- Downloads station data based on `stations.txt` configuration
- Creates cleaned text files without metadata

### 2. Happiness Data Collection
```bash
code/test_automation/download_happiness_data.ipynb
```
- Fetches personal well-being estimates
- Stores data in Excel format

### 3. Weather Clustering Analysis
```bash
code/test_automation/perform_clustering_weather_data.ipynb
```
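This README does not say which clustering algorithm the notebook uses. Purely as an illustration of clustering stations on the five weather features, the sketch below applies k-means from scikit-learn to synthetic per-station averages. The choice of k-means, the number of clusters, and all of the data are assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic per-station averages; columns are tmax, tmin, af, rain, sun.
# Two artificial groups stand in for stations with different climates.
cool_wet = rng.normal([11.0, 4.5, 6.0, 110.0, 95.0], 0.3, size=(10, 5))
warm_dry = rng.normal([15.0, 7.5, 2.5, 50.0, 135.0], 0.3, size=(10, 5))
X = np.vstack([cool_wet, warm_dry])

# Standardise first so rainfall (mm) does not dominate temperature (degC).
X_scaled = StandardScaler().fit_transform(X)

# Assumed: two clusters; in practice k would be chosen from the data.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(X_scaled)
```

Scaling before clustering matters here because the features use different units; without it, distance-based methods would weight rainfall and sunshine far more heavily than temperature.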
- Processes and cleans station data
- Performs clustering analysis
- Generates visualizations
- Saves intermediate results for subsequent stages

### 4. Regional Classification
```bash
code/test_automation/perform_classification_weather_data.ipynb
```
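The classifier used in the notebook is not named in this README. As a hedged sketch of predicting a station's region from its weather features, the example below trains a random forest on synthetic data with a held-out test split. The model choice, the `north`/`south` labels, and the data are all assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Synthetic station features (tmax, tmin, af, rain, sun) with made-up regions.
north = rng.normal([11.0, 4.5, 6.0, 110.0, 95.0], 0.5, size=(30, 5))
south = rng.normal([15.0, 7.5, 2.5, 50.0, 135.0], 0.5, size=(30, 5))
X = np.vstack([north, south])
y = np.array(["north"] * 30 + ["south"] * 30)

# Hold out a stratified test set so accuracy is measured on unseen stations.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1
)

clf = RandomForestClassifier(n_estimators=50, random_state=1)
clf.fit(X_train, y_train)
accuracy = accuracy_score(y_test, clf.predict(X_test))
```

The synthetic groups are deliberately well separated, so the accuracy here says nothing about how hard the real task is; the report linked in this README is the place to look for actual results.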
- Builds classification models
- Evaluates regional prediction accuracy
- Produces performance metrics

### 5. Weather-Happiness Regression
```bash
code/test_automation/perform_regression_weather_happiness_datasets.ipynb
```
- Combines weather and well-being datasets
- Conducts regression analysis
- Generates statistical summaries
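
The steps above can be sketched as a join followed by a linear fit. The example below is a minimal illustration, assuming both datasets have already been aggregated to a shared `region` key. The column names, region codes, and values are hypothetical, and ordinary least squares is an assumption rather than the notebook's confirmed method.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical regional weather averages (not real Met Office figures).
weather = pd.DataFrame({
    "region": ["A", "B", "C", "D", "E"],
    "sun": [90.0, 105.0, 120.0, 130.0, 140.0],
    "rain": [120.0, 95.0, 80.0, 60.0, 55.0],
})

# Hypothetical mean happiness ratings on the 0-10 scale (not real ONS figures).
happiness = pd.DataFrame({
    "region": ["A", "B", "C", "D", "F"],
    "average_rating": [7.30, 7.35, 7.42, 7.45, 7.50],
})

# An inner join keeps only regions present in both datasets (A-D here).
merged = weather.merge(happiness, on="region", how="inner")

X = merged[["sun", "rain"]]
y = merged["average_rating"]

model = LinearRegression().fit(X, y)
r_squared = model.score(X, y)  # in-sample R^2
```

With so few rows the in-sample R² is not meaningful on its own; the sketch only shows the shape of the merge-then-fit step.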
**Figure 2:** Complete data science pipeline and file dependencies

## Documentation
For detailed methodology, analysis, and conclusions, please refer to our comprehensive [research report](https://github.com/nabilshadman/python-uk-weather-analytics/blob/main/report/uk_weather_analytics_report.pdf).
## Contributing
We welcome contributions! Please feel free to submit pull requests or open issues for any improvements.
## License
This project is licensed under the MIT License - see the [LICENSE](./LICENSE) file for details.
## Citation
If you use this work in your research, please cite:
```bibtex
@misc{uk-weather-analytics,
author = {Shadman, Nabil},
title = {UK Weather Analytics},
year = {2021},
publisher = {GitHub},
url = {https://github.com/nabilshadman/python-uk-weather-analytics}
}
```