https://github.com/tashi-2004/geospatial-air-quality-analysis-with-apache-spark

This project analyzes air quality data across regions to identify improvement areas, track trends, and classify similar regions using clustering. Leveraging PySpark, it processes sensor data, calculates Air Quality Index (AQI), and visualizes results with histograms and geographic maps to highlight areas with good air quality.

aqi aqi-prediction clustering data-science data-visualization geospatial-visualization kmeans-clustering predictive-modeling sensor-data time-series-analysis

# Air-Quality-Improvement-Data-Analysis

This repository contains a data analysis project focused on examining air quality data from various geographical regions. By analyzing this data, we aim to identify areas of improvement in air quality, track air quality trends, and cluster regions with similar air quality patterns.

## Project Overview
The primary goal of this project is to analyze air quality data from sensor readings across different regions, calculate the Air Quality Index (AQI), and identify trends in air quality improvements. Using clustering, we categorize geographical regions based on air quality data and represent the findings visually through histograms and geographical maps.

## Files Included
1. **Datasets:**
- Averaged Data from last 24 hours for each sensor: [Visit Here](https://data.sensor.community/static/v2/data.24h.json)
- Averaged Data from last 5 minutes for each sensor (for testing): [Visit Here](https://data.sensor.community/static/v2/data.json)

2. **Reports:**
- `Report.pdf`: A comprehensive report detailing the analysis, visualizations, and insights.

3. **Code:**
- `code.py`: This script performs the entire analysis, from data ingestion and preprocessing to visualization and reporting.

4. **Shapefiles:**
- `ne_110m_admin_0_countries.shp`: The geometric data for the countries.
- `ne_110m_admin_0_countries.shx`: The index of the geometric data.
- `ne_110m_admin_0_countries.dbf`: Attribute data related to the countries (such as names and codes).
- `ne_110m_admin_0_countries.prj`: The coordinate reference system for the shapefile.
- `ne_110m_admin_0_countries.cpg`: Character encoding information.
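
To illustrate how the dataset endpoints listed above might be consumed, here is a minimal sketch that downloads the feed and flattens one record into a row. The helper names (`fetch_records`, `flatten_record`) are hypothetical, and the record layout is an assumption about the Sensor.Community feed rather than something this README documents: a `location` object plus a `sensordatavalues` list in which `P1` is PM10 and `P2` is PM2.5. The actual `code.py` may differ.

```python
import json
from urllib.request import urlopen

# Endpoint listed under "Datasets" above (5-minute averages, intended for testing).
DATA_URL = "https://data.sensor.community/static/v2/data.json"


def fetch_records(url=DATA_URL, timeout=30):
    """Download the JSON feed and return the parsed payload."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def flatten_record(record):
    """Turn one nested sensor record into a flat row.

    Assumed layout (not confirmed by this README): a "location" object with
    "latitude", "longitude" and "country", plus a "sensordatavalues" list of
    {"value_type": ..., "value": ...} pairs where P1 is PM10 and P2 is PM2.5.
    """
    location = record.get("location", {})
    values = {
        item.get("value_type"): item.get("value")
        for item in record.get("sensordatavalues", [])
    }

    def to_float(raw):
        # Readings may arrive as strings; missing or malformed ones become None.
        try:
            return float(raw)
        except (TypeError, ValueError):
            return None

    return {
        "country": location.get("country"),
        "latitude": to_float(location.get("latitude")),
        "longitude": to_float(location.get("longitude")),
        "pm10": to_float(values.get("P1")),
        "pm25": to_float(values.get("P2")),
    }
```

Rows produced this way, for example `[flatten_record(r) for r in fetch_records()]`, can then be loaded into a PySpark DataFrame for the cleaning step.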

## Features
1. **Data Acquisition**: Fetches the 24-hour averaged readings for each sensor, plus the 5-minute averaged readings used for testing.
2. **Data Cleaning and Transformation**: Prepares data using PySpark, ensuring correct formats and handling of missing values.
3. **Air Quality Index (AQI) Calculation**: Computes AQI based on sensor data and classifies regions accordingly.
4. **Trend Analysis**: Calculates daily AQI and compares trends to highlight improvements in air quality.
5. **K-Means Clustering**: Groups regions into clusters based on geographical coordinates.
6. **Visualizations**:
- Histogram of longest streaks of good air quality.
- Geographical map displaying air quality data points.
7. **Top Countries and Regions**:
- Lists top 10 countries with the best air quality.
- Lists top 50 regions with the best air quality.
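
The AQI calculation and the "longest streak of good air quality" features can be sketched in plain Python as follows. This README does not say which AQI standard the project uses, so the sketch assumes the US EPA piecewise-linear formula with the pre-2024 PM2.5 breakpoints; the function names are hypothetical, and the table should be replaced with whatever `code.py` actually applies.

```python
# Assumed US EPA PM2.5 breakpoints (pre-2024 table), 24-hour mean in µg/m³:
# (c_low, c_high, aqi_low, aqi_high). Swap in the table the project really uses.
PM25_BREAKPOINTS = [
    (0.0, 12.0, 0, 50),
    (12.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
    (55.5, 150.4, 151, 200),
    (150.5, 250.4, 201, 300),
    (250.5, 350.4, 301, 400),
    (350.5, 500.4, 401, 500),
]

# Upper AQI bound of each EPA category.
CATEGORIES = [
    (50, "Good"),
    (100, "Moderate"),
    (150, "Unhealthy for Sensitive Groups"),
    (200, "Unhealthy"),
    (300, "Very Unhealthy"),
    (500, "Hazardous"),
]


def pm25_to_aqi(concentration):
    """Linearly interpolate a PM2.5 concentration onto the AQI scale."""
    if concentration < 0:
        raise ValueError("concentration must be non-negative")
    for c_low, c_high, aqi_low, aqi_high in PM25_BREAKPOINTS:
        if concentration <= c_high:
            # Values in the small gaps between bands snap to the band's lower edge.
            c = max(concentration, c_low)
            slope = (aqi_high - aqi_low) / (c_high - c_low)
            return round(slope * (c - c_low) + aqi_low)
    return 500  # beyond the last band: cap at the top of the scale


def aqi_category(aqi):
    """Map an AQI value to its category name."""
    for upper, name in CATEGORIES:
        if aqi <= upper:
            return name
    return "Hazardous"


def longest_good_streak(daily_aqi, good_max=50):
    """Length of the longest run of consecutive days with AQI <= good_max."""
    best = current = 0
    for aqi in daily_aqi:
        current = current + 1 if aqi <= good_max else 0
        best = max(best, current)
    return best
```

Computing `longest_good_streak` once per sensor or region yields the values that a streak histogram would plot.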

## Technologies Used
- **Python**: Core programming language for data processing and analysis.
- **PySpark**: For large-scale data processing and analysis.
- **Geopandas**: For handling geographical data.
- **Matplotlib & Seaborn**: For visualizations.
- **Requests**: For fetching data from APIs.
- **KMeans Clustering**: For geographical clustering of regions.
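
To show what the K-Means step does with geographical coordinates, here is a dependency-free sketch of the algorithm. It is a plain-Python stand-in written for illustration, not the project's PySpark code; the function name `kmeans`, its parameters, and the use of straight-line distance on latitude/longitude pairs are all assumptions.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Cluster (latitude, longitude) pairs; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k input points picked at random
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance on the raw coordinates).
        labels = [
            min(
                range(k),
                key=lambda j: (p[0] - centroids[j][0]) ** 2
                + (p[1] - centroids[j][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append((
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                ))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids
```

Calling `kmeans(coords, k=5)` on a list of `(latitude, longitude)` tuples returns one cluster label per point; treating degrees as planar coordinates is a simplification that distorts distances away from the equator.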

## Note
- Before running the project, make sure all the shapefile components (`.shp`, `.shx`, `.dbf`, `.prj`, `.cpg`) are in the same directory as `code.py`; the geographical visualizations need them.
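
A quick pre-flight check along these lines can catch a missing shapefile component before the map is drawn. The helper `missing_shapefile_parts` is hypothetical and not part of `code.py`; it only tests that the files named in this README exist.

```python
from pathlib import Path

# Component extensions listed under "Shapefiles" in this README.
SHAPEFILE_EXTENSIONS = (".shp", ".shx", ".dbf", ".prj", ".cpg")


def missing_shapefile_parts(directory, stem="ne_110m_admin_0_countries"):
    """Return the shapefile components that are absent from `directory`."""
    directory = Path(directory)
    return [
        stem + ext
        for ext in SHAPEFILE_EXTENSIONS
        if not (directory / (stem + ext)).is_file()
    ]
```

For example, calling `missing_shapefile_parts(Path(__file__).parent)` near the top of the script and stopping when the list is non-empty gives a clearer error than a failed map render.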

## Contributors
- Tashfeen Abbasi
- [Laiba Mazhar](https://github.com/laiba-mazhar)

## Contact
For any questions or suggestions, feel free to contact us at [[email protected]]