https://github.com/santhin/air-pollution
Preprocessing air pollution data using distributed computing and ML
air-pollution coiled dask ml optuna pandas plotly python xgboost
- Host: GitHub
- URL: https://github.com/santhin/air-pollution
- Owner: Santhin
- License: mit
- Created: 2021-05-01T10:35:17.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2021-05-02T18:06:59.000Z (over 3 years ago)
- Last Synced: 2024-01-27T10:06:24.959Z (11 months ago)
- Topics: air-pollution, coiled, dask, ml, optuna, pandas, plotly, python, xgboost
- Language: Jupyter Notebook
- Homepage:
- Size: 1.24 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
The project was created for academic purposes. It unifies archived measurement data from 2000-2019 with an existing database.
Measurement data were combined with meteorological data using Dask, with distributed computing provided by Coiled.
The model was trained with XGBoost, using Optuna for hyperparameter tuning, and reached an RMSE of 6.932 µg/m³.

### Links to data:
https://powietrze.gios.gov.pl/pjp/archives
https://powietrze.gios.gov.pl/pjp/content/api
https://danepubliczne.imgw.pl/data/

## Installing
- Python 3.8.3
```
git clone https://github.com/Santhin/air-pollution.git
```

Installing dependencies:
```
pip install -r requirements.txt
# or, with Poetry:
poetry install
```
Run Jupyter Notebook with:
```
jupyter notebook
```
To create the Coiled software environment:
```
import coiled

coiled.create_software_environment(
name="my-software-env",
conda="coiled-environment-py38.yml",
)
```

### Project structure
```
├── coiled-environment-py38.yml
├── data
│   ├── dictionaries
│   │   ├── IndeksJakosciPowietrza.csv
│   │   ├── Indeks\ jakości\ powietrza\ gioś.xlsx
│   │   ├── IndeksJakosciPowietrza.xlsx
│   │   ├── Kody_stacji_pomiarowych.xlsx
│   │   ├── Matching_stations
│   │   │   ├── SmogoliczkaStacje.csv
│   │   │   └── SynopStacje.csv
│   │   ├── Metadane\ -\ stacje\ i\ stanowiska\ pomiarowe.xlsx
│   │   ├── Normy.pkl
│   │   ├── PomiarySample.pkl
│   │   ├── response_api_gios.json
│   │   ├── RodzajeParametrow.csv
│   │   ├── rodzaje_parametrow.pkl
│   │   ├── SensoryPomiarowe.csv
│   │   ├── SensoryPomiarowe.pkl
│   │   ├── stacje_pom_api.json
│   │   ├── StacjePomiarowe.xlsx
│   │   └── stacjeSmogoliczka.csv
│   ├── IndeksJakosciPowietrza.csv
│   └── train_data.csv
├── LICENSE
├── notebooks
│   ├── Air\ Quality\ Index\ Gios.ipynb
│   ├── eda\ without\ progress-Copy4.ipynb
│   ├── Filtering\ excel\ files\ and\ picking\ right\ parameters.ipynb
│   ├── Fixing\ missing\ lat\ and\ lon\ in\ stations\ .ipynb
│   ├── loader_sql.py
│   ├── Matching\ stations\ synop\ with\ Smogoliczka\ .ipynb
│   ├── Matching\ synop\ data\ with\ smogoliczka.ipynb
│   ├── Matching\ Synop\ with\ Smogoliczka\ final.ipynb
│   ├── ML\ PM2.5.ipynb
│   ├── New\ Strategy\ script\ for\ excel\ files.ipynb
│   ├── __pycache__
│   │   └── loader_sql.cpython-38.pyc
│   ├── Repairing\ stations\ names\ and\ merging\ into\ one\ .ipynb
│   └── Smogoliczka\ API\ to\ pomiary_pivot.ipynb
├── poetry.lock
├── pyproject.toml
├── README.md
└── requirements.txt
```

- [MsSQL](https://www.microsoft.com/pl-pl/sql-server/sql-server-downloads) - Database
- [S3Bucket](https://aws.amazon.com/s3/) - Cloud storage
- [Coiled](https://coiled.io/) - Distributed computing
- [Dask](https://dask.org/) - Preprocessing data
- [Optuna](https://optuna.org/) - Hyperparameter optimization
- [XGBoost](https://xgboost.readthedocs.io/en/latest/) - ML