https://github.com/landscapegeoinformatics/est_water_qual
Scripts used in the Estonian water quality modeling project
hydrologic-modeling hydrology random-forest-regression water-quality
- Host: GitHub
- URL: https://github.com/landscapegeoinformatics/est_water_qual
- Owner: LandscapeGeoinformatics
- License: mit
- Created: 2021-11-12T07:26:43.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2022-01-27T11:26:28.000Z (over 3 years ago)
- Last Synced: 2024-06-05T19:27:45.550Z (over 1 year ago)
- Topics: hydrologic-modeling, hydrology, random-forest-regression, water-quality
- Language: Jupyter Notebook
- Size: 422 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# est_water_qual
Scripts and notebooks used for modeling nutrient concentrations in Estonian streams.
The **scripts** are divided into three folders. Folder **preprocessing** contains scripts used for preprocessing the predictor variables. Folder **stats_calc** contains scripts used for extracting statistics from the predictor variables that were used as features in the model. Folder **model** contains scripts related to the random forest (RF) model used for predicting total nitrogen (TN) and total phosphorus (TP) in streams.
**preprocessing** contains the following scripts:
* *preprocess_catchments.py* for preprocessing the catchments originating from the Estonian Nature Information System (EELIS)
* *preprocess_\*.py* for preprocessing source data (e.g. *preprocess_dem.py*) used as predictors in the model
* *preprocess_wq_obs.py* for preprocessing water quality data from the Estonian environment monitoring system (KESE)
* *utils.py* containing helper functions for loading and subsetting source data from corresponding raster and vector files

**stats_calc** contains the following scripts:
* *calc_area.py* for calculating catchment area
* *calc_\*_prop.py* for calculating proportion statistics for limestone, land use and land cover (LULC), pollution sensitivity, and vegetation in riparian buffers
* *calc_\*_prop_buff.py* for calculating proportion statistics in stream buffers
* *calc_livestock_density.py* for calculating livestock density within catchments
* *calc_manure_dep.py* for calculating nitrogen and phosphorus deposition in manure
* *calc_stats.py* as a generic script for extracting zonal statistics for climate, soil and topographic variables
* *calc_stream_density.py* for calculating stream density in catchments as the total length of streams divided by catchment area
* *concat_stats.py* for concatenating the derived statistics for each predictor
* *utils.py* containing helper functions for extracting statistics from predictor variables

**model** contains the following scripts:
* *correlate_features.py* for calculating correlations between features used as predictors
* *extract_site_catchments.py* for extracting the catchments of observation sites used in the model
* *group_features.py* for grouping features based on their subcode
* *plot_obs_vs_pred.py* for creating plots comparing the observed water quality data to their corresponding predictions
* *plot_shap.py* for creating plots based on SHAP values
* *predict.py* for validating the model
* *prepare_ml_input.py* for merging the predictor variables with the water quality observations
* *results_to_gpkg.py* for adding the modeling results to catchments for plotting purposes
* *select_features.py* for creating four feature sets for both nutrients based on feature correlations
* *train_model.py* for building the model for a particular feature set based on hyperparameter optimization
* *utils.py* containing helper functions used for building the model

Most of the Python scripts also have corresponding shell scripts that were used for submitting Slurm jobs to the HPC cluster of the University of Tartu.
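
To illustrate what selecting features "based on feature correlations" can look like, here is a minimal sketch of a greedy correlation filter. It is not the logic of *select_features.py*: the function names, the feature names, the values and the 0.8 threshold are all assumptions made for this example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant feature")
    return cov / sqrt(var_x * var_y)


def drop_correlated(features, threshold=0.8):
    """Keep features in insertion order, skipping any feature whose
    absolute correlation with an already kept feature exceeds threshold.

    features: dict mapping feature name -> list of values per catchment.
    threshold: hypothetical cut-off, not taken from the project.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical feature table (names and values are made up).
features = {
    "arable_prop": [0.1, 0.4, 0.5, 0.8, 0.9],
    "forest_prop": [0.9, 0.6, 0.5, 0.2, 0.1],  # mirrors arable_prop
    "slope_mean": [2.0, 1.0, 3.0, 1.5, 2.5],
}
selected = drop_correlated(features, threshold=0.8)
print(selected)
```

In this toy table `forest_prop` is the exact complement of `arable_prop`, so the filter drops it and keeps the other two. The result of a greedy filter depends on the order in which features are visited, which is one reason a project might build several alternative feature sets.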
The **notebooks** folder contains the following Jupyter notebooks used for analyzing the water quality data:
* *explore_wq_data.ipynb* for statistics and plots about the water quality observations
* *format_results.ipynb* for collating the plots used in the paper
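
The stream density described for *calc_stream_density.py* (total length of streams divided by catchment area) can be sketched in plain Python as follows. This is an illustration under assumptions, not the project's code: geometries are taken to be coordinate lists in a projected CRS with metre units, and all function names and coordinates are hypothetical.

```python
from math import hypot


def polyline_length(coords):
    """Length of a polyline given as [(x, y), ...] in projected units."""
    return sum(
        hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(coords, coords[1:])
    )


def polygon_area(ring):
    """Planar area of a simple polygon ring using the shoelace formula."""
    ring = list(ring)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # close the ring if it is open
    twice_area = sum(
        x1 * y2 - x2 * y1
        for (x1, y1), (x2, y2) in zip(ring, ring[1:])
    )
    return abs(twice_area) / 2.0


def stream_density(streams, catchment_ring):
    """Total stream length divided by catchment area (length per area)."""
    area = polygon_area(catchment_ring)
    if area == 0:
        raise ValueError("catchment area must be non-zero")
    total_length = sum(polyline_length(s) for s in streams)
    return total_length / area


# Hypothetical 1 km x 1 km catchment with two stream segments (metres).
catchment = [(0, 0), (1000, 0), (1000, 1000), (0, 1000)]
streams = [
    [(0, 0), (300, 400)],       # 500 m
    [(300, 400), (300, 1000)],  # 600 m
]
density = stream_density(streams, catchment)
print(f"{density * 1000:.2f} km/km^2")
```

The toy catchment holds 1100 m of streams in 1 km², giving 0.0011 m/m², or 1.1 km/km². With real data the streams would first have to be clipped to the catchment boundary, a step this sketch leaves out.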