{"id":13857791,"url":"https://github.com/Envirometrix/landmap","last_synced_at":"2025-07-13T22:31:12.005Z","repository":{"id":56934531,"uuid":"188481733","full_name":"Envirometrix/landmap","owner":"Envirometrix","description":"Landmap package for R","archived":false,"fork":false,"pushed_at":"2022-06-16T09:11:46.000Z","size":1464,"stargazers_count":47,"open_issues_count":7,"forks_count":13,"subscribers_count":7,"default_branch":"master","last_synced_at":"2024-11-22T15:41:21.438Z","etag":null,"topics":["global","landgis","learning","machine","rstats"],"latest_commit_sha":null,"homepage":null,"language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Envirometrix.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-05-24T20:11:10.000Z","updated_at":"2024-10-16T11:35:37.000Z","dependencies_parsed_at":"2022-08-21T05:20:39.489Z","dependency_job_id":null,"html_url":"https://github.com/Envirometrix/landmap","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/Envirometrix/landmap","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Envirometrix%2Flandmap","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Envirometrix%2Flandmap/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Envirometrix%2Flandmap/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Envirometrix%2Flandmap/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Envirometrix","download_url":"https://codeload.github.com/Envirometrix/landmap/tar.gz/refs/heads/master","sbom_url"
:"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Envirometrix%2Flandmap/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265218201,"owners_count":23729496,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["global","landgis","learning","machine","rstats"],"created_at":"2024-08-05T03:01:47.099Z","updated_at":"2025-07-13T22:31:11.658Z","avatar_url":"https://github.com/Envirometrix.png","language":"R","funding_links":[],"categories":["R"],"sub_categories":[],"readme":"# landmap package for R\n\n[![Build Status](https://travis-ci.org/Envirometrix/landmap.svg?branch=master)](https://travis-ci.org/Envirometrix/landmap)\n[![R-CMD-check](https://github.com/Envirometrix/landmap/workflows/R-CMD-check/badge.svg)](https://github.com/Envirometrix/landmap/actions)\n[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/landmap)](https://cran.r-project.org/package=landmap)\n[![Github_Status_Badge](https://img.shields.io/badge/Github-0.0--8-blue.svg)](https://github.com/Envirometrix/landmap)\n\nPackage provides methodology for automated mapping i.e. spatial interpolation and/or \nprediction using **Ensemble Machine Learning** (extends functionality of the [mlr package](https://mlr.mlr-org.com/)). 
Key functionality includes:\n\n* `train.spLearner` --- train a spatial prediction and/or interpolation model using Ensemble Machine Learning (works with numeric, binomial and factor-type variables),\n* `buffer.dist` --- derive buffer (geographical) distances that can be used as covariates in spLearner, \n* `spc` --- derive Principal Components using a stack of spatial layers,\n* `tile` --- tile spatial layers so they can be used to run processing in parallel,\n* `spsample.prob` --- determine inclusion probability / representation of a given point sample based on feature space analysis (maxlike function) and kernel density analysis,\n* `download.landgis` --- access and download LandGIS layers from www.openlandmap.org.\n\nWarning: most functions are optimized to run in parallel by default. This might result in high RAM and CPU usage.\n\nSpatial prediction using [Ensemble Machine Learning](https://koalaverse.github.io/machine-learning-in-R/stacking.html#stacking-software-in-r) with geographical distances \nis explained in detail in:\n\n- Hengl, T., MacMillan, R.A., (2019). \n   [Predictive Soil Mapping with R](https://soilmapper.org/soilmapping-using-mla.html). \n   OpenGeoHub foundation, Wageningen, the Netherlands, 370 pages, www.soilmapper.org, \n   ISBN: 978-0-359-30635-0.\n- Hengl, T., Nussbaum, M., Wright, M. N., Heuvelink, G. B., and Gräler, B. (2018). \n   [Random Forest as a generic framework for predictive modeling of spatial and spatio-temporal variables](https://doi.org/10.7717/peerj.5518). PeerJ 6:e5518.\n\nUse of geographical distances and nearest neighbors as features in machine learning is also explained in detail in:\n\n- Møller, A. B., Beucher, A. M., Pouladi, N., and Greve, M. H. (2020). [Oblique geographic coordinates as covariates for digital soil mapping](https://doi.org/10.5194/soil-6-269-2020). SOIL, 6, 269–289, https://doi.org/10.5194/soil-6-269-2020\n- Sekulić, A., Kilibarda, M., Heuvelink, G.B., Nikolić, M., Bajat, B. (2020). 
[Random Forest Spatial Interpolation](https://doi.org/10.3390/rs12101687). Remote Sens. 12, 1687. https://doi.org/10.3390/rs12101687\n\nA detailed tutorial on how to use the landmap package to generate predictions / interpolated point data sets is available **[here](https://gitlab.com/openlandmap/spatial-predictions-using-eml)**.\n\n## Installing\n\nInstall the development version from GitHub:\n\n```r\nlibrary(devtools)\ninstall_github(\"envirometrix/landmap\")\n```\n\nNote: the functions are not recommended for large data sets.\n\n## Functionality\n\n### Automated mapping using Ensemble Machine Learning\n\nFirst, we need to install a number of packages, as Ensemble Machine Learning uses \nseveral independent learners:\n\n```r\nls \u003c- c(\"rgdal\", \"raster\", \"plotKML\", \"geoR\", \"ranger\", \"mlr\", \"forestError\", \n        \"xgboost\", \"glmnet\", \"matrixStats\", \"kernlab\", \"deepnet\")\nnew.packages \u003c- ls[!(ls %in% installed.packages()[,\"Package\"])]\nif(length(new.packages)) install.packages(new.packages)\nlibrary(landmap)\nlibrary(rgdal)\nlibrary(geoR)\nlibrary(plotKML)\nlibrary(raster)\nlibrary(glmnet)\nlibrary(xgboost)\nlibrary(kernlab)\nlibrary(deepnet)\nlibrary(mlr)\n```\n\nThe following example demonstrates spatial prediction using the meuse data set.\nNote that we only have to specify the target point data set, the covariate layers (an object of class `SpatialPixelsDataFrame`) \nand the transformation parameter for the target variable (`lambda = 1`), which is only required \nfor fitting the variogram using the geoR package:\n\n```r\ndemo(meuse, echo=FALSE)\nm \u003c- train.spLearner(meuse[\"zinc\"], covariates=meuse.grid[,c(\"dist\",\"ffreq\")], lambda = 1)\n```\n\nThis runs several steps:\n\n```\nConverting ffreq to indicators...\nConverting covariates to principal components...\nDeriving oblique coordinates...TRUE\nFitting a variogram using 'linkfit' and trend model...TRUE\nEstimating block size ID for spatial Cross Validation...TRUE\nStarting parallelization in 
mode=socket with cpus=32.\nUsing learners: regr.ranger, regr.xgboost, regr.nnet, regr.ksvm, regr.cvglmnet...TRUE\nFitting a spatial learner using 'mlr::makeRegrTask'...TRUE\nExporting objects to slaves for mode socket: .mlr.slave.options\nMapping in parallel: mode = socket; cpus = 32; elements = 5.\nExporting objects to slaves for mode socket: .mlr.slave.options\nMapping in parallel: mode = socket; cpus = 32; elements = 5.\nExporting objects to slaves for mode socket: .mlr.slave.options\nMapping in parallel: mode = socket; cpus = 32; elements = 5.\n# weights:  103\ninitial  value 54927206.667240 \nfinal  value 20750447.509677 \nconverged\nFitting a quantreg model using 'ranger::ranger'...TRUE\nExporting objects to slaves for mode socket: .mlr.slave.options\nMapping in parallel: mode = socket; cpus = 32; elements = 5.\nStopped parallelization. All cleaned up.\n```\n\nIn the landmap framework, variogram model is only fitted to estimate effective range of spatial dependence, \nwhich is then used to determine the size of blocks for spatial block Cross-Validation.\nSpatial Prediction models are based only on fitting the [Ensemble Machine Learning](https://koalaverse.github.io/machine-learning-in-R/stacking.html#stacking-software-in-r) \n(by default landmap uses `c(\"regr.ranger\", \"regr.xgboost\", \"regr.ksvm\", \"regr.nnet\", \"regr.cvglmnet\")`; see [a complete list of learners available via mlr](https://mlr.mlr-org.com/articles/tutorial/integrated_learners.html)) \nwith oblique coordinates (rotated coordinates) as described in [Moller et al. (2019) \n\"Oblique Coordinates as Covariates for Digital Soil Mapping\"](https://www.soil-discuss.net/soil-2019-83/) to account for spatial auto-correlation in \nvalues. In the landmap package, geographical distances to ALL points can be added \nby specifying `buffer.dist=TRUE`; this is however not recommended for large point data sets.\nThe meta-learning i.e. 
the `SuperLearner` model shows which individual learners are most important:\n\n```r\nsummary(m@spModel$learner.model$super.model$learner.model)\n```\n\n```\nCall:\nstats::lm(formula = f, data = d)\n\nResiduals:\n    Min      1Q  Median      3Q     Max \n-478.73 -107.15  -30.85   67.52 1201.31 \n\nCoefficients:\n                Estimate Std. Error t value Pr(\u003e|t|)    \n(Intercept)   1358.41882  622.23470   2.183 0.030592 *  \nregr.ranger      0.74741    0.20159   3.708 0.000295 ***\nregr.xgboost     0.08317    0.41544   0.200 0.841606    \nregr.nnet       -2.89762    1.33319  -2.173 0.031326 *  \nregr.ksvm        0.41938    0.22837   1.836 0.068283 .  \nregr.cvglmnet   -0.14177    0.19576  -0.724 0.470071    \n---\nSignif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1\n\nResidual standard error: 217.1 on 149 degrees of freedom\nMultiple R-squared:  0.6616,\tAdjusted R-squared:  0.6503 \nF-statistic: 58.27 on 5 and 149 DF,  p-value: \u003c 2.2e-16\n```\n\nIn this case `regr.ranger` seems to be most important for predicting zinc concentration (highest absolute t value), \nwhile `regr.xgboost` and `regr.cvglmnet` are the least important (lowest absolute t values). Overall, this ensemble model explains ca 65% of variance (based on repeated 5-fold cross-validation).\n\nNext we can generate predictions by using:\n\n```r\nmeuse.zinc \u003c- predict(m)\n```\n\n```\nPredicting values using 'getStackedBaseLearnerPredictions'...TRUE\nDeriving model errors using ranger package 'quantreg' option...TRUE\n```\n\nNote that, with the current set-up using `method = \"stack.cv\"`, every time we re-run the model training we \nmight get somewhat different models / different betas. On the other hand, the final ensemble predictions (map) should visually not differ too much (see below). 
In practice, as the number of training points and features increases, \nthe predictions should not differ significantly.\n\n\u003cimg src=\"https://github.com/thengl/GeoMLA/blob/master/RF_vs_kriging/results/meuse/Fig_meuse_EML.png\" width=\"650\"\u003e\\\n_Figure: Predicted zinc content for the Meuse data set. Model error is derived using quantile regression from multiple model predictions._\n\n\u003cimg src=\"https://github.com/thengl/GeoMLA/blob/master/RF_vs_kriging/results/meuse/Fig_meuse_EML_2.png\" width=\"650\"\u003e\\\n_Figure: Repeated predictions for zinc content using the same settings._\n\nAs a default setting, we use the method of [Lu \u0026 Hardin (2021)](http://jmlr.org/papers/v22/18-558.html) implemented in the \n[forestError](https://cran.r-project.org/package=forestError) package to derive the prediction intervals i.e. the estimated \nuncertainty around a single predicted value. It can be derived as:\n\n- upper and lower quantiles, and/or\n- standard deviation (assumes symmetric distribution of errors).\n\nAs a default value for prediction intervals, landmap uses `quantiles = c((1-.682)/2, 1-(1-.682)/2)` so that the s.d. can also \nbe derived from the upper and lower 68% quantiles by using:\n\n```r\npred.error \u003c- (q.upr-q.lwr)/2\n```\n\n\u003cimg src=\"https://github.com/thengl/GeoMLA/blob/master/RF_vs_kriging/results/meuse/map-zinc-interval-1.png\" width=\"650\"\u003e\\\n_Figure: Lower and upper prediction intervals based on the 68% probability._\n\n\nAnimated predictions by 9 models (3x independently fitted random forest, SVM and Xgboost) show the following patterns:\n\n\u003cimg src=\"https://github.com/thengl/GeoMLA/blob/master/RF_vs_kriging/results/meuse/meuse_lead_ensemble.gif\" width=\"400\" /\u003e\n_Figure: Examples of independently generated predictions for lead concentration. 
The coefficients are beta coefficients from the meta-learner fit: the higher the coefficient, the more important the model for the ensemble merge._\n\n\nThe predictions shown in the image above incorporate spatial correlation between values, \nand hence can be used as a possible replacement for kriging methods ([Hengl et al. 2018](https://doi.org/10.7717/peerj.5518)). Automation comes, however, at the cost of high computing and RAM usage.\n\nIn the following example we use a somewhat larger data set from the SIC1997 exercise.\n\n```r\ndata(\"sic1997\")\nX \u003c- sic1997$swiss1km[c(\"CHELSA_rainfall\",\"DEM\")]\nmR \u003c- train.spLearner(sic1997$daily.rainfall, covariates=X, lambda=1)\nrainfall1km \u003c- predict(mR)\n```\n\nThe processing is now much more computationally demanding because the data set consists of 467 points and the size of the grid / number of features is higher.\nThis makes the regression matrix extensive, and 5x5 models also need to be fitted.\nAt the moment, using `train.spLearner` for point data sets with \u003e\u003e1000 points should be done with caution.\n\nThe final results are also quite similar to universal kriging in [geoR](http://leg.ufpr.br/~paulojus/geoR/). 
The model error map, however, shows more spatial contrast and helps detect areas of especially high errors.\n\n\u003cimg src=\"https://github.com/thengl/GeoMLA/blob/master/RF_vs_kriging/results/rainfall/Fig_SIC1997_EML.png\" width=\"900\"\u003e\\\n_Figure: Predicted daily rainfall for the SIC1997 data set._\n\n\nThe same function can also be used to interpolate factor-type variables:\n\n```r\nlibrary(plotKML)\ndata(eberg_grid)\ngridded(eberg_grid) \u003c- ~x+y\nproj4string(eberg_grid) \u003c- CRS(\"+init=epsg:31467\")\ndata(eberg)\ncoordinates(eberg) \u003c- ~X+Y\nproj4string(eberg) \u003c- CRS(\"+init=epsg:31467\")\nX \u003c- eberg_grid[c(\"PRMGEO6\",\"DEMSRT6\",\"TWISRT6\",\"TIRAST6\")]\nmF \u003c- train.spLearner(eberg[\"TAXGRSC\"], covariates=X)\nTAXGRSC \u003c- predict(mF)\nplot(stack(TAXGRSC$pred[grep(\"prob.\", names(TAXGRSC$pred))]), \n     col=SAGA_pal[[\"SG_COLORS_YELLOW_RED\"]], zlim=c(0,1))\n```\n\n\u003cimg src=\"https://github.com/thengl/GeoMLA/blob/master/RF_vs_kriging/results/eberg/predicted_classes_eberg.png\" width=\"900\"\u003e\\\n_Figure: Predicted Ebergotzen soil types (probabilities)._\n\nFor each class we can also derive a standard deviation of predicted probabilities by multiple \nindependently fitted learners. This shows where the model is on average most uncertain per class. 
\nIn the landmap package we generally recommend using the [log-loss measure](https://www.r-bloggers.com/2015/12/making-sense-of-logarithmic-loss/) to evaluate mapping accuracy, \nso that one can see which classes are the most problematic, and where.\n\n\u003cimg src=\"https://github.com/thengl/GeoMLA/blob/master/RF_vs_kriging/results/eberg/predicted_classes_eberg_errors.png\" width=\"900\"\u003e\\\n_Figure: Predicted errors of the Ebergotzen soil types (probabilities)._\n\n\nNote that in the case of factor variables, predictions are produced by ensemble stacking\nof the following three classification algorithms `c(\"regr.ranger\", \"regr.xgboost\", \"regr.nnet\")`. See mlr documentation on how to add additional [learners](https://mlr.mlr-org.com/articles/tutorial/integrated_learners.html).\n\nIn summary: package mlr provides a comprehensive environment for Machine Learning:\n\n- Ensemble predictions are based on the `mlr::makeStackedLearner` function,\n- Additional [learners](https://mlr.mlr-org.com/articles/tutorial/integrated_learners.html) can be added,\n- Processing can be parallelized using the [parallelMap package](https://mlr.mlr-org.com/articles/tutorial/parallelization.html).\n\nEnsemble Machine Learning is also available via the [subsemble](https://github.com/ledell/subsemble) and the [SuperLearner](https://github.com/ecpolley/SuperLearner) packages (not used here). For more info about Ensemble Machine Learning refer to this **[tutorial](https://gitlab.com/openlandmap/spatial-predictions-using-eml)**.\n\n### Accessing LandGIS layers\n\nThe landmap package also provides functionality to access and download LandGIS layers\nfrom www.openlandmap.org. 
The recommended process is to first search the coverage ID \nnames and file names e.g.:\n\n```r\nsearch.landgis(pattern=c(\"clay\", \"10..10cm\"))\n```\n\nThis shows that a clay map at 10 cm depth of the world is available via:\n\n```\n[[1]]\n                                                   predicted250m.file \n\"sol_clay.wfraction_usda.3a1a1a_m_250m_b10..10cm_1950..2017_v0.2.tif\" \n\n[[2]]\n[1] \"https://www.zenodo.org/api/files/d95e82f3-203d-4ae5-86dc-b1ddd65ff8b2/sol_clay.wfraction_usda.3a1a1a_m_250m_b10..10cm_1950..2017_v0.2.tif\" \n[2] \"https://www.zenodo.org/api/files/d95e82f3-203d-4ae5-86dc-b1ddd65ff8b2/sol_clay.wfraction_usda.3a1a1a_md_250m_b10..10cm_1950..2017_v0.2.tif\"\n```\n\nWeb Coverage Service functionality and zenodo.org API are explained in detail [here](https://github.com/Envirometrix/LandGISmaps#accessing-data).\nNext we can download only the clay map for Switzerland using the Web Coverage Service \nfunctionality of LandGIS:\n\n```r\nlibrary(mapview)\ncoverageId = \"predicted250m:sol_clay.wfraction_usda.3a1a1a_m_250m_b10..10cm_1950..2017_v0.2\"\nswiss1km.ll \u003c- raster::projectRaster(raster(sic1997$swiss1km[2]), crs = \"+init=epsg:4326\", res=c(1/120, 1/120))\ns1 = paste0(\"Lat(\", swiss1km.ll@extent@ymin, \",\", swiss1km.ll@extent@ymax,\")\")\ns2 = paste0(\"Long(\", swiss1km.ll@extent@xmin, \",\", swiss1km.ll@extent@xmax,\")\")\nsf = (1/480)/(1/120)\ndownload.landgis(coverageId, filename = \"clay_ch1km.tif\", subset = c(s1,s2), scalefactor = sf)\nswiss1km.ll1km \u003c- as(swiss1km.ll, \"SpatialGridDataFrame\")\nswiss1km.ll1km$clay_10..10cm \u003c- readGDAL(\"clay_ch1km.tif\")$band1\nswiss1km.ll1km$clay_10..10cm \u003c- ifelse(is.na(swiss1km.ll1km$DEM), NA, swiss1km.ll1km$clay_10..10cm)\nmapview(swiss1km.ll1km[\"clay_10..10cm\"])\n```\n\n\u003cimg src=\"https://github.com/thengl/GeoMLA/blob/master/RF_vs_kriging/results/rainfall/Fig_download_LandGIS_swiss1km.jpg\" width=\"650\"\u003e\\\n_Figure: Clay content map for Switzerland._\n\nThis takes a few steps because you 
have to determine:\n\n* the bounding box,\n* the scaling factor,\n* the mask for the pixels of interest.\n\nFor smaller areas (\u003c500 MB in size), downloading data using WCS is fast and efficient.\nFor accessing and using global layers larger than 1 GB we recommend directly downloading data from [zenodo.org](https://zenodo.org/search?page=1\u0026size=20\u0026q=LandGIS).\n\n## Contributions\n\n* Contributions to landmap are welcome. Issues and pull requests are the preferred ways of sharing them.\n* We are interested in your results and experiences with using the `train.spLearner` function \n  for generating spatial predictions with your own data. Share your data sets, \n  code and results via GitHub issues and/or the R-sig-geo mailing list.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FEnvirometrix%2Flandmap","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FEnvirometrix%2Flandmap","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FEnvirometrix%2Flandmap/lists"}