{"id":13735901,"url":"https://github.com/perrygeo/pyimpute","last_synced_at":"2025-04-09T14:15:40.942Z","repository":{"id":12009146,"uuid":"14589858","full_name":"perrygeo/pyimpute","owner":"perrygeo","description":"Spatial classification and regression using Scikit-learn and Rasterio ","archived":false,"fork":false,"pushed_at":"2023-01-15T20:35:32.000Z","size":1072,"stargazers_count":126,"open_issues_count":6,"forks_count":35,"subscribers_count":10,"default_branch":"master","last_synced_at":"2025-04-02T03:38:14.083Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/perrygeo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2013-11-21T14:33:12.000Z","updated_at":"2025-03-06T19:37:37.000Z","dependencies_parsed_at":"2023-01-16T19:46:35.376Z","dependency_job_id":null,"html_url":"https://github.com/perrygeo/pyimpute","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/perrygeo%2Fpyimpute","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/perrygeo%2Fpyimpute/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/perrygeo%2Fpyimpute/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/perrygeo%2Fpyimpute/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/perrygeo","download_url":"https://codeload.github.com/perrygeo/pyimpute/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"gith
ub","repositories_count":248054194,"owners_count":21039952,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T03:01:12.728Z","updated_at":"2025-04-09T14:15:40.917Z","avatar_url":"https://github.com/perrygeo.png","language":"Python","funding_links":[],"categories":["`Python` processing of optical imagery (non deep learning)","Python"],"sub_categories":["Python libraries related to EO"],"readme":"![travis](https://travis-ci.org/perrygeo/pyimpute.svg)\n\n## Python module for geospatial prediction using scikit-learn and rasterio\n\n`pyimpute` provides high-level python functions for bridging the gap between spatial data formats and machine learning software to facilitate supervised classification and regression on geospatial data. This allows you to create landscape-scale predictions based on sparse observations.\n\nThe observations, known as the **training data**, consists of:\n\n* response variables: what we are trying to predict\n* explanatory variables: variables which explain the spatial patterns of responses\n\nThe **target data** consists of explanatory variables represented by raster datasets. There are no response variables available for the target data; the goal is to *predict* a raster surface of responses. 
The responses can be either discrete (classification) or continuous (regression).\n\n![example](https://raw.githubusercontent.com/perrygeo/pyimpute/master/example.png)\n\n## Pyimpute Functions\n\n* `load_training_vector`: Load training data where responses are vector data (explanatory variables are always raster)\n* `load_training_raster`: Load training data where responses are raster data\n* `stratified_sample_raster`: Random sampling of raster cells based on discrete classes\n* `evaluate_clf`: Performs cross-validation and prints metrics to help tune your scikit-learn classifiers.\n* `load_targets`: Loads target raster data into data structures required by scikit-learn\n* `impute`: Takes target data and your scikit-learn classifier and makes predictions, outputting GeoTIFFs\n\nThese functions don't provide any ground-breaking new functionality; they merely save a lot of tedious data wrangling that would otherwise bog your analysis down in low-level details. In other words, `pyimpute` provides a high-level Python workflow for spatial prediction, making it easier to:\n\n* explore new variables\n* frequently update predictions with new information (e.g. new Landsat imagery as it becomes available)\n* bring the technique to other disciplines and geographies\n\n\n### Basic example\n\nHere's what a `pyimpute` workflow might look like. In this example, we have two explanatory variables as rasters (temperature and precipitation) and a GeoJSON file with point observations of habitat suitability for a plant species. 
Our goal is to predict habitat suitability across the entire region based only on the explanatory variables.\n\n```\nimport os\n\nfrom pyimpute import load_training_vector, load_targets, impute, evaluate_clf\nfrom sklearn.ensemble import RandomForestClassifier\n```\n\nLoad some training data\n```\nexplanatory_rasters = ['temperature.tif', 'precipitation.tif']\nresponse_data = 'point_observations.geojson'\n\ntrain_xs, train_y = load_training_vector(response_data,\n                                         explanatory_rasters,\n                                         response_field=\"suitability\")\n```\n\nTrain a scikit-learn classifier\n```\nclf = RandomForestClassifier(n_estimators=10, n_jobs=1)\nclf.fit(train_xs, train_y)\n```\n\nEvaluate the classifier using several validation metrics, manually inspecting the output\n```\nevaluate_clf(clf, train_xs, train_y)\n```\n\nLoad target raster data\n```\ntarget_xs, raster_info = load_targets(explanatory_rasters)\n```\n\nMake predictions, outputting GeoTIFFs\n```\nimpute(target_xs, clf, raster_info, outdir='/tmp',\n        linechunk=400, class_prob=True, certainty=True)\n\nassert os.path.exists(\"/tmp/responses.tif\")\nassert os.path.exists(\"/tmp/certainty.tif\")\nassert os.path.exists(\"/tmp/probability_0.tif\")\nassert os.path.exists(\"/tmp/probability_1.tif\")\n```\n\n### Installation\n\nAssuming you have `libgdal` and the scipy system dependencies installed, you can install with pip\n\n```\npip install pyimpute\n```\n\nAlternatively, install from the source code\n```\ngit clone https://github.com/perrygeo/pyimpute.git\ncd pyimpute\npip install -e .\n```\n\nSee the `.travis.yml` file for a working example on Ubuntu systems.\n\n### Other resources\n\nFor an overview, watch my presentation at FOSS4G 2014: \u003ca href=\"http://vimeo.com/106235287\"\u003eSpatial-Temporal Prediction of Climate Change Impacts using pyimpute, scikit-learn and GDAL — Matthew Perry\u003c/a\u003e \n\nAlso, check out [the 
examples](https://github.com/perrygeo/python-impute/blob/master/examples/) and [the wiki](https://github.com/perrygeo/pyimpute/wiki)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fperrygeo%2Fpyimpute","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fperrygeo%2Fpyimpute","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fperrygeo%2Fpyimpute/lists"}