{"id":13436745,"url":"https://github.com/datasciencecampus/green-spaces","last_synced_at":"2025-03-18T21:31:01.386Z","repository":{"id":51143065,"uuid":"200809134","full_name":"datasciencecampus/green-spaces","owner":"datasciencecampus","description":"Render GeoJSON polygons over aerial imagery and analyse pixels covered by vegetation; used to calculate green spaces in residential gardens","archived":false,"fork":false,"pushed_at":"2021-05-21T15:21:58.000Z","size":3891,"stargazers_count":13,"open_issues_count":0,"forks_count":10,"subscribers_count":3,"default_branch":"develop","last_synced_at":"2024-04-16T02:05:58.650Z","etag":null,"topics":["dsc-projects","gardens","green-spaces","hsv","imagery","lab","ndvi","neural-network","nn","vari","vegetation-indices"],"latest_commit_sha":null,"homepage":"https://datasciencecampus.github.io/green-spaces","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/datasciencecampus.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-08-06T08:26:37.000Z","updated_at":"2023-12-31T03:05:04.000Z","dependencies_parsed_at":"2022-09-01T19:03:24.299Z","dependency_job_id":null,"html_url":"https://github.com/datasciencecampus/green-spaces","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datasciencecampus%2Fgreen-spaces","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datasciencecampus%2Fgreen-spaces/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datasciencecampus%2Fgreen-spac
es/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datasciencecampus%2Fgreen-spaces/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/datasciencecampus","download_url":"https://codeload.github.com/datasciencecampus/green-spaces/tar.gz/refs/heads/develop","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244310498,"owners_count":20432549,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dsc-projects","gardens","green-spaces","hsv","imagery","lab","ndvi","neural-network","nn","vari","vegetation-indices"],"created_at":"2024-07-31T03:00:51.793Z","updated_at":"2025-03-18T21:30:58.083Z","avatar_url":"https://github.com/datasciencecampus.png","language":"Python","funding_links":[],"categories":["Tools \u0026 Tutorials"],"sub_categories":["green spaces"],"readme":"[![Build Status](https://travis-ci.com/datasciencecampus/Green_Spaces.svg?branch=develop)](https://travis-ci.com/datasciencecampus/Green_Spaces)\n[![codecov](https://codecov.io/gh/datasciencecampus/Green_Spaces/branch/develop/graph/badge.svg)](https://codecov.io/gh/datasciencecampus/Green_Spaces)\n\n\u003cp align=\"center\"\u003e\u003cimg align=\"center\" src=\"green-spaces-logo.png\" width=\"400px\"\u003e\u003c/p\u003e\n\n# Green Spaces\n\nThe Green Spaces project is a tool that can render GeoJSON polygons over aerial imagery and analyse pixels contained within the polygons.\nIts primary use case is to determine the vegetation coverage of residential gardens (defined as polygons in GeoJSON) using aerial imagery stored in 
OSGB36 format tiles,\nalthough basic support is also present for Web Mercator.\nThe project background and methodology are explained in the Data Science Campus [report](https://datasciencecampus.ons.gov.uk/projects/green-spaces-in-residential-gardens/).\n\nGiven its primary use case, the indices calculated are referred to as vegetation indices, but please note the indices are simply functions that accept an image (stored as a colour tuple per pixel, forming a 3D numpy array) and return a 2D boolean array indicating each pixel's label. The analysis code then accumulates the `true` and `false` results per polygon to produce a percentage coverage per polygon. The indices are hence free to represent anything: if polygons represent buildings, an index could mark whether pixels are roof tiles; if polygons are fields, an index could mark whether pixels are bare earth. The only constraint is what can be detected at the pixel level given your available imagery.\n\n# Installation\nThe tool has been developed to work on Windows, Linux and macOS. To install:\n\n1. Please make sure Python 3.6 is installed and on your path; it can be installed from the [Python release](https://www.python.org/downloads/release/python-360/) pages, selecting the *relevant installer for your operating system*. When prompted, please check the box to set the paths and environment variables for you, and you should be ready to go. Python can also be installed as part of [Anaconda](https://www.anaconda.com/download/).\n\n   To check the default Python version for your system, run the following in a command line/terminal:\n\n   ```\n   python --version\n   ```\n   \n   **_Note_**: If Python 2 is the default Python version but you have installed Python 3.6, your path may be set up to use `python3` instead of `python`.\n   \n2. 
To install the packages and dependencies for the tool, run the following from the root directory (Green_Spaces):\n   ```\n   pip install -e .\n   ```\n   This will install all the required libraries.\n\n3. To execute the unit tests, run:\n   ```\n   python setup.py test\n   ```\n   This will download any required test packages and then run the tests.\n\n# User Instructions\nThe tools available are:\n* Polygon analysis\n* Imagery coverage\n* Simple work distribution\n\nThese are described below, after the initial dataset configuration on which all tools depend to find aerial imagery.\n\n## Dataset Configuration\nYour locally available imagery must be configured in a file called `green_spaces/analyse_polygons.json`; a template\nis provided in `green_spaces/analyse_polygons_template.json`, which can be copied and updated to match your locally\navailable data. The JSON file defines the available image loaders (and hence data sources) and the available metrics (various vegetation indices are provided).\n\nEach image loader defines the spectral channels for a given image (for instance R,G,B or Ir,R,G), the location of the data, the dataset name and the Python class responsible for loading the data. This enables new image loaders to be added without changing existing code, with specific image loaders having additional parameters as required. For instance, Ordnance Survey (OS) national grid datasets have a specific number of pixels per 1 kilometre (km) square, determined by the image resolution; for example, 12.5 centimetre (cm) imagery is 8,000 pixels wide (1,000 m / 0.125 m). This enables a resolution-independent Ir,R,G,B data reader to be created that internally combines the CIR and RGB datasets to generate the required imagery on demand.\n\nOSGB36 imagery is assumed to be stored in a hierarchy of folders of the form `TT/TTxy`, which contain files named `TTxayb.jpg` with metadata in `TTxayb.xml`. 
For example, the tile `HP4705` is stored in folder `HP\\HP40`.\n\nWeb Mercator imagery is stored in a user-defined hierarchy; the example is of the form `http://your-image-source.com/folderf/folder/{zoom}/{x}/{y}.png`, where the zoom level and x, y coordinates are substituted at runtime. *Note* that Web Mercator support is experimental and incomplete.\n\nThe data sources are intentionally independent of the vegetation indices. Additionally, the same data reader can be used with different physical datasets. For example, 25 cm OSGB data can be read using the same reader as 12.5 cm OSGB data, with only a minor configuration change specifying the location of the data and the number of pixels per image. As the data readers are Python classes with the same methods, the code that uses a reader does not need to know whether it is consuming OSGB or Web Mercator data; it simply uses the returned results, which are in a common, source-agnostic form.\n\nThe vegetation indices are defined in the JSON file so that the end user can add new metrics and change their thresholds without altering Python source code. Metrics may even come from a different codebase entirely, rather than being part of the project source code. 
Vegetation indices and image loaders are defined by class name and instantiated at run time using Python's importlib functionality, which creates class instances directly from names stored as text strings (note that all indices supplied are defined in `green_spaces\\vegetation_analysis.py`).\n\n## Polygon Analysis\nThe polygon analysis tool takes as input a GeoJSON file defining polygons, projects these polygons onto the selected image source and applies the requested vegetation index to the pixels within each polygon, as per the following process flow:\n\n\u003cp align=\"center\"\u003e\u003cimg align=\"center\" src=\"https://datasciencecampus.ons.gov.uk/wp-content/uploads/sites/10/2019/07/Figure_27-2.png\" width=\"400px\"\u003e\u003c/p\u003e\n\nThe polygon analysis tool is described in the following sections, starting with the available online help, followed by an example use case, an explanation of the remaining command line parameters and finally a list of the available vegetation indices.\n\n### Initial Help\nA set of polygons supplied in GeoJSON format can be analysed with `green_spaces\\analyse_polygons.py`; to reveal the available command line options, enter:\n```bash\nGreen_Spaces$ export PYTHONPATH=.\nGreen_Spaces$ python green_spaces/analyse_polygons.py -h\nusage: analyse_polygons.py [-h] [-o OUTPUT_FOLDER] [-pc PRIMARY_CACHE_SIZE]\n                           [-esc] [-v] [-fng FIRST_N_GARDENS]\n                           [-rng RANDOM_N_GARDENS] [-opv]\n                           [-wl {12.5cm RGB aerial,25cm RGB aerial,50cm CIR aerial,50cm CIR aerial as RGB,12.5cm RGB with 50cm IR aerial,25cm RGB with 50cm IR aerial,Lle2013}]\n                           [-i {naive,greenleaf,hsv,ndvi-cir,ndvi-irgb,vndvi,vari,lab1,lab2,matt,matt2,nn} [{naive,greenleaf,hsv,ndvi-cir,ndvi-irgb,vndvi,vari,lab1,lab2,matt,matt2,nn} ...]]\n                           [-di {0,1,2,4}]\n                           \u003cgeojson input file name\u003e\n\nParse GeoJSON files, download imagery 
covered by GeoJSON and calculate\nrequested image metrics within each GeoJSON polygon\n\npositional arguments:\n  \u003cgeojson input file name\u003e\n                        File name of a GeoJSON file to analyse vegetation\n                        coverage\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -o OUTPUT_FOLDER, --output-folder OUTPUT_FOLDER\n                        Folder name where results of vegetation coverage are\n                        output\n  -pc PRIMARY_CACHE_SIZE, --primary-cache-size PRIMARY_CACHE_SIZE\n                        Memory to allocate for map tiles primary cache (0=no\n                        primary cache); uses human friendly format e.g.\n                        12M=12,000,000\n  -esc, --enable-secondary-cache\n                        Use local storage to hold copies of all downloaded\n                        data and avoid multiple downloads\n  -v, --verbose         Report detailed progress and parameters\n  -fng FIRST_N_GARDENS, --first-n-gardens FIRST_N_GARDENS\n                        Only process first N gardens\n  -rng RANDOM_N_GARDENS, --random-n-gardens RANDOM_N_GARDENS\n                        Process random N gardens\n  -opv, --only-paint-vegetation\n                        Only paint vegetation pixels in output bitmaps\n  -wl {12.5cm RGB aerial,25cm RGB aerial,50cm CIR aerial,50cm CIR aerial as RGB,12.5cm RGB with 50cm IR aerial,25cm RGB with 50cm IR aerial,Lle2013}, --loader {12.5cm RGB aerial,25cm RGB aerial,50cm CIR aerial,50cm CIR aerial as RGB,12.5cm RGB with 50cm IR aerial,25cm RGB with 50cm IR aerial,Lle2013}\n                        What tile loader to use (default: None)\n  -i {naive,greenleaf,hsv,ndvi-cir,ndvi-irgb,vndvi,vari,lab1,lab2,matt,matt2,nn} [{naive,greenleaf,hsv,ndvi-cir,ndvi-irgb,vndvi,vari,lab1,lab2,matt,matt2,nn} ...], --index {naive,greenleaf,hsv,ndvi-cir,ndvi-irgb,vndvi,vari,lab1,lab2,matt,matt2,nn} 
[{naive,greenleaf,hsv,ndvi-cir,ndvi-irgb,vndvi,vari,lab1,lab2,matt,matt2,nn} ...]\n                        What vegetation index to compute (default: None);\n                        options are: 'naive' (Assumes all pixels within\n                        polygon are green), 'greenleaf' (Green leaf index),\n                        'hsv' (Green from HSV threshold), 'ndvi-cir'\n                        (Normalised difference vegetation index from CIR),\n                        'ndvi-irgb' (Normalised difference vegetation index\n                        from IRGB), 'vndvi' (Visual Normalised difference\n                        vegetation index), 'vari' (Visual atmospheric\n                        resistance index), 'lab1' (Green from L*a*b* colour\n                        space, 'a' threshold only), 'lab2' (Green from L*a*b*\n                        colour space, 'a' and 'b' thresholds), 'matt'\n                        (Interpret Ir, G, B as R, G, B and filter by HSV),\n                        'matt2' (Interpret Ir, G, B as R, G, B and filter by\n                        HSV), 'nn' (Neural network vegetation classifier)\n  -di {0,1,2,4}, --downsampled-images {0,1,2,4}\n                        Dump downsampled images for each garden for\n                        debugging/verification ('0' does not produce images,\n                        '1' produces unscaled images, '2' produces 1:2\n                        downsampled images, '4' produces 1:4 downsampled\n                        images\nGreen_Spaces$ \n```\n\n### Example Usage\nTo analyse foliage using the green leaf index, you can enter:\n\n```bash\nGreen_Spaces$ export PYTHONPATH=.\nGreen_Spaces$ python green_spaces/analyse_polygons.py -pc 4G -i greenleaf -wl \"25cm RGB aerial\" data\\example_gardens.geojson\nUsing TensorFlow backend.\nSorting features: 100%|#######################################################| 928/928 [00:00\u003c00:00, 1107.38feature/s]\nAnalysing features (0 cached, 16 missed; hit rate 0.0%):   
2%|3                  | 15/928 [00:09\u003c10:39,  1.43feature/s]\n```\nThis requests 4 GB of memory to be allocated for image caching, and selects `greenleaf` as the index to process and `25cm RGB aerial` as the imagery source. The GeoJSON to analyse is located at `data\\example_gardens.geojson`.\n\nThe polygons are projected into the selected image dataset (in this case `25cm RGB aerial`), sorted spatially to improve caching, and then analysed in turn.\n\nNote that image tiles are slow to load, as they are pulled from a potentially slow storage medium and then decompressed into memory; hence loaded images are cached to improve throughput. Sorting the polygons spatially further improves cache use: for example, around 2 polygons per second are processed without image caching, compared with around 15 polygons per second with image caching.\n\nOnce the GeoJSON is processed, the output will look like:\n```bash\nAnalysing features (992 cached, 6 missed; hit rate 99.4%): 100%|################| 928/928 [01:02\u003c00:00, 14.75feature/s]\nNumber of map tile requests: 998\nNumber of map tile cache hits vs misses: 992 vs 6\nGreen_Spaces$\n```\n\nThis reveals how effective the cache was: in this example, 928 polygons generated 998 image tile requests (as some polygons straddle the boundary between tiles and hence need more than one tile), but 992 of these requests were served from the cache, with only 6 requests actually pulling data from storage.\n\nA folder has been created with the results of the analysis; it is relative to the current folder and named `output/25cm RGB aerial` (to match the name of the image source used). 
Three files are output, named after the input GeoJSON file, the image source and the index requested:\n* example_gardens-25cm RGB aerial-greenleaf-summary.txt\n  * Provides a summary of the analysis, namely the total polygon surface area, the total surface area regarded as vegetation by the metric, and the co-ordinate reference system used to record polygon locations.\n* example_gardens-25cm RGB aerial-greenleaf-toid2uprn.csv\n  * A two-column dataset that maps feature id to feature UPRN (as extracted from the GeoJSON)\n* example_gardens-25cm RGB aerial-greenleaf-vegetation.csv\n  * Detail of the analysis, one row per polygon, recording feature id, polygon centroid in the given reference co-ordinate system, surface area and fraction classified as vegetation\n\nNote that the metrics do not necessarily have to indicate vegetation; you could, for instance, be searching for tarmac (although the code at present reports \"vegetation\", which could be replaced with \"coverage\" or a similar, more generic term in future).\n\nAdditionally, metrics such as surface area are correct for OSGB36 tiles; however, they are not supported for Web Mercator imagery, due to the non-linear mapping between pixels and surface area.\n\n### Optional Arguments\n\nMultiple indices can be processed at once, to make maximum use of the imagery whilst it is in memory; simply supply a series of index names after the index option. For example, to process the green leaf index and the visual atmospheric resistance index, enter:\n```bash\nGreen_Spaces$ export PYTHONPATH=.\nGreen_Spaces$ python green_spaces/analyse_polygons.py -pc 4G -i greenleaf vari -wl \"25cm RGB aerial\" data\\example_gardens.geojson\nUsing TensorFlow backend.\nSorting features: 100%|#######################################################| 928/928 [00:00\u003c00:00, 1339.10feature/s]\nAnalysing features (992 cached, 6 missed; hit rate 99.4%): 100%|################| 928/928 [00:51\u003c00:00, 17.94feature/s]\nNumber of map tile requests: 
998\nNumber of map tile cache hits vs misses: 992 vs 6\n```\n\nThis outputs files with both indices in the file names, such as `example_gardens-25cm RGB aerial-greenleaf-vari-summary.txt`; the summary will contain extra rows for each additional index requested, and the vegetation file will contain an extra column for each extra index.\n\nThe output can be directed to a selected folder (default is `output`) with the `-o \u003cfolder name\u003e` option.\n\nDebug support is provided whereby each analysed polygon can be written out as a PNG bitmap; select `-di 1`. Bitmaps can be output at smaller scales if required; for instance, `-di 2` produces 1:2 downsampled images. In addition, the bitmaps can be restricted to paint only the pixels classified as vegetation (revealing which pixels the index regards as vegetation); for this, use `-opv`.\n\nIf only a subset of the polygons is to be processed, you can select the first N gardens via `-fng \u003cN\u003e`, where _N_ is the number of gardens, or a random selection with `-rng \u003cN\u003e` (repeatable for a given file, as a seeded pseudo-random number generator is used).\n\nIf the data is downloaded over a slow network, a secondary cache can be enabled with `-esc`, which will take a copy of downloaded data and store it in the local `cache` folder; this is experimental and at present only supported for Web Mercator. Note that there is no upper storage limit for the secondary cache.\n\n### Vegetation Indices\nEach vegetation index is now described along with its configuration. All indices have their configuration stored in `analyse_polygons.json` as part of each index's definition. The configuration is index dependent (the JSON data is passed directly to the index implementation, which interprets its own configuration). 
Further information may be found in the \"[Vegetation detection](https://datasciencecampus.ons.gov.uk/projects/green-spaces-in-residential-gardens/#section-3)\" section of the project report.\n\n#### naive\nNo configuration; simply returns \"true\" for all pixels, in effect assuming that all pixels within a polygon represent vegetation.\n\n#### greenleaf\nImplements the [Green Leaf Index](https://www.harrisgeospatial.com/docs/BroadbandGreenness.html#Green6), where low and high thresholds define what is flagged as \"vegetation\".\n\n#### hsv\nMaps pixel colour into HSV colour space and flags vegetation if the hue is within a specified threshold range.\n\n#### ndvi-cir\n[Normalised Difference Vegetation Index](https://www.harrisgeospatial.com/docs/BroadbandGreenness.html#NDVI), adapted to use the Colour Infra Red image format (stored as Ir,R,G in the R,G,B channels). Returns true if NDVI falls within a threshold range.\n\n#### ndvi-irgb\n[Normalised Difference Vegetation Index](https://www.harrisgeospatial.com/docs/BroadbandGreenness.html#NDVI), adapted to use 32-bit imagery (R,G,B,Ir stored in the R,G,B,A fields). 
Returns true if NDVI falls within a threshold range.\n\n#### vndvi\n[Visual Normalised Difference Vegetation Index](https://support.precisionmapper.com/support/solutions/articles/6000214541-visual-ndvi); returns true if vNDVI falls within the threshold range.\n\n#### vari\n[Visual Atmospheric Resistance Index](https://support.precisionmapper.com/support/solutions/articles/6000214543-vari); returns true if VARI falls within the threshold range.\n\n#### lab1\nGreen from [L*a*b colour space](https://en.wikipedia.org/wiki/CIELAB_color_space); returns true if the `a` component of a pixel falls within a threshold range.\n\n#### lab2\nGreen from [L*a*b colour space](https://en.wikipedia.org/wiki/CIELAB_color_space); returns true if both the `a` and `b` components of a pixel fall within their threshold ranges (different thresholds for `a` and `b`).\n\n#### nn\nArtificial neural network trained on gardens in Bristol and Cardiff; returns `true` if a pixel is deemed to be vegetation. The configuration stores the PCA mapping for the three PCA variants (monochrome, brightness and colour), along with the weights and architecture of the neural network (stored using [Keras](https://keras.io/) in an HDF5 file). Further information is presented in the \"[Vegetation detection using supervised machine learning](https://datasciencecampus.ons.gov.uk/projects/green-spaces-in-residential-gardens/#section-8)\" section of the project report.
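
All of the indices above share a simple contract: an image array goes in, a boolean mask comes out, and coverage is the fraction of true pixels inside each polygon. The following is a minimal, self-contained NumPy sketch of that contract for two of the indices, `greenleaf` (GLI = (2G - R - B) / (2G + R + B)) and `ndvi-cir` (NDVI = (Ir - R) / (Ir + R), with Ir,R,G stored in the R,G,B channels). It is not the project's implementation, which lives in `green_spaces/vegetation_analysis.py` and reads its thresholds from `analyse_polygons.json`. The function names, the polygon-mask representation and the threshold values used below are hypothetical.

```python
import numpy as np


def green_leaf_mask(image, low, high):
    # Hypothetical greenleaf-style index: GLI = (2G - R - B) / (2G + R + B).
    # image: H x W x 3 array holding R,G,B values; returns an H x W boolean mask.
    rgb = image.astype(np.float64)  # avoid uint8 overflow in 2 * G
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    numerator = 2.0 * g - r - b
    denominator = 2.0 * g + r + b
    # Black pixels have a zero denominator; give them an index of 0.
    gli = np.divide(numerator, denominator,
                    out=np.zeros_like(numerator), where=denominator != 0)
    return (gli >= low) & (gli <= high)


def ndvi_cir_mask(image, low, high):
    # Hypothetical ndvi-cir-style index for CIR imagery, where Ir,R,G are
    # stored in the R,G,B channels: NDVI = (Ir - R) / (Ir + R).
    cir = image.astype(np.float64)
    ir, red = cir[..., 0], cir[..., 1]
    total = ir + red
    ndvi = np.divide(ir - red, total,
                     out=np.zeros_like(total), where=total != 0)
    return (ndvi >= low) & (ndvi <= high)


def coverage_fraction(index_mask, polygon_mask):
    # Fraction of the pixels inside the polygon that the index labels true.
    inside = np.count_nonzero(polygon_mask)
    if inside == 0:
        return 0.0
    return np.count_nonzero(index_mask & polygon_mask) / inside


if __name__ == '__main__':
    # A 1 x 2 image: a strongly green pixel and a grey pixel, both inside the polygon.
    image = np.array([[[10, 200, 10], [200, 200, 200]]], dtype=np.uint8)
    polygon = np.array([[True, True]])
    mask = green_leaf_mask(image, low=0.1, high=1.0)
    print(mask.tolist())                     # [[True, False]]
    print(coverage_fraction(mask, polygon))  # 0.5
```

Here the green pixel (10, 200, 10) scores a GLI of about 0.9 and the grey pixel scores 0. With thresholds of 0.1 to 1.0, half of the two-pixel polygon is therefore counted as vegetation. Keeping each index as a pure array-to-mask function is what lets the coverage step stay independent of the index being computed.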
\n\n## OSGB36 Summary Images\n\nTools for generating summary images from OSGB36 tiled imagery are now presented.\n\n### Initial Help\n\nGiven that you have created an `analyse_polygons.json` configuration file, you can now launch the coverage tool:\n```bash\nGreen_Spaces$ export PYTHONPATH=.\nGreen_Spaces$ python green_spaces/generate_coverage.py -h\nusage: generate_coverage.py [-h] [-ts TILE_SIZE] [-tqdm USE_TQDM]\n                            [-ca {thumbnail,coverage,flights}]\n                            [-rf ROOT_FOLDER]\n                            {12.5cm RGB aerial,25cm RGB aerial,50cm CIR\n                            aerial,50cm CIR aerial as RGB}\n\nGenerate overall map from OSGB folder hierarchy\n\npositional arguments:\n  {12.5cm RGB aerial,25cm RGB aerial,50cm CIR aerial,50cm CIR aerial as RGB}\n                        Which dataset to analyse\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -ts TILE_SIZE, --tile-size TILE_SIZE\n                        Tile size each image is mapped to\n  -tqdm USE_TQDM, --use-tqdm USE_TQDM\n                        Use TQDM to display completion graphs\n  -ca {thumbnail,coverage,flights}, --coverage-analysis {thumbnail,coverage,flights}\n                        Data represented in summary image\n  -rf ROOT_FOLDER, --root-folder ROOT_FOLDER\n                        Root folder where aerial photography is stored\n                        \nGreen_Spaces$\n```\n\n### Example Usage\n\nTo generate a single image from all imagery present in a dataset, use:\n```bash\nGreen_Spaces$ export PYTHONPATH=.\nGreen_Spaces$ python green_spaces/generate_coverage.py -ts 8 -tqdm true -ca thumbnail -rf thumbnails \"50cm CIR aerial\"\nSummary data shape: 10,400 x 5,600 pixels\n\n100km tiles:   0%|                                                                              | 0/55 [00:00\u003c?, ?it/s]\n10km tiles in HP:  50%|#################################                                 | 3/6 
[00:17\u003c00:13,  4.66s/it]\n1km tiles in HP60:  70%|###########################################9                   | 37/53 [00:05\u003c00:02,  7.42it/s]\n```\n\nThis requests tiles of 8 by 8 pixels to represent the source image tiles, which are read from the `50cm CIR aerial` dataset. A progress bar has been requested (we use the TQDM library), and the output is to be stored in the `thumbnails` folder. This will generate a single bitmap, in this case of size 10,400 by 5,600 pixels, along with a report of any issues encountered when reading images.\n\nNote that this will probably take a long time, as hundreds of gigabytes of data may be processed.\n\n### Supported Options\n\nThree formats are supported:\n* `thumbnail`\n  * Each image bitmap is downsampled and stitched together to form an overview map\n* `coverage`\n  * Each image is represented by a white tile if present and a black tile otherwise; this enables a rapid check of whether any files are missing\n* `flights`\n  * The metadata for each image is processed, creating a coloured tile for each image where the colour represents the image capture date. The tiles are stitched together to form an overview map, complete with a colour key. One image is generated for the time of year (to enable seasonality analysis), and another for the complete date (enabling analysis of imagery age)\n\n## Simple Work Distribution\nGiven that a large number of polygons may need to be processed, we provide tools to split a large GeoJSON file into many smaller files and then to distribute the work across a cluster of machines. All utilities support the `-h` command line option for help with command line arguments.\n\n### Split Large GeoJSON\nIf a GeoJSON file is large (e.g. more than 100,000 polygons), it may be beneficial to split it to enable distributed analysis. 
To split such a file, enter:\n```bash\nGreen_Spaces$ export PYTHONPATH=.\nGreen_Spaces$ python scripts/split_geojson.py -fpf 1000 your_polygons.geojson\nExtracting features into sets of 1000: 100%|██████████████████████████████| 10000/10000 [00:04\u003c00:00, 2430.21feature/s]\n\nGreen_Spaces$\n```\n\nThis will generate _N_ files (depending on how many sets of 1,000 polygons are required to store your original dataset). The new files will be created in the same folder as the source file, with the suffix `XofY`; so if 12 files were needed with the above example, the new files would be named `your_polygons_1of12.geojson`, `your_polygons_2of12.geojson`, etc.\n\nThe number of polygons per file is specified with the `-fpf` parameter.\n\n### Bulk Analysis of GeoJSON\n\nTo perform bulk analysis, the following folders are required:\n* Processing\n  * GeoJSON files that are currently being processed\n* Inpile\n  * GeoJSON files that are yet to be processed\n* Outpile\n  * GeoJSON files that have been processed\n* Results\n  * Output from `analyse_polygons` produced for each GeoJSON in the outpile folder\n\nTo run a bulk analysis with the `analyse_polygons.py` utility, use the following instead:\n\n```bash\nGreen_Spaces$ export PYTHONPATH=.\nGreen_Spaces$ python scripts/bulk_analyse.py -if inpile_folder -of outpile_folder -rf results_folder -pf processing_folder -pcs 4G -i greenleaf -wl \"25cm RGB aerial\" \n```\n\nThis will look in the specified inpile folder (`inpile_folder` in the example) for any unprocessed GeoJSON. If none are present, it will terminate, as all work is complete. Otherwise, it will attempt to move a GeoJSON into the processing folder (named `processing_folder` in the example), within a subfolder named after the current machine and its process ID. Under POSIX, moving a file within a single file system (a rename) is atomic, hence only one machine can succeed (if two machines attempt to move the same file, one will fail and retry with a different GeoJSON). 
The dataset and cache parameters are passed to `analyse_polygons.py` along with the GeoJSON filename, with output directed to the results folder.\n\n### Recombining Results\n\nOnce all GeoJSON files are processed, the results need to be recombined so the end user can continue as if a single GeoJSON had been processed (rather than dealing with potentially hundreds of partial files). To recombine the outputs from the bulk analysis, enter:\n\n```bash\nGreen_Spaces$ export PYTHONPATH=.\nGreen_Spaces$ python scripts/bulk_recombine.py -rf results_folder -of combined_results_folder -i greenleaf -wl \"25cm RGB aerial\" \n```\n\nThis searches `results_folder` for results produced with the specified index and data source. The combined results are written to the output folder (`combined_results_folder` in the example).\n\nThe end result will be the same three files as if the original GeoJSON had been analysed directly as a single file.\n\n### Sift Incomplete Results\nOne problem with naively distributing the analyses amongst independent machines is that machines may fail, in which case GeoJSON files may be moved to the outpile folder without corresponding results files being produced. This utility detects such GeoJSON files, which have not actually been processed, and moves them back to the inpile folder. To run the utility, enter:\n\n```bash\nGreen_Spaces$ export PYTHONPATH=.\nGreen_Spaces$ python scripts/bulk_sift_incomplete.py -if inpile_folder -of outpile_folder -rf results_folder\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatasciencecampus%2Fgreen-spaces","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatasciencecampus%2Fgreen-spaces","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatasciencecampus%2Fgreen-spaces/lists"}