{"id":20486298,"url":"https://github.com/undp-data/sids-data-pipeline","last_synced_at":"2025-08-17T13:40:01.149Z","repository":{"id":41999367,"uuid":"444860843","full_name":"UNDP-Data/sids-data-pipeline","owner":"UNDP-Data","description":"Python data pipeline for SIDS project","archived":false,"fork":false,"pushed_at":"2022-10-03T03:56:07.000Z","size":145,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-05T16:40:48.651Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/UNDP-Data.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-01-05T15:49:43.000Z","updated_at":"2022-06-08T08:10:52.000Z","dependencies_parsed_at":"2023-01-19T00:15:28.450Z","dependency_job_id":null,"html_url":"https://github.com/UNDP-Data/sids-data-pipeline","commit_stats":null,"previous_names":[],"tags_count":18,"template":false,"template_full_name":null,"purl":"pkg:github/UNDP-Data/sids-data-pipeline","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UNDP-Data%2Fsids-data-pipeline","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UNDP-Data%2Fsids-data-pipeline/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UNDP-Data%2Fsids-data-pipeline/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UNDP-Data%2Fsids-data-pipeline/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/UNDP-Data","download_url":"https://codeload.github.com/UNDP-Data/sids-data-pipelin
e/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UNDP-Data%2Fsids-data-pipeline/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":270856564,"owners_count":24657688,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-17T02:00:09.016Z","response_time":129,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-15T16:35:59.380Z","updated_at":"2025-08-17T13:40:01.127Z","avatar_url":"https://github.com/UNDP-Data.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SIDS data processing pipeline\n\n## Intro\n\n**Small Island Developing States (SIDS)** are a group of island states scattered across the world. This data pipeline can be used to pre-process and generate the bulk of the spatial data for the SIDS platform [geospatial application](https://data.undp.org/sids/geospatial-data). The pipeline computes zonal statistics for a set of vector layers from a set of raster layers, converts the results into Mapbox vector tiles (.pbf), and stores them in an Azure Blob Storage container.\n\n## Project Structure\n\nInputs are hosted in Azure Blob Storage, in the `inputs` folder of the `sids` container. Rasters and vectors are stored in their respective subfolders, as GeoTIFFs and GeoPackages. 
The `batch.csv` file provides metadata about the rasters.\n\n```shell\ninputs\n├── batch.csv\n├── rasters\n│   ├── data1.tif\n│   ├── data2.tif\n│   └── data3.tif\n└── vectors\n    ├── zone1.gpkg\n    ├── zone2.gpkg\n    └── zone3.gpkg\n```\n\n## Batch\n\nBatch is the first sub-module. It imports rasters from across Azure Blob Storage into a single folder. This module takes a few hours to run. Based on `batch.csv`, the following data standardizations are applied:\n\n- ZSTD compression\n- EPSG:4326 projection\n- Clipping to lonmin=-180, lonmax=180, latmin=-35, latmax=35\n\n## Pipeline\n\nPipeline is the second sub-module and accounts for most of the run time. It generates the zonal statistics and vector tiles. Before processing, the pipeline checks whether the output for a vector/raster combination already exists at the destination and, if so, skips that combination.\n\n## Setup\n\nTo get started, populate the `.env` file with values based on the template, then log in to Azure and Docker.\n\n```shell\naz login\ndocker login undpgeohub.azurecr.io\n```\n\nTo run either the batch or the pipeline, change directory into the corresponding subfolder and run `./deploy.sh` from there.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fundp-data%2Fsids-data-pipeline","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fundp-data%2Fsids-data-pipeline","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fundp-data%2Fsids-data-pipeline/lists"}