{"id":41342969,"url":"https://github.com/rodekruis/exposure-vulnerability-retrieval","last_synced_at":"2026-01-23T06:53:06.503Z","repository":{"id":287369293,"uuid":"943976541","full_name":"rodekruis/exposure-vulnerability-retrieval","owner":"rodekruis","description":"Disaster exposure and vulnerability data retrieval","archived":false,"fork":false,"pushed_at":"2025-12-30T15:23:14.000Z","size":994,"stargazers_count":0,"open_issues_count":6,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-01-03T09:28:28.241Z","etag":null,"topics":["aa"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rodekruis.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-03-06T15:23:16.000Z","updated_at":"2025-12-18T16:09:13.000Z","dependencies_parsed_at":null,"dependency_job_id":"df5db5d4-bc9d-4d62-b614-dd716969e045","html_url":"https://github.com/rodekruis/exposure-vulnerability-retrieval","commit_stats":null,"previous_names":["rodekruis/ibf-exposure-vulnerability-retrieval","rodekruis/exposure-vulnerability-retrieval"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/rodekruis/exposure-vulnerability-retrieval","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rodekruis%2Fexposure-vulnerability-retrieval","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rodekruis%2Fexposure-vulnerability-retriev
al/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rodekruis%2Fexposure-vulnerability-retrieval/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rodekruis%2Fexposure-vulnerability-retrieval/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rodekruis","download_url":"https://codeload.github.com/rodekruis/exposure-vulnerability-retrieval/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rodekruis%2Fexposure-vulnerability-retrieval/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28682263,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-23T05:48:07.525Z","status":"ssl_error","status_checked_at":"2026-01-23T05:48:07.129Z","response_time":59,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aa"],"created_at":"2026-01-23T06:53:06.430Z","updated_at":"2026-01-23T06:53:06.493Z","avatar_url":"https://github.com/rodekruis.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Data Retrieval\n\n\n## ETL Pipeline\n\n### Example use\nTo run the pipeline for Somalia (with iso3 code 'SOM'):\n```sh\npoetry install\npoetry run python run_pipeline --country-iso3 SOM\n```\nOptions:\n- `-h, --help` : show this help message and exit\n- `--country-iso3 STR`: Country ISO3 code to run the 
pipeline for. (required)\n- `--run-id STR`: Unique identifier for the pipeline run. (default: now)\n- `--extract, --no-extract`: Boolean to indicate if extract should be performed. (default: True)\n- `--transform, --no-transform`: Boolean to indicate if transform should be performed. (default: True)\n- `--load, --no-load`: Boolean to indicate if load should be performed. (default: True)\n- `--debug, --no-debug`: Boolean to indicate if logger level should be set to DEBUG instead of INFO. (default: False)\n\n\nBy default it runs all three steps: extract, transform and load.\nTo turn off the extract step, add the flag `--no-extract`; similarly, add `--no-transform` or `--no-load` to turn off the transform or load step.\n\n### Pipeline Configurations/Specification\nOne pipeline run is an ETL run for one country.\nA pipeline is constructed from the configuration in `CONFIGS` from `retrievalpipelines/config`.\n\nEvery country for which the pipeline can run has a configuration file in `retrievalpipelines/config` (e.g., `somalia.py`).\nThese files contain a class that implements the `CountryConfig` protocol defined in `retrievalpipelines/config/base.py`.\n\nTo add a configuration for a country, add a file for this country with a `CountryConfig` class and add the class to `_CONFIGS` in `retrievalpipelines/config/config.py`.\n\n\n### Pipeline structure\nThe pipeline consists of Extractors, Transforms and Loaders.\n\n`Extractors`/`Transforms`/`Loaders` are classes that adhere to the Extractor/Transformer/Loader Protocols.\n\nEach extract method of an extractor contains the logic to decide what to download and from where.\nIt then uses a storage to download and store the file(s) somewhere on the bronze data layer.\n\nThe transform takes data from the bronze data layer, transforms it and saves intermediate results to the silver layer.\nLastly, it transforms the silver data to the gold data layer.\n\nThe loader uploads the data from gold to where it can be 
used by other systems.\n\n\n### Storage\nAn extraction needs a storage that can download a file from a URL, and a path at which to save the file.\nFor example, the storage can be one that saves the file locally (`LocalStorage`) or in Azure Blob Storage (`AzureBlobStorage`).\n\n\n\n### Data extraction specifics\n\n#### ECMWF\nThe ECMWF extractors require an account.\nSet the API key in the environment variable `ECMWF_DATASTORES_KEY`.\nWhen this is put in the `.env` file, it is automatically picked up during tests.\n\nFurthermore, for the ECMWF extractors one needs to accept the *Terms of use*.\nThis is done on the ECMWF website and requires you to be logged in (with the account that the API key belongs to).\nFor example, for the extreme heat dataset see: https://cds.climate.copernicus.eu/datasets/derived-utci-historical?tab=download.\n\n#### CHIRPS\nTBA\n\n#### WorldPop\nTBA\n\n#### IOM DTM\nAccessing IOM DTM data requires an API key. See the [DTM API](https://dtm.iom.int/data-and-analysis/dtm-api) documentation (latest version: V3).\nSet the API key in the environment variable `DTM_API_KEY` in the `.env` file.\n\n#### IPC\nThe IPC extractor requires an API key.\nSet api key: TBA\n\n#### FEWS NET\nTBA\n\n### Data transformation\n\nAfter extraction, the transformations turn those datasets into district‑level indicators suitable for multi‑criteria prioritisation.  
It operates on\nthree broad climate hazards—drought, flood and extreme heat—and produces baseline, recent and\ntrend indicators for each, as well as other transformations.\n\n### Indicators\n\nThe table below summarises what each climate indicator measures and the datasets it uses.\n\n| Hazard \u0026 period    | Description (what it measures)                                                                 | Main input datasets \u0026 processing summary |\n|--------------------|------------------------------------------------------------------------------------------------|-----------------------------------------|\n| **Drought baseline** | Quantifies long‑term drought vulnerability (1991–2020) by combining three components: variability of annual precipitation (coefficient of variation), frequency of years below average precipitation and the deficit relative to the mean. These are normalised and combined using a quadratic mean to produce a district‑level index. | CHIRPS precipitation NetCDF for 1991–2020; data are reprojected to EPSG:3857, clipped to ADM2 boundaries and processed to compute CV, frequency and intensity before normalisation and combination. |\n| **Flood baseline** | Measures long‑term flood exposure using the JRC Monthly Water History (1984–2021).  The pipeline filters out permanent water, aggregates monthly flooded area to annual percentages and normalises the resulting values. | JRC Monthly Water History v1.4 rasters; data are downloaded for the baseline period, reprojected, clipped to ADM2, non‑permanent water masked, aggregated and min–max normalised. |\n| **Extreme heat baseline** | Captures the frequency, intensity and persistency of extreme heat during 1991–2020.  Extreme heat is defined using a UTCI threshold (default 32 °C).  The daily maximum UTCI and exceedances are computed, then normalised and combined. 
| Hourly UTCI derived from ERA5 reanalysis (UTCI NetCDF); data are projected, clipped to ADM2, daily maxima calculated, exceedances counted and aggregated before normalisation and combination. |\n| **Drought recent** | Assesses recent drought anomalies (typically last 24 months) by computing the Standardised Precipitation Index (SPI) on CHIRPS monthly rainfall and deriving frequency, severity and persistency of severe drought (SPI \u003c –1.5). | CHIRPS monthly NetCDF; SPI is computed for the last two years, drought frequency/severity/persistency are derived and normalised before combining. |\n| **Flood recent** | Quantifies flood exposure in the last two years using GFM products.  Flooded area is aggregated monthly and normalised. | Copernicus GFM Product GeoTIFFs; data are reprojected, clipped to ADM2, aggregated to annual flood area percentages and normalised. |\n| **Extreme heat recent** | Measures recent (last two years) extreme heat frequency, intensity and persistency.  Uses the same approach as the baseline but restricted to recent data. | Recent UTCI (ERA5) hourly NetCDF; the number and degree of threshold exceedances and maximum consecutive exceedance duration are computed, normalised and combined. |\n| **Drought trend** | Evaluates 30‑year trends in drought risk by fitting linear regressions to time series of drought frequency, intensity and persistency.  Negative slopes are clipped to zero before normalisation. | CHIRPS monthly NetCDF; SPI‑based metrics are computed for each year, trends estimated by linear regression and normalised before combination. |\n| **Flood trend** | Tracks trends in flood extent over 1984–2021 by fitting a linear regression to annual flood percentages.  Negative trends are set to zero and positive slopes normalised. | JRC Monthly Water History v1.4 rasters; annual flooded area percentages are computed, trends estimated via linear regression and normalised. 
|\n| **Extreme heat trend** | Assesses trends in extreme heat frequency, intensity and persistency over the last 30 years using UTCI data.  Negative trends are clipped and remaining slopes normalised and combined. | ERA5‑derived UTCI hourly NetCDF; annual time series of exceedance frequency/intensity/persistency are built, linear trends computed and normalised. |\n| **Displacement** | TBA | TBA |\n| **Food insecurity** | TBA | TBA |\n| **Population** | TBA | TBA |\n\n## To contribute\n\n### Dependencies\nIf needed, install poetry (or uv). Then install the dependencies with\n`poetry install`\nor, if your `poetry.lock` is behind `pyproject.toml`, first resolve the dependencies with `poetry lock`.\n\n### Pre-commit\nActivate pre-commit with `poetry run pre-commit install`.\n\nWhen you make a commit it will first run some checks.\nThese can be found in `.pre-commit-config.yaml`.\nIt helps keep you from committing mistakes or messy code.\nIt checks things like whether JSON files are valid, whether your type hints are correct, whether the code is formatted and whether the quality is OK.\nWhen the checks fail, the commit is stopped and you can fix the issues.\nIf it can trivially fix things, it will do this for you.\nThen you can inspect the changes it made, add the file again and run the commit again.\nThe rules the linter checks are specified in `pyproject.toml`.\nExplanations and reasons for these rules can be found in the [Ruff rules doc](https://docs.astral.sh/ruff/rules/).\n\nMore info:\n- [Pre-commit](https://pre-commit.com/)\n- [Typechecker mypy](https://mypy.readthedocs.io/en/stable/)\n- [Ruff linter](https://docs.astral.sh/ruff/linter/)\n- [Ruff formatter](https://docs.astral.sh/ruff/formatter/)\n\n### Test\nRun the tests in the `tests` folder with:\n```bash\npoetry run python -m pytest tests\n```\nThe tests that need secrets load the environment variables from `.env`.\n\nRun the tests that do not need secrets with:\n```bash\npoetry run python -m pytest tests -m \"not needs_secrets\"\n```\nTo show the logs during 
testing run:\n```bash\npoetry run python -m pytest tests -o log_cli=true -s\n```\n\n\n### CI\nWhen you make a PR to main or dev, the CI pipeline will run (`.github/workflows/ci.yaml`).\n\nThis checks if the code is formatted and if the linter agrees with the code quality.\nFurthermore, it runs the tests in the folder `tests` that do not need secrets (i.e., do not have the mark 'needs_secrets') and checks if the coverage is above a certain threshold (to be set in `ci.yaml`).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frodekruis%2Fexposure-vulnerability-retrieval","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frodekruis%2Fexposure-vulnerability-retrieval","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frodekruis%2Fexposure-vulnerability-retrieval/lists"}