# sm_attribution

Reproducing and strengthening the analysis of human-induced changes in
global soil-moisture droughts: a Python-first reimplementation of the earlier
MATLAB workflow.

## Quick start

```bash
# 1. Create the conda environment
conda env create -f environment.yml
conda activate sm-attr-311

# 2. Install the package in editable mode
pip install -e .

# 3. Configure local paths
#    Edit configs/data_registry.yml and set paths.root to your data directory.

# 4. Run the full pipeline
chmod +x run_all.sh
./run_all.sh                    # all 4 steps
./run_all.sh --start-from 3     # resume from step 3
./run_all.sh --dry-run          # print commands without executing
```

## Data preprocessing

Before the analysis pipeline can run, raw soil-moisture files must be
homogenized to a common reference format. This is handled by the
`preprocess` subpackage and two preparation scripts.

**What the preprocessing does:**

- **Temporal harmonization**: monthly means on a `proleptic_gregorian`
  calendar (first-of-month timestamps).
- **Spatial harmonization**: regridding to a uniform 0.5° lat/lon grid
  (exact block means for 0.1°/0.25°/0.05°/1-arcmin sources; xESMF bilinear
  fallback for irregular grids).
- **Depth harmonization**: conversion to an approximate 0–1 m soil-moisture
  equivalent (model-specific recipes using ancillary depth/landcover maps).
- **Grid alignment**: snapping coordinates to the canonical ISIMIP 0.5°
  land-mask grid (bit-identical lat/lon values).

**Model preprocessing** (`build_models_1m.py`):

Each of the 7 ISIMIP models has a dedicated depth recipe in
`src/sm_attribution/preprocess/depth_1m.py`:

| Model | Method |
|-------|--------|
| H08 | Scale by ancillary soil depth map: `f = min(1, D) / D` |
| HydroPy | Root-zone mass pass-through |
| JULES-W2 | Sum layers 1–3 |
| MIROC-INTEG-LAND | Sum layers 1–3 |
| WaterGAP2-2e | Scale by rooting depth from landcover ancillary |
| WEB-DHM-SG | Scale by SiB2 landcover total depth ancillary |
| LPJmL5-7-10-fire | Exact 0–1 m integration using `depth_bnds` |

```bash
python scripts/build_models_1m.py                         # all 7 models × 4 scenarios
python scripts/build_models_1m.py --models h08 jules-w2   # subset
```

**Observation preprocessing** (`build_observed_1m.py`):

Each of the 10 observational products has a tailored pipeline in
`src/sm_attribution/preprocess/observations.py`:

| Dataset | Native grid | Depth / variable | Notes |
|---------|-------------|------------------|-------|
| ERA5-Land | 0.1° | swvl1–3 → 0–1 m mass | Block mean 5×5 |
| GLEAM v4.2a | 0.1° | SMrz (root-zone volumetric) | Block mean 5×5 |
| GLEAM v4.2b | 0.1° | SMrz (root-zone volumetric) | Block mean 5×5 |
| GLDAS v2.0 | 0.25° | Sum of 0–10/10–40/40–100 cm | Block mean 2×2 |
| GLDAS v2.1 | 0.25° | Sum of 0–10/10–40/40–100 cm | Block mean 2×2 |
| SoMo.ml | 0.25° | Layers 1–3 (0–50 cm), depth-weighted | Block mean 2×2 |
| GRACE-DA-DM | 0.25° | Root-zone percentile (weekly→monthly) | Block mean 2×2 |
| MERRA-2 LAND | 0.5°×0.625° | 5% SFMC + 95% RZMC → mass | Linear interpolation in longitude |
| GDO-ENSMIA | 0.1° | Standardized anomaly (3rd dekad/month) | Block mean 5×5 |
| GDO-SMIA | ~1 arcmin | Standardized anomaly (last dekad/month) | Block mean 30×30 |

```bash
python scripts/build_observed_1m.py --dataset era5-land
python scripts/build_observed_1m.py --dataset gldas-v21
```

Outputs land under `data/models_1m/` and `data/observed_1m/` as compressed
NetCDFs with standardized variable names (`soilmoist_1m` or
`soilmoist_anom_std`).

## Pipeline

The analysis runs in four sequential steps.
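The in-order execution with resume and dry-run behaviour offered by `./run_all.sh` (`--start-from`, `--dry-run` in Quick start) can be illustrated with a small Python sketch. This is not the actual shell script. The `plan_steps` and `run_steps` helpers are hypothetical. Launching each step as `python scripts/<name>` is an assumption based on the `scripts/` entry-point layout and on the step script names in the pipeline table.

```python
import shlex
import subprocess
from typing import List

# Step scripts in pipeline order (names taken from this README's pipeline table).
# ASSUMPTION: each step is launched as "python scripts/<name>"; the real
# run_all.sh may use a different interpreter, arguments, or logging.
STEPS: List[str] = [
    "orchestrate_ssi_drought_features.py",                  # step 1
    "batch_run_correlations.py",                            # step 2
    "orchestrate_drought_feature_spatial_correlations.py",  # step 3
    "orchestrate_drought_feature_ar6_metrics.py",           # step 4
]


def plan_steps(start_from: int = 1) -> List[str]:
    """Hypothetical helper: commands for steps start_from..4 (1-based), in order."""
    if not 1 <= start_from <= len(STEPS):
        raise ValueError(f"start_from must be in 1..{len(STEPS)}, got {start_from}")
    return [f"python scripts/{name}" for name in STEPS[start_from - 1:]]


def run_steps(start_from: int = 1, dry_run: bool = False) -> List[str]:
    """Hypothetical runner: print (dry run) or execute each planned command in turn."""
    commands = plan_steps(start_from)
    for cmd in commands:
        if dry_run:
            print(cmd)  # analogous to --dry-run: show the command, do not execute it
        else:
            # check=True raises CalledProcessError, so later steps never run after a failure
            subprocess.run(shlex.split(cmd), check=True)
    return commands


if __name__ == "__main__":
    run_steps(start_from=3, dry_run=True)
```

With `start_from=3, dry_run=True` the sketch prints only the commands for steps 3 and 4. Stopping at the first failing step is what makes a resume option useful: after fixing the problem, you restart from that step instead of recomputing step 1.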
Steps 2–4 are independent of each other and depend only on the outputs of
step 1.

| Step | Script | What it does |
|------|--------|--------------|
| 1 | `orchestrate_ssi_drought_features.py` | Compute SSI (Standardized Soil-moisture Index) for 10 obs products × 7 ISIMIP models × 4 scenarios, then extract 12 drought features via theory-of-runs |
| 2 | `batch_run_correlations.py` | Per-pixel Pearson temporal correlations (model SSI vs obs SSI/anomaly), plus multi-model mean |
| 3 | `orchestrate_drought_feature_spatial_correlations.py` | Cosine-latitude-weighted Spearman spatial correlations (Global + AR6 regions) between obs and model drought features |
| 4 | `orchestrate_drought_feature_ar6_metrics.py` | AR6-aggregated regional metrics: `spearman_rank`, `pearson_z`, `rmse_iqr` |

Use `run_all.sh` to run the full pipeline with resume support.

## Parallelism

Two levels of parallelism, an inner Dask worker pool and an outer thread pool,
are configured in `configs/settings.yml`:

| Setting | What it controls | Default |
|---------|------------------|---------|
| `dask.max_workers` | Dask worker processes (inner loop: per-pixel SSI/features) | `os.cpu_count() - 2` |
| `dask.concurrent_models` | Outer-loop threads via `ThreadPoolExecutor` | `1` (serial) |
| `dask.use_distributed` | `true` = LocalCluster (singleton); `false` = `scheduler="processes"` | `false` |

### Runtime output policy

Pipeline scripts favor concise progress output during long server runs:

- Third-party infrastructure chatter (`distributed`, `tornado`, `bokeh`) is
  reduced to warnings/errors.
- Non-actionable warning spam (chunk-splitting and empty-slice warnings) is
  suppressed.
- HDF5 C-level error-stack dumps (`HDF5-DIAG`) are silenced via
  `H5Eset_auto2` and `HDF5_LOG_LEVEL=none` (these bypass Python's stderr).
- Long loops report periodic counters (completed/total) instead of per-item
  flood lines.
- Real errors and tracebacks are still preserved.

Use the `run_all.sh` logs (`logs/step*.log`) for the full run history.

**Cluster recipe** (e.g. 64-core server):

```yaml
# configs/settings.yml
dask:
  max_workers: null          # or override with DASK_NUM_WORKERS env var
  use_distributed: false     # MUST be false when concurrent_models > 1
  concurrent_models: 4       # 4 outer threads × ~15 inner workers = 60 cores
```

```bash
export DASK_NUM_WORKERS=15
./run_all.sh
```

> **Important:** When `concurrent_models > 1`, set `use_distributed: false`.
> Otherwise all threads share one Dask cluster and effectively serialize.

**Laptop recipe** (e.g. 8-core MacBook):

```yaml
dask:
  max_workers: null
  use_distributed: false
  concurrent_models: 1
```

## Drought features reference

The 12 drought features extracted per pixel (theory-of-runs method):

| Variable | Description |
|----------|-------------|
| `n_events` | Number of drought events |
| `duration_mean` | Mean event duration (months) |
| `duration_max` | Max event duration (months) |
| `severity_mean` | Mean cumulative severity |
| `severity_max` | Max cumulative severity |
| `intensity_mean` | Mean peak intensity |
| `intensity_max` | Max peak intensity |
| `ttm10` | Time-to-Moderate: months until SSI ≤ −1.0 |
| `tts15` | Time-to-Severe: months until SSI ≤ −1.5 |
| `tte20` | Time-to-Extreme: months until SSI ≤ −2.0 |
| `inter_arrival_mean` | Mean inter-arrival time (months) |
| `inter_arrival_cv` | Coefficient of variation (CV) of inter-arrival time |

## Project layout

```
src/sm_attribution/       library code (io, preprocess, metrics, analysis, viz)
scripts/                  CLI scripts (pipeline entry points)
notebooks/                exploratory notebooks (call into src/)
tests/                    unit tests
configs/                  YAML configs (paths, settings, thresholds)
data/                     local data (gitignored)
figures/                  generated figures (gitignored)
matlab_code/              original MATLAB reference code (read-only)
documentation/            docs and drafts (gitignored)
```

## Configuration

- **`configs/settings.yml`**: SSI method parameters, Dask parallelism, depth
  and grid settings.
- **`configs/data_registry.yml`**: path templates for all data products,
  model/obs metadata, period definitions. All file paths are resolved through
  this registry; there are no hardcoded absolute paths in the codebase.

## Tests

```bash
python -m pytest tests/ -x -q
```

## License

See [LICENSE](LICENSE).