{"id":44079447,"url":"https://github.com/pacificclimate/climate-explorer-data-prep","last_synced_at":"2026-02-08T08:34:37.201Z","repository":{"id":44934001,"uuid":"100539681","full_name":"pacificclimate/climate-explorer-data-prep","owner":"pacificclimate","description":null,"archived":false,"fork":false,"pushed_at":"2023-12-15T23:56:30.000Z","size":24717,"stargazers_count":0,"open_issues_count":28,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2023-12-16T00:49:29.032Z","etag":null,"topics":["actions","pipenv","pypi"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pacificclimate.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2017-08-16T23:01:36.000Z","updated_at":"2023-12-16T00:49:33.152Z","dependencies_parsed_at":"2023-12-16T00:59:40.054Z","dependency_job_id":null,"html_url":"https://github.com/pacificclimate/climate-explorer-data-prep","commit_stats":null,"previous_names":[],"tags_count":19,"template":null,"template_full_name":null,"purl":"pkg:github/pacificclimate/climate-explorer-data-prep","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pacificclimate%2Fclimate-explorer-data-prep","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pacificclimate%2Fclimate-explorer-data-prep/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pacificclimate%2Fclimate-explorer-data-prep/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pacificclimate%2Fclimate-explorer-data-prep/manifests",
"owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pacificclimate","download_url":"https://codeload.github.com/pacificclimate/climate-explorer-data-prep/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pacificclimate%2Fclimate-explorer-data-prep/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29225478,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-08T06:05:31.539Z","status":"ssl_error","status_checked_at":"2026-02-08T05:58:33.853Z","response_time":57,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["actions","pipenv","pypi"],"created_at":"2026-02-08T08:34:36.668Z","updated_at":"2026-02-08T08:34:37.193Z","avatar_url":"https://github.com/pacificclimate.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PCIC Climate Explorer Data Preparation Tools\n\n[![Code Climate](https://codeclimate.com/github/pacificclimate/climate-explorer-data-prep/badges/gpa.svg)](https://codeclimate.com/github/pacificclimate/climate-explorer-data-prep)\n![Python CI](https://github.com/pacificclimate/climate-explorer-data-prep/workflows/Python%20CI/badge.svg)\n![Publish Python Package](https://github.com/pacificclimate/climate-explorer-data-prep/workflows/Publish%20Python%20Package/badge.svg)\n\n## 
Historical note\n\nPrior to 2017 Aug 17, these scripts were part of the\n[Climate Explorer backend](https://github.com/pacificclimate/climate-explorer-backend).\n\nThese scripts are now a separate project with their own repository (this one).\nThe full commit history of the data prep scripts was retained during the migration to this repo.\nMost (but, mysteriously, not quite all) of the commit history for non-data prep code was pruned during migration.\n\nNo releases of the original CE backend were specifically related to, or documented changes to, these scripts,\nso this project starts with release version 0.1.0.\n\n## Installation\n\nClone the repo onto the target machine.\n\nIf installing on a PCIC compute node, you must load the environment modules that data prep depends on\n_before_ installing the Python modules:\n\n```bash\n$ module load netcdf-bin\n$ module load cdo-bin\n$ module load poetry\n```\n\nPython installation should be done in a virtual environment managed by\nthe [`poetry` tool](https://python-poetry.org/docs/):\n\n```bash\n$ poetry install # Or\n$ poetry install --with=dev # to include development packages\n```\n\nThis installs the scripts described below.\nTo make their command-line invocation a little nicer, the scripts lack the `.py` extension.\nThey are, however, Python code.\n\nAll of the scripts below can be run with `poetry run [script_name]`,\nor simply `[script_name]` if one has already invoked a shell in which\nthe project is installed (accomplished with `poetry shell`).\n\n## Development\n\n### Testing\n\nLocal testing, prior to pushing to GitHub (and running the GitHub\nActions), can be done by invoking:\n\n```bash\npoetry run pytest\n```\n\n### Releasing\n\nTo create a versioned release:\n\n1. Increment `version` in `pyproject.toml`\n2. Summarize the changes from the last release in `NEWS.md`\n3. 
Commit these changes, then tag the release:\n\n   ```bash\n   git add pyproject.toml NEWS.md\n   git commit -m \"Bump to version x.x.x\"\n   git tag -a -m \"x.x.x\" x.x.x\n   git push --follow-tags\n   ```\n4. [GitHub Actions](https://github.com/pacificclimate/climate-explorer-data-prep/blob/i130-full-actions/.github/workflows/pypi-publish.yml) will automatically build and publish the package to our PyPI server.\n\n## Scripts\n\n### `generate_climos`: Generate climatological means\n\n#### Purpose\n\nTo generate files containing climatological means from input files of daily, monthly, or yearly data that adhere to the\n[PCIC metadata standard](https://pcic.uvic.ca/confluence/display/CSG/PCIC+metadata+standard+for+downscaled+data+and+hydrology+modelling+data)\n(and consequently to CMIP5 and CF standards).\n\nMeans are formed over the time dimension; the spatial dimensions are preserved.\n\nOutput can optionally be directed into separate files for each variable and/or each averaging interval\n(month, season, year).\n\nThis script:\n\n1. Opens an existing NetCDF file.\n\n2. Determines what climatological periods to generate.\n\n3. For each climatological period:\n\n    a. Aggregates the input data for the period into a new climatological output file.\n\n    b. Revises the time variable of the output file to meet the CF 1.6/CMIP5 specification.\n\n    c. Adds a `climatology_bounds` variable to the output file to match the climatological period.\n\n    d. Optionally splits the climatology file into one file per dependent variable in the input file.\n\n    e. 
Uses PCIC standards-compliant filename(s) for the output file(s).\n\nAll input file metadata is obtained from standard metadata attributes in the netCDF file.\nNo metadata is deduced from the filename or path.\n\nAll output files contain PCIC standard metadata attributes appropriate to climatology files.\n\n#### Usage\n\n```bash\n# Dry run\ngenerate_climos --dry-run -o outdir files...\n\n# Use defaults:\ngenerate_climos -o outdir files...\n\n# Split output into separate files per dependent variable and per averaging interval\ngenerate_climos --split-vars --split-intervals -o outdir files...\n```\n\nUsage is further detailed in the script help information: `generate_climos -h`\n\n#### PCIC Job Queueing tool for processing many / large files\n\nFor several reasons -- file copying, computation time, record-keeping, etc. -- it's inadvisable to run\n`generate_climos` from the command line for many and/or large input files.\nFortunately there is a tool to support this kind of processing and record-keeping:\n[PCIC Job Queueing](https://github.com/pacificclimate/jobqueueing).\n\n### `split_merged_climos`: Split climo means files into per-interval files (month, season, year)\n\n#### Purpose\n\nEarly versions of the `generate_climos` script (and its R predecessor) created output files containing\nmeans for all intervals (month, season, year) concatenated into a single file. 
This is undesirable\nfor a couple of reasons:\n\n* Pragmatic: `ncWMS2` rejects NetCDF files with non-monotonic dimensions.\n  Merged files have a non-monotonic time dimension.\n\n* Formal: The 3 different means, i.e., means over 3 different intervals (month, season, year),\n  are formally different estimates of random variables with different time dimensions.\n  We could represent this easily enough in a single NetCDF file, with 3 distinct variables\n  each with a distinct time dimension, but judged that this would introduce too much complication.\n  We prefer to have a separate file per averaging interval, with one time dimension per file.\n\nThis script takes as input one or more climo means files and splits each into separate files,\none file per mean interval (month, season, year) in the input file.\n\nThe input file is not modified.\n\n#### Usage\n\n```bash\nsplit_merged_climos -o outdir files...\n```\n\nFilenames are automatically generated for the split files.\nThese filenames conform to the extended CMOR syntax defined in the\n[PCIC metadata standard](https://pcic.uvic.ca/confluence/display/CSG/PCIC+metadata+standard+for+downscaled+data+and+hydrology+modelling+data).\n\nIf the input file is named according to the standard, then the new filenames are the same as the input filename,\nwith the `\u003cfrequency\u003e` component (typically `msaClim`)\nreplaced with `mClim` (monthly means), `sClim` (seasonal means), or `aClim` (annual means).\n\nOutput files are placed in the directory specified in the `-o` argument.\nThis directory is created if it does not exist.\n\n### `update_metadata`: Update metadata in a NetCDF file\n\nSome NetCDF files have improper metadata: missing, invalid, or incorrectly named global or variable metadata\nattributes. 
There are no really convenient tools for updating metadata, so we rolled our own, `update_metadata`.\n\n#### Usage\n\n```bash\n# update metadata in ncfile according to instructions in updates\nupdate_metadata -u updates ncfile\n```\n\n`update_metadata` takes an option (`-u`) and an argument:\n\n* `-u`: the filepath of an updates file that specifies what to do to the metadata it finds in the NetCDF file\n* argument: the filepath of a NetCDF file to update\n\n#### Updates file: specifying updates to make\n\n`update_metadata` can update the global attributes and/or the attributes of variables in a NetCDF file.\nThree update operations are available (detailed below): delete attribute, set attribute value, rename attribute.\n\nUpdates to be made are specified in a separate updates file.\nIt uses a simple, human-readable data format called [YAML](https://en.wikipedia.org/wiki/YAML).\nYou only need to know a couple of things about YAML and how we employ it to use this script:\n\n* Updates are specified with `key: value` syntax. A space must separate the colon from the value.\n* Indentation matters (see next item). Indentation must be consistent within a block.\n* There are two levels of indentation.\n  * The first (unindented) level specifies what group of attributes is to be updated.\n    * The key `global` specifies global attributes.\n    * Any other key is assumed to be the name of a variable whose attributes are to be updated.\n    * The *value* for a first-level key is the indented block below it.\n  * The second (indented) level specifies the attribute and the change to be made to it.\n    See below for details.\n    * If you care about the order in which attributes are processed (and in which they will appear in any naive\n      listing of the attributes), prefix all of the second-level items with a dash (-). 
This causes\n      the attributes to be processed in the order listed in the updates file.\n\n##### Delete attribute\n\nDelete the attribute named `name`.\n\n```yaml\nglobal-or-variable-name:\n    name:\n```\n\nor (to process in order)\n\n```yaml\nglobal-or-variable-name:\n    - name:\n```\n\n\n##### Set attribute to simple value\n\nSet the value of the attribute `name` to `value`. If the attribute does not yet exist, it is created.\n\n```yaml\nglobal-or-variable-name:\n    name: value\n```\n\nor (to process in order)\n\n```yaml\nglobal-or-variable-name:\n    - name: value\n```\n\nNote: This script is clever (courtesy of YAML cleverness) about the data type of the value specified.\n\n* If you provide a value that looks like an integer, it is interpreted as an integer.\n* If you provide a value that looks like a float, it is interpreted as a float.\n* Otherwise it is treated as a string.\n  If you need to force a numeric-looking value to be a string, enclose it in single or double quotes (e.g., `'123'`).\n\nMore details on the [Wikipedia YAML page](https://en.wikipedia.org/wiki/YAML#Advanced_components).\n\n##### Set attribute to value of Python expression\n\nSet the value of the attribute `name` to the value of the Python expression `expression`, evaluated in a\ncontext that includes the values of all NetCDF attributes as variables, and with a selection of\nadditional custom functions available.\n\nAll standard Python functions are available -- including dangerous ones like `os.remove`,\nso don't get too clever.\n\nFor convenience, the values of all attributes of the target object are made available as local variables\nin the execution context. For example, the attribute named `product` in the global attribute set can be\naccessed in the expression as the variable `product`. 
It can be used just like any variable in any valid\nPython expression.\n\nFor example, if the `initialization_method` is given as `i1` or `i2` instead of the standard `1` or `2`,\nthe `realization` as `r2` instead of `2` and the `physics_version` as `p1` instead of `1`, and so on,\nthese lines would trim the extra characters from these values:\n```yaml\nglobal:\n  initialization_method: =initialization_method.strip('i')\n  realization: =realization.strip('r')\n  physics_version: =physics_version.strip('p')\n```\n\nThe following custom functions are available for use in expressions:\n\n* `parse_ensemble_code(ensemble_code)`: Parse the argument as an ensemble code (`r\u003cm\u003ei\u003cn\u003ep\u003cl\u003e`) and return\n  a dict containing the values of each component, appropriately named as follows:\n    ```\n    {\n        'realization': \u003cm\u003e,\n        'initialization_method': \u003cn\u003e,\n        'physics_version': \u003cl\u003e,\n    }\n    ```\n\nIf an exception is raised during evaluation of an expression, the target attribute is not set,\nan error message is printed, and processing of the remaining unprocessed updates continues.\n\nIf the attribute does not yet exist, it is created.\n\n```yaml\nglobal-or-variable-name:\n    name: =expression\n```\n\nor (to process in order)\n\n```yaml\nglobal-or-variable-name:\n    - name: =expression\n```\n\n##### Rename attribute\n\nRename the attribute named `oldname` to `newname`. Value is unchanged.\n\n```yaml\nglobal-or-variable-name:\n    newname: \u003c-oldname\n```\n\nor (to process in order)\n\n```yaml\nglobal-or-variable-name:\n    - newname: \u003c-oldname\n```\n\nNote: The special sequence `\u003c-` after the colon indicates renaming.\nThis means that you can't set an attribute with a value that begins with `\u003c-`. 
Sorry.\n\n##### Example updates file:\n\n```yaml\nglobal:\n    foo:\n    bar: 42\n    baz: \u003c-qux\n\ntemperature:\n    units: degrees_C\n```\n\nor (to process in order)\n\n```yaml\nglobal:\n    - foo:\n    - bar: 42\n    - baz: \u003c-qux\n\ntemperature:\n    - units: degrees_C\n```\n\nThis file causes a NetCDF file to be updated in the following way:\n\nGlobal attributes:\n* delete global attribute `foo`\n* set global attribute `bar` to (integer) `42`\n* rename global attribute `qux` to `baz`\n\nAttributes of variable named `temperature`:\n* set attribute `units` to (string) `degrees_C`\n\n### `decompose_flow_vectors`: create normalized unit vector fields from a VIC routing file\n\n#### Purpose:\n\nncWMS can display vector fields as map rasters if the vector data is arranged inside the NetCDF file as two grids: one representing the eastward component of the vector at each grid location, the other representing the northward component at each grid location.\n\nVIC parametrization files encode flow direction as an integer from 1 to 8, with 9 marking a stream or river outlet. This script decomposes the flow direction vectors in a VIC parametrization file into northward and eastward vector arrays for ncWMS display.\n\nVIC routing directional vector values:\n```\n1 = North\n2 = Northeast\n3 = East\n4 = Southeast\n5 = South\n6 = Southwest\n7 = West\n8 = Northwest\n9 = Outlet of stream or river\n```\n\n#### Usage:\n`decompose_flow_vectors.py infile outfile variable`\n\nWrites to `outfile` a NetCDF file containing normalized vector arrays generated from `variable` in `infile`. Does not change `infile` or copy any other variables or axes to `outfile`.\n\n### `generate_prsn`: Generate snowfall file\n\n#### Purpose:\n\nTo generate a file containing the `snowfall_flux` from input files of precipitation, `tasmin`, and `tasmax`.
\n\n#### Usage:\n\n```bash\n# Dry run\ngenerate_prsn --dry-run -p prec_file -n tasmin_file -x tasmax_file -o outdir\n\n# File generation\ngenerate_prsn -p prec_file -n tasmin_file -x tasmax_file -o outdir\n```\n\n## Indexing climatological output files\n\nIndexing is done using scripts in the [modelmeta](https://github.com/pacificclimate/modelmeta) package.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpacificclimate%2Fclimate-explorer-data-prep","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpacificclimate%2Fclimate-explorer-data-prep","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpacificclimate%2Fclimate-explorer-data-prep/lists"}