{"id":44079443,"url":"https://github.com/pacificclimate/nchelpers","last_synced_at":"2026-02-08T08:34:37.258Z","repository":{"id":46783252,"uuid":"83344997","full_name":"pacificclimate/nchelpers","owner":"pacificclimate","description":null,"archived":false,"fork":false,"pushed_at":"2025-05-28T16:46:01.000Z","size":16729,"stargazers_count":0,"open_issues_count":13,"forks_count":2,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-05-28T17:50:32.665Z","etag":null,"topics":["actions","pip","pypi"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pacificclimate.png","metadata":{"files":{"readme":"README.md","changelog":"NEWS.rst","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2017-02-27T18:53:38.000Z","updated_at":"2022-08-22T15:17:33.000Z","dependencies_parsed_at":"2024-09-14T14:50:10.952Z","dependency_job_id":"75228fd1-a417-4628-b2fd-31aee35e53f8","html_url":"https://github.com/pacificclimate/nchelpers","commit_stats":null,"previous_names":[],"tags_count":27,"template":false,"template_full_name":null,"purl":"pkg:github/pacificclimate/nchelpers","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pacificclimate%2Fnchelpers","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pacificclimate%2Fnchelpers/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pacificclimate%2Fnchelpers/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pacificclimate%2Fnchelpers/manifests","owner_url":"https://repos.eco
syste.ms/api/v1/hosts/GitHub/owners/pacificclimate","download_url":"https://codeload.github.com/pacificclimate/nchelpers/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pacificclimate%2Fnchelpers/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29225478,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-08T06:05:31.539Z","status":"ssl_error","status_checked_at":"2026-02-08T05:58:33.853Z","response_time":57,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["actions","pip","pypi"],"created_at":"2026-02-08T08:34:36.487Z","updated_at":"2026-02-08T08:34:37.253Z","avatar_url":"https://github.com/pacificclimate.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# nchelpers\n\n![Python CI](https://github.com/pacificclimate/nchelpers/workflows/Python%20CI/badge.svg)\n![Pypi Publishing](https://github.com/pacificclimate/nchelpers/workflows/Pypi%20Publishing/badge.svg)\n\nThis module contains the `CFDataset` class, which extends the `netCDF4.Dataset`\nclass to provide additional properties, memory-efficient data access, and improved\nerror handling for netCDF files that comply with the [CF Metadata Conventions](http://cfconventions.org/)\nand the PCIC metadata conventions that extend them.\n\nIt supports 
several PCIC tools that work with netCDF files that adhere to the\nCF and PCIC metadata conventions. The class provides several properties that\nspecify information about a file's contents and metadata and can be used to\nguide data processing. It does not provide any new tools to directly\nmodify netCDF files, but all file-modifying procedures in the netCDF4.Dataset\nclass are still available.\n\n## Data chunking\n\n`iteration.py` contains generators for iterating over a netCDF file and loading\none chunk at a time so that enormous files can be read without a `MemoryError`.\n\n## PCIC Metadata Model\n\nPCIC has a [process-oriented metadata model](https://pcic.uvic.ca/confluence/display/CSG/PCIC+metadata+standard+for+downscaled+data+and+hydrology+modelling+data).\n\nData originates as either model output (simulated by a Global Climate Model\nor Regional Climate Model) or observations (measured directly in some fashion).\n\nThe data can then be used as input to one or more further processes. Each\nnew process preserves all the metadata describing the data origin and previous\nprocesses. When a new dataset B is generated from a process that uses dataset\nA as input, all metadata attributes describing A's generation will be\npresent in B, prepended with a prefix that refers to A's role in generating\nB.\n\nFor example, suppose you have a model output dataset A, with a metadata attribute\ngiving the name of the generating model, `example_model`.\n\nA has the metadata attribute `model_id` with the value `example_model`.\n\nIf A is used as input to a downscaling process, the output dataset B will\nhave an attribute called `GCM__model_id` with the value `example_model`. A\nis used as the GCM (global climate model) input to the downscaling process,\nso the prefix `GCM` is used.\n\nIf B is further used as input to a hydrological modeling process, the output\ndataset C will have an attribute called `downscaling__GCM__model_id` with the\nvalue `example_model`. 
B is a downscaled dataset used as forcing data for the\nhydrological model, so its attributes are prepended with `downscaling`, including\nthe attributes it inherited from A to show its own inputs.\n\nThe metadata preserves the entire chain of processes followed\nto create any given dataset so that its origin can always be traced and\nrecreated.\n\nThe functions in this module handle determining what sort of data a particular\nnetCDF file is, which processes were used to generate it, validating that required\nmetadata is present, and navigating the metadata \"tree\" to find desired metadata.\n\n## Data Supported\n\nMost of the time, this module will take care of the low level details related\nto handling various types of datasets. Data usually takes the form of cubes with\nlatitude, longitude, and time dimensions. While it may have different origins and different\norigin- or process-specific metadata, the module should seamlessly traverse the\nmetadata formats of various different data types and provide a unified interface\nto accessing needed metadata.\n\n### Supported Data Origins\n\n#### Model Output\n\nModel output is the majority of netCDF data used by PCIC. Model output data has\nlatitude, longitude, and time dimensions and metadata attributes specifying the\nmodel, scenario, and run used to generate the data.\n\nModel data that has not been further processed has the `is_unprocessed_gcm_output`\nproperty of `True`. Data that is either model output or was created by processes\nthat used model output has the `is_gcm_derivative` property of `True`.\n\n#### Observations\n\nObservation data is historical data that is derived from real world observations\nand then extrapolated to cover geographic or chronological gaps by an algorithmic\nprocess. 
(This module and the netCDF file format are not well suited for handling\nsparse, non-gridded observation data.)\n\nNote that, confusingly, observation data usually _does_ have a `model_id` attribute:\ntypically this is the name of the algorithm used to extrapolate measurements to\ncover an entire grid. It is not a Global Climate Model, though, and simulation\nattributes relevant to GCMs, like `experiment`, will not be present.\n\nObservational data values usually, but not always, take the form of a cube\nwith lat, lon, and time dimensions, similar to model output.\n\nObservation data has the `is_gridded_obs` property of `True`.\n\n### Data-generating Processes\n\n#### Downscaling\n\nThis process produces data with a higher spatial resolution, but otherwise\nsimilar to the input data. It is only run on model output data; observation data\nis already downscaled by the extrapolation process used to create it.\n\nIt will have the property `is_downscaled_output` of `True` and metadata\nspecifying the downscaling algorithm (typically either BCCAQ, PRISM, or both).\n\n#### Climdex calculation\n\nThis process takes model output and calculates [various derived statistics](https://www.climdex.org/)\nabout it. 
The output data will have the same dimensions as the input data\n(lat, lon, time), but a different variable.\n\nAll climdex datasets have the property `is_climdex_output` set to `True`, and\none of `is_climdex_gcm_output` or `is_climdex_ds_gcm_output` will be `True`\nas well, depending on whether the input dataset was downscaled or not.\n\n#### Hydrological Modeling\n\nUnlike Downscaling or Climdex calculation, hydrological modeling produces\ndata that is _not_ a cube with lat, lon, and time dimensions, and applications\nthat use this module to work with streamflow data will definitely need to\ncheck whether the data is streamflow and handle it separately if so.\n\nThe hydrological model takes a downscaled model output or gridded\nobservation dataset as input, and outputs streamflow at one particular\nlocation. The resulting dataset has the `is_streamflow_model_output`\nproperty set to `True`.\n\n### Supported Data Shapes\n\n#### Raster Timeseries\n\nThe most common type of PCIC data is a raster timeseries. Data is stored in one or\nmore data cubes with latitude, longitude, and time dimensions. This is the default and\ndoesn't usually require explicit handling, but can be checked for if needed.\n\nThe `sampling_geometry` property will have the value `gridded` and the `time_invariant`\nproperty will be `False`.\n\n#### Climatologies\n\nA subset of raster timeseries; a climatology contains values that are averaged over a\nmulti-year time period, typically 30 years. Climatologies may contain annual data\n(one timestamp), seasonal data (four timestamps), monthly data (12 timestamps) or\nsome combination of those time resolutions. 
For example, a January timestamp would\nrepresent the average of all Januaries occurring over the time period.\nIt has a `climatology_bounds_value` property specifying the period over which each\nvalue is averaged.\n\nA climatology will return `True` on the `is_multi_year` property.\n\n#### Discrete Structured Geometries\n\nDiscrete Structured Geometries have a time series of data associated with\none or more specific points (like measuring stations), but not a full grid.\nThe collection of individual points is the \"instance\" dimension; data is\nstored in a rectangle with dimensions corresponding to \"instance\" and \"time\".\nIt has an `instance_dim` property and an `id_instance_var` property in\naccordance with the CF Standards for DSG data. The list of instance variables\nis available in the `coordinate_vars` property.\n\nA discrete structured geometry has a value other than `gridded` as its\n`sampling_geometry` property.\n\n#### Time Invariant Data\n\nTime invariant data is gridded data that describes characteristics that do not change\nover time, like elevation or soil type. Time Invariant Data is always observations;\nclimate model output necessarily has a time component. It lacks a time dimension.\n\nA time-invariant dataset returns `True` on the `is_time_invariant` property.\nMost time-related properties will throw errors if accessed on a time-invariant\ndataset.\n\n## Building and Testing\n\nWhile this module is usually imported into some other project, it can be built and\ntested on its own for debugging or development. 
Requires Poetry \u003e= 2.0.0.\n\n```\ngit clone http://github.com/pacificclimate/nchelpers\ncd nchelpers\npoetry install\n# Tests can be run with `pytest`.\npoetry run pytest\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpacificclimate%2Fnchelpers","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpacificclimate%2Fnchelpers","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpacificclimate%2Fnchelpers/lists"}