{"id":19309921,"url":"https://github.com/cedadev/checksit","last_synced_at":"2025-04-22T13:33:44.216Z","repository":{"id":38337882,"uuid":"475894902","full_name":"cedadev/checksit","owner":"cedadev","description":"File-checking made simple","archived":false,"fork":false,"pushed_at":"2025-04-15T08:30:38.000Z","size":3654,"stargazers_count":1,"open_issues_count":16,"forks_count":1,"subscribers_count":7,"default_branch":"main","last_synced_at":"2025-04-15T09:37:58.877Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cedadev.png","metadata":{"files":{"readme":"README.md","changelog":"HISTORY.rst","contributing":"CONTRIBUTING.rst","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS.rst","dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-03-30T13:37:11.000Z","updated_at":"2024-12-20T14:14:16.000Z","dependencies_parsed_at":"2025-04-15T09:38:19.965Z","dependency_job_id":null,"html_url":"https://github.com/cedadev/checksit","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cedadev%2Fchecksit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cedadev%2Fchecksit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cedadev%2Fchecksit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cedadev%2Fchecksit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cedadev","download_url":"https://codeload.github.com/cedadev/checksit/tar.gz/refs/
heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250249033,"owners_count":21399381,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-10T00:21:38.590Z","updated_at":"2025-04-22T13:33:42.238Z","avatar_url":"https://github.com/cedadev.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# checksit\n\n[![Documentation Status](https://readthedocs.org/projects/checksit/badge/?version=latest)](https://checksit.readthedocs.io/en/latest)\n\nFile-checking made simple\n\n## Installation\n\nCreate a venv, then install dependencies:\n\n```\npip install -r requirements.txt\npip install -e .\n```\n\n\n## Usage\n\nA brief description of how to use checksit is given here. 
For more detail, visit the [documentation site](https://checksit.readthedocs.io/en/latest).\n\nchecksit comprises four key components: [check](#checksit-check), [describe](#checksit-describe), [show-specs](#checksit-show-specs), and [summary](#checksit-summary).\n\n\n## checksit check\n\nCheck a file against a template.\n\n### Basic Usage\n\n```\nchecksit check /badc/ukcp18/data/land-cpm/uk/2.2km/rcp85/01/rss/day/latest/rss_rcp85_land-cpm_uk_2.2km_01_day_20671201-20681130.nc\n```\n* Checks the format of the file.\n* checksit searches its template cache for a similar file to compare against\n\n\n### Main Features\n\n#### Define template\n```\nchecksit check --template=template-cache/rls_rcp85_land-cpm_uk_2.2km_01_day_19801201-19811130.cdl /badc/ukcp18/data/land-cpm/uk/2.2km/rcp85/01/rss/day/latest/rss_rcp85_land-cpm_uk_2.2km_01_day_20671201-20681130.nc\n```\n* Use the `--template` flag to define a template to use\n* The template can be in template-cache or any file the user has access to\n* Note: cdl files are a representation of a netCDF file, being the output from `ncdump -h` on the netCDF file\n\n\n#### Map variable names\n```\nchecksit check -m cltAnom=cloud_area_fraction /gws/nopw/j04/cmip6_prep_vol1/ukcp18/data/land-prob/v20211110/uk/25km/rcp85/sample/b8110/30y/cltAnom/mon/v20211110/cltAnom_rcp85_land-prob_uk_25km_sample_b8110_30y_mon_20091201-20991130.nc\n```\n* Allows mapping of variable names, for cases where a variable is named differently in the file to be checked and in the template\n* Format - `-m \u003ctemplate variable name\u003e=\u003cfile variable name\u003e`\n* Multiple mappings should be comma-separated\n\n\n#### Ignore attributes\n```\nchecksit check --ignore-attrs=global_attributes:time_coverage_start,global_attributes:time_coverage_end,global_attributes:tracking_id /neodc/esacci/sea_ice/data/sea_ice_thickness/L3C/envisat/v2.0/SH/2012/ESACCI-SEAICE-L3C-SITHICK-RA2_ENVISAT-SH50KMEASE2-201202-fv2.0.nc\n```\n* Define attributes to ignore in 
checking\n\n\n#### Define additional rules for checking\n```\nchecksit check --rules=global_attributes:id=rule-func:match-file-name:lowercase:no-extension /neodc/esacci/sea_ice/data/sea_ice_thickness/L3C/envisat/v2.0/SH/2012/ESACCI-SEAICE-L3C-SITHICK-RA2_ENVISAT-SH50KMEASE2-201202-fv2.0.nc\n```\n* Check items against defined rules\n* Format - `\u003cwhat to check\u003e=\u003crule type\u003e:\u003cfunction/check\u003e[:\u003cextras\u003e[:\u003cextras\u003e...]]`\n* Four options for `\u003crule type\u003e`:\n  * `rule-func` - check item against a defined function, 4 options:\n    * `match-file-name` - item must be the same as the file name, allowing for formatting through `\u003cextras\u003e` - `lowercase`, `uppercase`, `no_extension` - example: `global_attributes:id=rule-func:match-file-name:lowercase:no-extension`\n    * `match-one-of` - item must be the same as one of the `\u003cextras\u003e` given. Multiple options should be separated by a `|` and surrounded by double quotation marks - example: `global_attributes:project=rule-func:match-one-of:\"ukcp18|ukcp09\"`\n    * `match-one-or-more-of` - item must be the same as one or more of the `\u003cextras\u003e` given. 
Multiple options should be separated by a `|` and surrounded by double quotation marks - example: `global_attributes:contact=rule-func:match-one-or-more-of:\"ukcpproject@metoffice.gov.uk|UKCP Team|MOHC\"`\n    * `string-of-length` - item must be the same length as given `\u003cextra\u003e` or greater if `+` is given at end of `\u003cextra\u003e` - example: `global_attributes:project=rule-func:string-of-length:10,global_attributes:contact=rule-func:string-of-length:100+`\n  * `type-rule` - check item is of type as defined in `\u003cextra\u003e` - example: `transverse_mercator:false_northing=type-rule:integer`\n  * `regex` - check item for regular expression match - example: `global_attributes:project=regex:ukcp18`\n  * `regex-rule` - check item matches pre-defined regex rule, name of which is given in `\u003cextra\u003e`\n    * current options are `integer`,`valid-email`,`valid-url`,`valid-url-or-na`,`match:vN.M`,`datetime`,`datetime-or-na`,`number`\n\n\n### Additional Options\n\n#### specs\n```\nchecksit check --specs=ceda-base /badc/ukcp18/data/land-cpm/uk/2.2km/rcp85/01/rss/day/latest/rss_rcp85_land-cpm_uk_2.2km_01_day_20671201-20681130.nc\n```\n* Checks file against a given specification. 
For more info, see [checksit show-specs](#checksit-show-specs)\n\n\n#### auto-cache\n```\nchecksit check --auto-cache --template=/badc/ukcp18/data/land-cpm/uk/2.2km/rcp85/08/rss/day/latest/rss_rcp85_land-cpm_uk_2.2km_08_day_20671201-20681130.nc /badc/ukcp18/data/land-cpm/uk/2.2km/rcp85/01/rss/day/latest/rss_rcp85_land-cpm_uk_2.2km_01_day_20671201-20681130.nc\n```\n* Create a cache of the given template to add to checksit's template_cache\n\n\n#### verbose\n```\nchecksit check --verbose /group_workspaces/jasmin2/ukcp18/incoming-astephen/ukcordex-example/tasmax_rcp85_land-rcm_uk_12km_EC-EARTH_r12i1p1_HIRHAM5_day_19801201-19901130.nc\n```\n* Print additional information\n\n\n\n## checksit describe\n\n```\nchecksit describe\n```\n* Prints the docstrings of the rules that can be used in `checksit check --rules`\n* Individual rules can be printed out, e.g. `checksit describe match-one-of`\n\n\n\n## checksit show-specs\n\n```\nchecksit show-specs \u003cspec-id\u003e\n```\n* Prints out specs for a given spec-id, e.g. `ceda-base`\n* spec-ids are saved in checksit/specs/groups\n\n\n\n## checksit summary\n\n* Summarises output from a number of log files created through `checksit check`\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcedadev%2Fchecksit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcedadev%2Fchecksit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcedadev%2Fchecksit/lists"}