https://github.com/cedadev/checksit
File-checking made simple
- Host: GitHub
- URL: https://github.com/cedadev/checksit
- Owner: cedadev
- License: bsd-3-clause
- Created: 2022-03-30T13:37:11.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2025-04-15T08:30:38.000Z (about 1 year ago)
- Last Synced: 2025-04-15T09:37:58.877Z (about 1 year ago)
- Language: Python
- Size: 3.48 MB
- Stars: 1
- Watchers: 7
- Forks: 1
- Open Issues: 16
Metadata Files:
- Readme: README.md
- Changelog: HISTORY.rst
- Contributing: CONTRIBUTING.rst
- License: LICENSE
- Authors: AUTHORS.rst
# checksit
[Documentation](https://checksit.readthedocs.io/en/latest)
File-checking made simple
## Installation
Create a venv, then install dependencies:
```
pip install -r requirements.txt
pip install -e .
```
## Usage
A brief description of how to use checksit is given here. For more detail, visit the [documentation site](https://checksit.readthedocs.io/en/latest).
checksit has four key components: [check](#checksit-check), [describe](#checksit-describe), [show-specs](#checksit-show-specs), and [summary](#checksit-summary).
## checksit check
Check file against a template.
### Basic Usage
```
checksit check /badc/ukcp18/data/land-cpm/uk/2.2km/rcp85/01/rss/day/latest/rss_rcp85_land-cpm_uk_2.2km_01_day_20671201-20681130.nc
```
* Checks format of file.
* checksit searches its template cache for a similar file to compare against
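To check several files in one go, the command above can be wrapped in a short script. This is a minimal sketch, assuming the `checksit` executable is on the `PATH`; the helper names `build_check_command` and `check_files` are illustrative and not part of checksit.

```python
import subprocess


def build_check_command(path):
    """Return the argument list for `checksit check <path>`."""
    return ["checksit", "check", str(path)]


def check_files(paths):
    """Run `checksit check` on each path and collect the exit codes.

    Assumption: `checksit` is installed and on the PATH. What each
    exit code signifies is not documented here, so it is only recorded.
    """
    results = {}
    for path in paths:
        completed = subprocess.run(build_check_command(path))
        results[str(path)] = completed.returncode
    return results
```

Calling `check_files(["a.nc", "b.nc"])` would run the basic check once per file, with checksit printing its own output as usual.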
### Main Features
#### Define template
```
checksit check --template=template-cache/rls_rcp85_land-cpm_uk_2.2km_01_day_19801201-19811130.cdl /badc/ukcp18/data/land-cpm/uk/2.2km/rcp85/01/rss/day/latest/rss_rcp85_land-cpm_uk_2.2km_01_day_20671201-20681130.nc
```
* Use the `--template` flag to specify the template to check against
* The template can be a file in template-cache or any other file the user has access to
* Note: cdl files are a representation of a netCDF file, being the output from `ncdump -h` on the netCDF file
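Since a cdl file is the output of `ncdump -h`, one can be produced from an existing netCDF file with a few lines of Python. This is a sketch, assuming the netCDF `ncdump` utility is installed and on the `PATH`; `ncdump_header_command` and `write_cdl` are hypothetical helper names, not part of checksit.

```python
import subprocess


def ncdump_header_command(nc_path):
    """Return the `ncdump -h` argument list for a netCDF file."""
    return ["ncdump", "-h", str(nc_path)]


def write_cdl(nc_path, cdl_path):
    """Write the header of `nc_path` to `cdl_path` as CDL text.

    Assumption: the `ncdump` utility is available on the PATH.
    """
    completed = subprocess.run(
        ncdump_header_command(nc_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if ncdump fails
    )
    with open(cdl_path, "w") as handle:
        handle.write(completed.stdout)
```

The resulting file could then be passed to `--template`; whether checksit's own cached templates contain anything beyond this header is not covered here.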
#### Map variable names
```
checksit check -m cltAnom=cloud_area_fraction /gws/nopw/j04/cmip6_prep_vol1/ukcp18/data/land-prob/v20211110/uk/25km/rcp85/sample/b8110/30y/cltAnom/mon/v20211110/cltAnom_rcp85_land-prob_uk_25km_sample_b8110_30y_mon_20091201-20991130.nc
```
* Maps variable names, for cases where a variable is named differently in the file to be checked and in the template
* Format - `-m <variable>=<variable>`, as in `-m cltAnom=cloud_area_fraction`
* Multiple mappings should be comma-separated
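The comma-separated mapping format can be illustrated with a small parser. This is only a sketch of the format described above, not checksit's own parsing code; `parse_mappings` is a hypothetical name, and which side of the `=` refers to the file and which to the template is not specified here.

```python
def parse_mappings(value):
    """Split 'a=b,c=d' into {'a': 'b', 'c': 'd'}."""
    mappings = {}
    for pair in value.split(","):
        left, sep, right = pair.partition("=")
        # Each mapping needs a name on both sides of a single '='.
        if not sep or not left or not right:
            raise ValueError(f"expected NAME=NAME, got {pair!r}")
        mappings[left.strip()] = right.strip()
    return mappings
```

For instance, `parse_mappings("cltAnom=cloud_area_fraction")` gives a one-entry dictionary keyed on `cltAnom`.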
#### Ignore attributes
```
checksit check --ignore-attrs=global_attributes:time_coverage_start,global_attributes:time_coverage_end,global_attributes:tracking_id /neodc/esacci/sea_ice/data/sea_ice_thickness/L3C/envisat/v2.0/SH/2012/ESACCI-SEAICE-L3C-SITHICK-RA2_ENVISAT-SH50KMEASE2-201202-fv2.0.nc
```
* Define attributes to ignore in checking
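The effect of `--ignore-attrs` can be pictured as dropping the listed `section:attribute` pairs before any comparison takes place. The sketch below assumes a file's metadata is held as a nested `{section: {attribute: value}}` dictionary; that layout and the names `parse_ignore_attrs` and `drop_ignored` are assumptions for illustration, not checksit internals.

```python
def parse_ignore_attrs(value):
    """Split 'section:attr,section:attr' into a set of (section, attr) pairs."""
    pairs = set()
    for entry in value.split(","):
        section, sep, attr = entry.partition(":")
        if not sep:
            raise ValueError(f"expected SECTION:ATTR, got {entry!r}")
        pairs.add((section, attr))
    return pairs


def drop_ignored(record, ignored):
    """Return a copy of `record` without the ignored attributes.

    Assumption: `record` is a nested {section: {attr: value}} dict.
    """
    return {
        section: {k: v for k, v in attrs.items() if (section, k) not in ignored}
        for section, attrs in record.items()
    }
```

Sections that are not named in the option pass through unchanged.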
#### Define additional rules for checking
```
checksit check --rules=global_attributes:id=rule-func:match-file-name:lowercase:no-extension /neodc/esacci/sea_ice/data/sea_ice_thickness/L3C/envisat/v2.0/SH/2012/ESACCI-SEAICE-L3C-SITHICK-RA2_ENVISAT-SH50KMEASE2-201202-fv2.0.nc
```
* Check items against defined rules
* Format - `<item>=<rule-type>:<rule-value>[:<extra>[:<extra>...]]`
* Four options for `<rule-type>`:
  * `rule-func` - check item against a defined function, 4 options:
    * `match-file-name` - item must be the same as the file name, allowing for formatting through `<extra>` - `lowercase`, `uppercase`, `no-extension` - example: `global_attributes:id=rule-func:match-file-name:lowercase:no-extension`
    * `match-one-of` - item must be the same as one of the `<extra>` options given. Multiple options should be separated by a `|` and surrounded by double quotation marks - example: `global_attributes:project=rule-func:match-one-of:"ukcp18|ukcp09"`
    * `match-one-or-more-of` - item must be the same as one or more of the `<extra>` options given. Multiple options should be separated by a `|` and surrounded by double quotation marks - example: `global_attributes:contact=rule-func:match-one-or-more-of:"ukcpproject@metoffice.gov.uk|UKCP Team|MOHC"`
    * `string-of-length` - item must be the same length as the given `<extra>`, or greater if `+` is given at the end of `<extra>` - example: `global_attributes:project=rule-func:string-of-length:10,global_attributes:contact=rule-func:string-of-length:100+`
  * `type-rule` - check item is of the type defined in `<rule-value>` - example: `transverse_mercator:false_northing=type-rule:integer`
  * `regex` - check item for regular expression match - example: `global_attributes:project=regex:ukcp18`
  * `regex-rule` - check item matches a pre-defined regex rule, the name of which is given in `<rule-value>`
    * current options are `integer`, `valid-email`, `valid-url`, `valid-url-or-na`, `match:vN.M`, `datetime`, `datetime-or-na`, `number`
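To make the rule descriptions concrete, the sketch below re-implements three of the `rule-func` checks as plain Python functions, along with a splitter for a single rule string. It is an illustration of the behaviour described above, not checksit's own code: the function names and signatures are assumptions, and `checksit describe` remains the authoritative description of each rule.

```python
import os


def parse_rule(spec):
    """Split one 'item=rule-type:rest' string into (item, rule_type, rest)."""
    item, sep, rule = spec.partition("=")
    if not sep:
        raise ValueError(f"expected ITEM=RULE, got {spec!r}")
    rule_type, _, rest = rule.partition(":")
    return item, rule_type, rest


def match_file_name(value, file_path, extras=()):
    """True if `value` equals the file name after the requested formatting."""
    name = os.path.basename(file_path)
    if "no-extension" in extras:
        name = os.path.splitext(name)[0]  # drops only the final extension
    if "lowercase" in extras:
        name = name.lower()
    if "uppercase" in extras:
        name = name.upper()
    return value == name


def match_one_of(value, options):
    """True if `value` is one of the '|'-separated options."""
    return value in options.strip('"').split("|")


def string_of_length(value, spec):
    """True if len(value) equals `spec`, or is at least that if `spec` ends in '+'."""
    if spec.endswith("+"):
        return len(value) >= int(spec[:-1])
    return len(value) == int(spec)
```

With these definitions, `string_of_length("abcd", "3+")` passes while `string_of_length("abcd", "3")` does not, which mirrors the `+` suffix described for `string-of-length`.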
### Additional Options
#### specs
```
checksit check --specs=ceda-base /badc/ukcp18/data/land-cpm/uk/2.2km/rcp85/01/rss/day/latest/rss_rcp85_land-cpm_uk_2.2km_01_day_20671201-20681130.nc
```
* Checks file against a given specification. For more info, see [checksit show-specs](#checksit-show-specs)
#### auto-cache
```
checksit check --auto-cache --template=/badc/ukcp18/data/land-cpm/uk/2.2km/rcp85/08/rss/day/latest/rss_rcp85_land-cpm_uk_2.2km_08_day_20671201-20681130.nc /badc/ukcp18/data/land-cpm/uk/2.2km/rcp85/01/rss/day/latest/rss_rcp85_land-cpm_uk_2.2km_01_day_20671201-20681130.nc
```
* Create a cache of the given template and add it to checksit's template cache
#### verbose
```
checksit check --verbose /group_workspaces/jasmin2/ukcp18/incoming-astephen/ukcordex-example/tasmax_rcp85_land-rcm_uk_12km_EC-EARTH_r12i1p1_HIRHAM5_day_19801201-19901130.nc
```
* Print additional information
## checksit describe
```
checksit describe
```
* Prints the docstrings of the rules that can be used in `checksit check --rules`
* Individual rules can be printed out, e.g. `checksit describe match-one-of`
## checksit show-specs
```
checksit show-specs
```
* Prints out the specs for a given spec-id, e.g. `ceda-base`
* spec-ids are saved in `checksit/specs/groups`
## checksit summary
* Summarises output from a number of log files created through `checksit check`