{"id":19309960,"url":"https://github.com/cedadev/abcunit-cmip5-stats","last_synced_at":"2025-09-07T01:09:15.326Z","repository":{"id":51245652,"uuid":"224874908","full_name":"cedadev/abcunit-cmip5-stats","owner":"cedadev","description":"Processing Framework for calculating temporal statistics from CMIP5 (using ABCUnit)","archived":false,"fork":false,"pushed_at":"2021-05-19T08:30:15.000Z","size":6307,"stargazers_count":1,"open_issues_count":0,"forks_count":3,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-07-07T21:10:08.644Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cedadev.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-11-29T15:03:01.000Z","updated_at":"2021-05-19T08:30:18.000Z","dependencies_parsed_at":"2022-08-31T05:10:32.323Z","dependency_job_id":null,"html_url":"https://github.com/cedadev/abcunit-cmip5-stats","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/cedadev/abcunit-cmip5-stats","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cedadev%2Fabcunit-cmip5-stats","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cedadev%2Fabcunit-cmip5-stats/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cedadev%2Fabcunit-cmip5-stats/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cedadev%2Fabcunit-cmip5-stats/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cedadev","download_url":"https://code
load.github.com/cedadev/abcunit-cmip5-stats/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cedadev%2Fabcunit-cmip5-stats/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273983452,"owners_count":25202166,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-06T02:00:13.247Z","response_time":2576,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-10T00:21:57.857Z","updated_at":"2025-09-07T01:09:15.278Z","avatar_url":"https://github.com/cedadev.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# abcunit-cmip5-stats\nProcessing Framework for calculating temporal statistics from CMIP5 (using ABCUnit).\n\n## Overview ##\n\nThe name ABCUnit corresponds to the 4 layers the workflow is split into:\n1. A - all\n2. B - batch\n3. C - chunk\n4. The Unit\n\nThese scripts are an example of using the ABCUnit structure on CMIP5 data. This provides a repeatable and efficient workflow.\n\nThe statistics that can be calculated are the maximum, minimum and mean.\n\nThe models, ensembles and variables available can be found in `lib/defaults.py`.\n\n## Structure ##\n\n1. 
Run all script = `run_all.py`\n* Calculates the given statistic for the chosen models, ensembles and variables\n* These are given as arguments on the command line\n* Statistic must be specified as an argument at the command line\n* Defaults to running for all models, ensembles and variables\n* For each model it runs the batch script\n\n2. Run batch script = `run_batch.py`\n* Calculates the statistic over one chosen model and chosen ensembles and variables\n* Defaults to all ensembles and variables\n* Statistic and model must be specified when run at the command line \n* Passes each ensemble as an argument to the chunk script\n* The chunk script is submitted to LOTUS via `bsub`\n\n3. Run chunk script = `run_chunk.py`\n* Typically run on LOTUS but can be run from the command line\n* Calculates the statistic for the specified model, ensemble and variables\n* Statistic, model and ensemble must all be specified if run from the command line\n* Defaults to all variables\n* For each variable it: \n    * Skips the variable if already calculated\n    * Finds the necessary files\n    * Checks the date range specified is valid\n    * Calculates the statistic and writes it to the output file\n    * Checks that the output file exists\n    \n## Example usage ##\n\nLog in to a JASMIN sci server:\n\n`ssh \u003cuser-id\u003e@jasmin-sci5.ceda.ac.uk`\n\n### Edit these files to match your setup: ###\n\n* `SETTINGS.py`\n* `setup-env.sh`\n\nClone this repository and make sure you are in the top-level abcunit-cmip5-stats directory:\n\n* `git clone https://github.com/cedadev/abcunit-cmip5-stats.git`\n* `cd abcunit-cmip5-stats`\n\n***First run the*** `setup-env.sh` ***script to set up your environment.***\n\nOptions for models, ensembles and variables can be found in `lib/defaults.py`. 
Note that the model is defined by its institute/model combination.\n\nRunning the top-level 'run all' script at the command line:\n\n`python run_all.py -s mean` \n\nRunning the 'run batch' script:\n\n`python run_batch.py -s mean -m MOHC/HadGEM2-ES`\n\nRunning the 'run chunk' script locally, rather than through LOTUS (which is how the batch script invokes it):\n\n`python run_chunk.py -s mean -m MOHC/HadGEM2-ES -e r1i1p1`\n\nIn each example, all other arguments are optional. \nFor example, to calculate the mean of only 2 variables for one model and ensemble:\n\n`python run_chunk.py -s mean -m MOHC/HadGEM2-ES -e r1i1p1 -v rh ra`\n\n## Log Handling and Outputs ##\n\nThe system currently supports two ways to log job successes and failures: writing to the file system or using a database. Choose one of these methods with the BACKEND variable in the settings file.\n\n### File system logs ###\n\nIf using the file system log handler, log files are written as follows. 
Each job produces a file at a success path or a failure path, in one of the following formats:\n\n`\u003ccurrent-directory\u003e/logs/success/\u003cstat\u003e/\u003cmodel\u003e/\u003censemble\u003e/\u003cvar_id\u003e`\n\n`\u003ccurrent-directory\u003e/logs/failure/\u003cfailure-type\u003e/\u003cstat\u003e/\u003cmodel\u003e/\u003censemble\u003e/\u003cvar_id\u003e`\n   \n* current-directory is the directory you are in when running the Python scripts.\n* failure-type can be one of:\n   * bad_data - Empty file produced when no netCDF files could be found for the chosen arguments.\n   * bad_num - Empty file produced when the chosen date range is invalid for the chosen files.\n   * no_output - Empty file produced when the output file could not be generated.\n\n### Database logs ###\n\nIf using the database log handler, success / failure logs will be output into a relation (named 'results' by default) in the following format:\n\n| id                                    |   result |\n| :------------------------------------ | -------: |\n| mean.MOHC.HadGEM2-ES.r1i1p1.shrubFrac |  success |\n| mean.MOHC.HadGEM2-ES.r1i1p1.tran      | bad_data |\n| mean.MOHC.HadGEM2-ES.r1i1p1.treeFrac  |  success |\n\nTo use this handler you will need to do the following setup:\n\n1. Contact the JASMIN help desk and ask for a PostgreSQL database to be made for you\n2. Create an environment variable called ABCUNIT_DB_SETTINGS which stores a connection string for `psycopg2`, for example:\n   * `export ABCUNIT_DB_SETTINGS=\"dbname=\u003cdatabase_name\u003e user=\u003cuser_name\u003e host=\u003chost_name\u003e password=\u003cdatabase_password\u003e\"`\n\nIt's useful to set up this variable in a script kept separate from your public code. 
\n\n### Outputs ###\n\nLOTUS outputs are in the following format:\n\n`\u003ccurrent-directory\u003e/logs/lotus-outputs/\u003cstat\u003e/\u003cmodel\u003e/\u003censemble\u003e.out or \u003censemble\u003e.err`\n\nIf a job is successful, its corresponding netCDF4 file will be output to:\n\n`\u003cgws\u003e/\u003cuser\u003e/abcunit-outputs/\u003cvar_id\u003e.nc`\n\nHere `gws` is a path to a group workspace and `user` is your username. Output netCDF4 files are written to a group workspace to use storage space more effectively. \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcedadev%2Fabcunit-cmip5-stats","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcedadev%2Fabcunit-cmip5-stats","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcedadev%2Fabcunit-cmip5-stats/lists"}