{"id":19310014,"url":"https://github.com/cedadev/cmip6-object-store","last_synced_at":"2025-10-31T07:40:38.653Z","repository":{"id":44634600,"uuid":"294667524","full_name":"cedadev/cmip6-object-store","owner":"cedadev","description":"CMIP6 Object Store Library","archived":false,"fork":false,"pushed_at":"2022-02-03T09:39:19.000Z","size":32222,"stargazers_count":4,"open_issues_count":41,"forks_count":4,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-04-02T00:24:14.794Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cedadev.png","metadata":{"files":{"readme":"README-cmip6-zarr.md","changelog":"HISTORY.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-09-11T10:41:12.000Z","updated_at":"2021-10-18T15:03:50.000Z","dependencies_parsed_at":"2022-09-25T00:23:23.041Z","dependency_job_id":null,"html_url":"https://github.com/cedadev/cmip6-object-store","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cedadev%2Fcmip6-object-store","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cedadev%2Fcmip6-object-store/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cedadev%2Fcmip6-object-store/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cedadev%2Fcmip6-object-store/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cedadev","download_url":"https://codeload.github.com/cedadev/cmip6-object-store/tar.gz/refs/heads/master","host":{"
name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250249039,"owners_count":21399383,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-10T00:22:14.272Z","updated_at":"2025-10-31T07:40:33.609Z","avatar_url":"https://github.com/cedadev.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Notes on CMIP6 Zarr work\n\n## How do we decide on the chunk size?\n\nWe tested efficient object sizes in our object store (Caringo);\nthe results suggested that 100 MB to 1 GB was the optimal range, so we set 250 MB as our\ntarget value. Depending on the array shape, our chunks should come in at around\n250 MB.\n\n## Which data frequencies have we covered?\n\nSo far, we have the following in our store:\n\nAERday, Amon, CFday, day, Eday, LImon, Lmon, Oday, Omon, Primday\n\n## Why develop a package rather than a notebook application?\n\nI see this task as a batch processing task. I want to be able to\nsay \"run everything\" and let the task manage itself. 
This is very\nhard to do, so in practice I need several ways to catch and handle failures.\n\nThese include:\n - modifying the number of HTTP connections in the pool:\n   https://github.com/cedadev/cmip6-object-store/blob/master/cmip6_object_store/cmip6_zarr/caringo_store.py#L21\n - retrying a number of operations, to overcome temporary failures:\n   https://github.com/cedadev/cmip6-object-store/blob/master/cmip6_object_store/cmip6_zarr/caringo_store.py#L24\n - limiting the total memory used so that each task fits within the\n   memory limit on each batch node:\n   https://github.com/cedadev/cmip6-object-store/blob/master/cmip6_object_store/etc/config.ini#L32 (this is how we avoid memory errors)\n - processing each variable separately to keep the volumes relatively\n   low, and writing all successes to an internal catalogue (i.e. log), so that\n   the framework will not rerun tasks that were previously successful:\n   https://github.com/cedadev/cmip6-object-store/blob/master/cmip6_object_store/cmip6_zarr/zarr_writer.py#L122-L124\n - logging failures to a separate catalogue (log) so that they can be interrogated\n   and rerun interactively for debugging:\n   https://github.com/cedadev/cmip6-object-store/blob/master/cmip6_object_store/cmip6_zarr/zarr_writer.py#L126-L130\n\nOverall, in my view, this kind of large-scale processing does not fit well with notebooks.\nI would rather produce a notebook for interacting with the Zarr store on the user side:\n\nhttps://github.com/cedadev/cmip6-object-store/blob/master/notebooks/cmip6-zarr-jasmin.ipynb\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcedadev%2Fcmip6-object-store","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcedadev%2Fcmip6-object-store","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcedadev%2Fcmip6-object-store/lists"}