{"id":15359593,"url":"https://github.com/jhsmit/cookiecutter-reproducible-analysis","last_synced_at":"2025-10-24T11:47:11.269Z","repository":{"id":99380629,"uuid":"498346464","full_name":"Jhsmit/cookiecutter-reproducible-analysis","owner":"Jhsmit","description":"Cookiecutter for creating a project structure facilitating reproducible analysis ","archived":false,"fork":false,"pushed_at":"2025-03-17T11:54:04.000Z","size":2379,"stargazers_count":0,"open_issues_count":2,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-17T12:43:27.779Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Jhsmit.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-05-31T13:24:33.000Z","updated_at":"2025-03-17T11:54:07.000Z","dependencies_parsed_at":null,"dependency_job_id":"406c26fe-e8ca-4e47-9909-56bd2f1748ae","html_url":"https://github.com/Jhsmit/cookiecutter-reproducible-analysis","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Jhsmit%2Fcookiecutter-reproducible-analysis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Jhsmit%2Fcookiecutter-reproducible-analysis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Jhsmit%2Fcookiecutter-reproducible-analysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Jhsmit%2Fcookiec
utter-reproducible-analysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Jhsmit","download_url":"https://codeload.github.com/Jhsmit/cookiecutter-reproducible-analysis/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247218380,"owners_count":20903240,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-01T12:45:20.330Z","updated_at":"2025-10-24T11:47:11.178Z","avatar_url":"https://github.com/Jhsmit.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# cookiecutter-reproducible-analysis\n\nCookiecutter for creating a project structure facilitating reproducible analysis.\n\n\n\n## The resulting directory structure\n\nThe directory structure of your new project looks like this: \n\n```\n\n├── LICENSE\n├── README.md                   \u003c- The top-level README for developers using this project.\n│           \n├── ava                         \u003c- Source code for use in this project.\n│   ├── __init__.py             \u003c- Makes a Python module\n|   |           \n|   ├── prod                    \u003c- Folder with 'final' production scripts\n│   │   └── script_x            \u003c- folder containing one script / analysis module\n│   │      ├── output           \u003c- output folder for this module\n│   │      │    └── _rpr.zip    \u003c- reproducibility archive\n│   │      └── main.py          \u003c- script / analysis module\n│   │           \n│   ├── stage                   \u003c- Staging area.\n│   │           \n│   ├── 
toolbox                 \u003c- General use classes/functions/constants.\n│           \n├── data                        \u003c- Raw input data (small files only).\n│           \n├── editable                    \u003c- Folder with editable installed libraries.    \n│           \n├── hal                         \u003c- Folder with general use (global) scripts.\n│           \n├── metadata                    \u003c- Metadata for the project\n|           \n├── config.yaml                 \u003c- config settings available in hal.config.cfg object    \n├── freeze.txt                  \u003c- Output of pip freeze from the most recently run script.    \n├── pyproject.toml              \u003c- makes project pip installable (pip install -e .) so ava can be imported\n\n\n```\n\n## Usage\n\nInstall `cookiecutter`, then run: \n\n    cookiecutter gh:jhsmit/cookiecutter-reproducible-analysis\n\n`cd` into the newly created directory, then create and activate your venv. \n\nInstall the project:\n\n    uv pip install -e .\n\n\nClone any libraries you want to use in editable mode, e.g.:\n\n    git clone https://github.com/Jhsmit/dont-fret.git editable/dont-fret\n\nInstall any editable library:\n\n    uv pip install -e editable/dont-fret\n\n\nCreate or copy a folder in the 'stage' directory; when you are happy with the script, move it to the 'prod' folder. \n\n## Reproducibility\n\nEach folder has a script (`main.py`) which generates some output in the corresponding `output` folder. To make scripts reproducible, use the following code snippet:\n\n```python\n\nfrom hal.repro import reproduce\n\npackages = [\"numpy\", \"dont_fret\", \"smitfit\"]\nOUTPUT_PATH = reproduce(globals(), packages=packages)\n\n```\n\nThe `reproduce` function will create a zip file in the `output` folder with the name `_rpr.zip`. This zip file contains the script, the current toolbox, and the versions of the packages used. 
The returned constant `OUTPUT_PATH` is the path to the output folder.\n\n\n## Output\n\nFor managing script output you can use the `Output` class. Consider the following example:\n\n```python\n\nimport ultraplot as uplt\nimport random\nimport polars as pl\nfrom hal.io import Output, save_fig, save_yaml\nfrom hal.config import cfg\nfrom hal.repro import reproduce\n\npackages = [\"numpy\", \"dont_fret\", \"smitfit\"]\nOUTPUT_PATH = reproduce(globals(), packages=packages)\nOVERWRITE = False  # set to True to overwrite existing files\n\n\ndef do_fit(data):\n    return {\"a\": random.random(), \"b\": random.random()}\n\n\ndef make_plot(data):\n    fig, ax = uplt.subplots(aspect=1.618)\n    ax.scatter(data[\"x\"], data[\"y\"])\n    return fig\n\n\ninput_files = cfg.paths[\"external_data\"]\n\nfor csv_file in input_files.glob(\"*.csv\"):\n    output = Output(\n        OUTPUT_PATH / csv_file.stem, overwrite=OVERWRITE, files=[\"fit.yaml\", \"plot.png\"]\n    )\n\n    if output.skip:\n        continue\n\n    data = pl.read_csv(csv_file)\n    fit = do_fit(data)\n    fig = make_plot(data)\n\n    save_yaml(fit, output[\"fit.yaml\"])\n    save_fig(fig, output[\"plot.png\"])\n    uplt.close(fig)\n\n    assert output.done\n\n```\n\nAside from creating an output folder with the reproducibility .zip file, we are also using the `Output` class to keep track of the script's expected output. If the `OVERWRITE` flag is set to `True`, `output.skip` always returns `False`, so each file in the for loop is processed. Otherwise, `output.skip` returns `True` only if both expected output files exist. This is very useful for a scenario where more data is added to the 'external_data' folder such that the script only processes new data. On the other hand, if the script is updated, the overwrite flag can be set to `True` to reprocess all data. Finally, the `output.done` flag is set to `True` if all expected output files are created. 
This is useful for checking if the script has finished processing all data.\n\n### Credits\n\nThis cookiecutter is inspired by / derived from:\n\nhttps://github.com/drivendata/cookiecutter-data-science\nhttps://github.com/mkrapp/cookiecutter-reproducible-science\nhttps://github.com/timtroendle/cookiecutter-reproducible-research\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjhsmit%2Fcookiecutter-reproducible-analysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjhsmit%2Fcookiecutter-reproducible-analysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjhsmit%2Fcookiecutter-reproducible-analysis/lists"}