{"id":15899584,"url":"https://github.com/cpbridge/pycrumbs","last_synced_at":"2025-08-01T07:31:27.318Z","repository":{"id":65661135,"uuid":"590152595","full_name":"CPBridge/pycrumbs","owner":"CPBridge","description":"Library for tracking execution of Python functions","archived":false,"fork":false,"pushed_at":"2024-08-31T14:07:08.000Z","size":119,"stargazers_count":6,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-07-21T06:23:32.229Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/CPBridge.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-01-17T19:13:17.000Z","updated_at":"2024-08-31T14:04:59.000Z","dependencies_parsed_at":"2024-10-28T05:55:25.364Z","dependency_job_id":"1b5c5021-5a77-497a-bba3-8813a6328630","html_url":"https://github.com/CPBridge/pycrumbs","commit_stats":{"total_commits":28,"total_committers":3,"mean_commits":9.333333333333334,"dds":0.25,"last_synced_commit":"20bc3aa644d9cc6a33dbb178dec3239faee93c92"},"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/CPBridge/pycrumbs","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CPBridge%2Fpycrumbs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CPBridge%2Fpycrumbs/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CPBridge%2Fpycrumbs/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHu
b/repositories/CPBridge%2Fpycrumbs/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/CPBridge","download_url":"https://codeload.github.com/CPBridge/pycrumbs/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CPBridge%2Fpycrumbs/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":268185559,"owners_count":24209392,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-01T02:00:08.611Z","response_time":67,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-06T10:21:55.059Z","updated_at":"2025-08-01T07:31:27.057Z","avatar_url":"https://github.com/CPBridge.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# `pycrumbs`\n\nThis is a Python package that creates a \"record file\" whenever a function is\nexecuted. The record file is placed alongside the output files of your function and\nallows you to \"follow the breadcrumbs\" to figure out exactly how a set of files\nwas created. 
This includes information on the parameters of the functions, the\nplatform and environment it was run in, the versions of all libraries\ninstalled, and source control information about the current state of the\nproject (using git).\n\n`pycrumbs` was developed for reproducible machine learning pipelines, but can\nbe used for any application where Python code creates output files that may\nneed to be reproduced at a later stage.\n\n### Installation\n\nYou can install the latest release of the package from PyPI:\n\n```\npip install pycrumbs\n```\n\nOr you can install the package directly from the repo using pip:\n\n```\npip install git+https://github.com/CPBridge/pycrumbs\n```\n\n### Use\n\nThe sole purpose of this library is to create \"record\" files for Python\nfunctions that execute. The intended use of this is to \"track\" calls to key\nfunctions (such as machine learning training or preprocessing routines) to\nensure that they can be reproduced later. The type of information captured\nincludes:\n\n- Information about the function that was called. Its name, source file,\n  and arguments it was called with at runtime.\n- Environment information. Username that ran the code, the\n  node and operating system on which it was run, GPUs available on the\n  machine. SLURM specific information if run within a SLURM environment.\n- Timing information (start and end times and duration).\n- Command line arguments used to invoke the Python interpreter.\n- Version control information about the function being executed, including\n  the git hash and current active branch. Note that this requires that the\n  function was executed from a source file within a git repository. 
If you\n  want to install your package code with pip, use the\n  `pip install --editable` option to allow for tracking with this\n  decorator.\n- A list of all Python packages currently installed, with their versions.\n- Seeding information for random number generators.\n\nYou can very easily add a job record to a function using the `tracked`\ndecorator in the `pycrumbs` module. It is assumed\nthat a tracked function will create output files within a given output\ndirectory, and that the job record file (a JSON file) should be placed in\nthe same directory. `tracked` gives you various options on how to\nspecify where the output location is, including intercepting runtime\nparameters of the decorated function:\n\nSpecify a literal output location (always the same).\n\n```python\nfrom pathlib import Path\nfrom pycrumbs import tracked\n\n\n@tracked(literal_directory=Path('/home/user/proj/'))\ndef my_train_fun():\n    # Do something...\n    pass\n\n# Record will be placed at /home/user/proj/my_train_fun_record.json\nmy_train_fun()\n```\n\nSpecify an output location by intercepting the runtime value of a parameter\nof the decorated function.\n\n```python\nfrom pathlib import Path\nfrom pycrumbs import tracked\n\n\n@tracked(directory_parameter='model_output_dir')\ndef my_train_fun(model_output_dir: Path):\n    # Do something...\n    pass\n\n# Record will be placed at\n# /home/user/proj/my_model/my_train_fun_record.json\nmy_train_fun(model_output_dir=Path('/home/user/proj/my_model'))\n```\n\nSpecify an output location that is determined dynamically by intercepting\nthe runtime value of a parameter of the decorated function to use as the\nsub-directory of a literal location. 
This is useful when only the final\npart of the path changes between different times the function is run.\n\n```python\nfrom pathlib import Path\nfrom pycrumbs import tracked\n\n\n@tracked(\n    literal_directory=Path('/home/user/proj/'),\n    subdirectory_name_parameter='model_name'\n)\ndef my_train_fun(model_name: str):\n    # Do something...\n    pass\n\n# Record will be placed at\n# /home/user/proj/my_model/my_train_fun_record.json\nmy_train_fun(model_name='my_model')\n```\n\nYou may also have the decorator dynamically add a timestamp\n(`include_timestamp`) or a UUID (`include_uuid`) to the output directory to\nensure that each run results in a unique output directory. If you do this\nin combination with `directory_parameter` or `subdirectory_name_parameter`, the\nvalue of the relevant parameter will be updated to reflect the addition\nof the UUID/timestamp.\n\n```python\nfrom pathlib import Path\nfrom pycrumbs import tracked\n\n@tracked(\n    literal_directory=Path('/home/user/proj/'),\n    subdirectory_name_parameter='model_name',\n    include_uuid=True,\n)\ndef my_train_fun(model_name: str):\n    print(model_name)\n    # Do something...\n    pass\n\n# Record will be placed at, e.g.\n# /home/user/proj/my_model_2dddbaa6-620f-4aaa-9883-eb3557dbbdb2/my_train_fun_record.json\nmy_train_fun(model_name='my_model')\n# prints my_model_2dddbaa6-620f-4aaa-9883-eb3557dbbdb2\n```\n\nAlternatively, you may specify a separate parameter of the wrapped\nfunction into which the full updated output directory path will be placed.\nUse `directory_injection_parameter` to specify this. This is required when\nyou append a timestamp or a UUID without specifying a `directory_parameter`\nor `subdirectory_name_parameter`; otherwise there is no way for the wrapped\nfunction to access the updated output directory. 
However, you may use\nthis at any time for your convenience.\n\n```python\nfrom pathlib import Path\nfrom pycrumbs import tracked\n\n@tracked(\n    literal_directory=Path('/home/user/proj/'),\n    include_uuid=True,\n    directory_injection_parameter='model_directory'\n)\ndef my_train_fun(\n    model_directory: Path | None\n):\n    print(model_directory)\n    # Do something...\n    pass\n\n# Record will be placed at, e.g.\n# /home/user/proj_2dddbaa6-620f-4aaa-9883-eb3557dbbdb2/my_train_fun_record.json\nmy_train_fun()\n# prints /home/user/proj_2dddbaa6-620f-4aaa-9883-eb3557dbbdb2\n```\n\n#### Git Versioning\n\nUnless the `disable_git_tracking` parameter is set, `pycrumbs` will try to\ninclude the git versioning information of the source code where the function\nis defined. This implies two requirements:\n\n1. The function is defined in a file. If you try to track a function defined\n   interactively in a REPL, it won't work!\n2. The source code containing the tracked function is in a git repository on\n   your filesystem. If you have a repository tracked using git, but then `pip\n   install` the package to run it, git tracking will not work, since the source\n   file actually being executed is the installed copy rather than the one under\n   version control. In this situation,\n   you should install using the `-e` flag, such that the executed version of\n   the file is the same as the one under version control.\n\n#### Seeds\n\nIn addition to tracking your function call, `pycrumbs` also handles seeding\nrandom number generators for you and storing the seed in the record file\nso that you can reproduce the run later, if needed. `pycrumbs` knows how to\nseed random number generators in the following libraries and will do so if\nthey are installed:\n\n- The Python standard library `random` module.\n- Numpy\n- Tensorflow\n- Pytorch\n\nAdditionally, you may specify a seed parameter by intercepting the runtime value\nof a parameter of the decorated function. 
This is always recommended as it\nallows re-running the job with the same seed without having to change the code.\n\n```python\nfrom pathlib import Path\nfrom pycrumbs import tracked\n\n\n@tracked(\n    literal_directory=Path('/home/user/proj/'),\n    subdirectory_name_parameter='model_name',\n    seed_parameter='seed'\n)\ndef my_train_fun(model_name: str, seed: int | None = None):\n    # Do something...\n    print(seed)\n\n# Seed will be injected at runtime by the decorator mechanism and stored in the\n# record file, you don't have to do anything else\nmy_train_fun(model_name='my_model')\n# prints e.g. 272428\n\n# But you can manually specify the seed later to reproduce, without\n# having to alter the function signature\nmy_train_fun(model_name='my_model', seed=272428)\n# prints 272428\n```\n\n#### Combining with Other Decorators\n\nIf you use this decorator in combination with decorators such as those from the\n`click` module (and most other common decorators), you should place this\ndecorator last. This is because the `click` decorators alter the function\nsignature, which will break the operation of `tracked`. 
The last decorator is\napplied first, meaning that it will operate on the function with its original\nsignature as intended.\n\n```python\nfrom pathlib import Path\nimport click\nfrom pycrumbs import tracked\n\n\n@click.command()\n@click.argument('model_name')\n@click.option('--seed', '-s', type=int, help='Random seed.')\n@tracked(\n    literal_directory=Path('/home/user/proj/'),\n    subdirectory_name_parameter='model_name',\n    seed_parameter='seed'\n)\ndef my_train_fun(model_name: str, seed: int | None = None):\n    # Do something...\n    print(seed)\n```\n\n### Record Files\n\nHere is an example of a job record file created by running a simple function\nlike the ones in the example above:\n\n```json\n{\n    \"uuid\": \"51af8c51-0de5-48f5-a7da-9ec9688d56b6\",\n    \"timing\": {\n        \"start_time\": \"2023-01-16 22:38:54.484040\",\n        \"end_time\": \"2023-01-16 22:38:54.534795\",\n        \"run_time\": \"0:00:00.050755\"\n    },\n    \"environment\": {\n        \"argv\": [\n            \"thing.py\"\n        ],\n        \"orig_argv\": [\n            \"/Users/chris/.pyenv/versions/pycrumbs/bin/python\",\n            \"thing.py\"\n        ],\n        \"platform\": \"darwin\",\n        \"platform_info\": \"macOS-12.3.1-arm64-arm-64bit\",\n        \"python_version\": \"3.10.3 (main, Sep 26 2022, 17:17:59) [Clang 13.1.6 (clang-1316.0.21.2.5)]\",\n        \"python_implementation\": \"cpython\",\n        \"python_executable\": \"/Users/chris/.pyenv/versions/pycrumbs/bin/python\",\n        \"cwd\": \"/Users/chris/Developer/project\",\n        \"hostname\": \"hostname.example.com\",\n        \"python_path\": [\n            \"/Users/chris/Developer/project\",\n            \"/Users/chris/.pyenv/versions/project/lib/python3.10/site-packages/git/ext/gitdb\",\n            \"/Users/chris/.pyenv/versions/3.10.3/lib/python310.zip\",\n            \"/Users/chris/.pyenv/versions/3.10.3/lib/python3.10\",\n            
\"/Users/chris/.pyenv/versions/3.10.3/lib/python3.10/lib-dynload\",\n            \"/Users/chris/.pyenv/versions/project/lib/python3.10/site-packages\",\n            \"/Users/chris/Developer/project/src\"\n        ],\n        \"cpu_count\": 10,\n        \"user\": \"cpb28\",\n        \"environment_variables\": {\n            \"CUDA_VISIBLE_DEVICES\": null,\n            \"VIRTUAL_ENV\": \"/Users/chris/.pyenv/versions/3.10.3/envs/pycrumbs\",\n            \"PYENV_VIRTUAL_ENV\": \"/Users/chris/.pyenv/versions/3.10.3/envs/pycrumbs\",\n            \"PYTHONPATH\": null\n        }\n    },\n    \"package_inventory\": {\n        \"setuptools\": \"65.6.3\",\n        \"types-setuptools\": \"57.4.0\",\n        \"attrs\": \"22.1.0\",\n        \"pip\": \"22.0.4\",\n        \"packaging\": \"22.0\",\n        \"ipython\": \"7.34.0\",\n        \"pytest\": \"7.2.0\",\n        \"traitlets\": \"5.8.0\",\n        \"decorator\": \"5.1.1\",\n        \"smmap\": \"5.0.0\",\n        \"pexpect\": \"4.8.0\",\n        \"typing-extensions\": \"4.4.0\",\n        \"gitdb\": \"4.0.10\",\n        \"flake8\": \"3.7.7\",\n        \"gitpython\": \"3.1.29\",\n        \"prompt-toolkit\": \"3.0.36\",\n        \"pydocstyle\": \"3.0.0\",\n        \"pygments\": \"2.13.0\",\n        \"pycodestyle\": \"2.5.0\",\n        \"snowballstemmer\": \"2.2.0\",\n        \"pyflakes\": \"2.1.1\",\n        \"tomli\": \"2.0.1\",\n        \"six\": \"1.16.0\",\n        \"py\": \"1.9.0\",\n        \"flake8-docstrings\": \"1.3.0\",\n        \"iniconfig\": \"1.1.1\",\n        \"exceptiongroup\": \"1.0.4\",\n        \"flake8-polyfill\": \"1.0.2\",\n        \"pluggy\": \"1.0.0\",\n        \"mypy\": \"0.910\",\n        \"jedi\": \"0.18.2\",\n        \"toml\": \"0.10.2\",\n        \"parso\": \"0.8.3\",\n        \"pep8-naming\": \"0.8.2\",\n        \"pickleshare\": \"0.7.5\",\n        \"ptyprocess\": \"0.7.0\",\n        \"mccabe\": \"0.6.1\",\n        \"mypy-extensions\": \"0.4.3\",\n        \"entrypoints\": \"0.3\",\n        
\"wcwidth\": \"0.2.5\",\n        \"backcall\": \"0.2.0\",\n        \"matplotlib-inline\": \"0.1.6\",\n        \"appnope\": \"0.1.3\",\n        \"pycrumbs\": \"0.1.0\"\n    },\n    \"called_function\": {\n        \"name\": \"my_train_fun\",\n        \"module\": \"__main__\",\n        \"source_file\": \"/Users/chris/Developer/project/train.py\",\n        \"parameters\": {\n            \"model_name\": \"my_model\",\n            \"seed\": null\n        },\n        \"altered_parameters\": {\n            \"model_name\": \"my_model\",\n            \"seed\": 106543\n        }\n    },\n    \"tracked_module\": {\n        \"module_path\": \"/Users/chris/Developer/project/train.py\",\n        \"name\": \"__main__\",\n        \"git_active_branch\": \"master\",\n        \"git_commit_hash\": \"2a38ea37ee41466612f312d082445528df9f8d9d\",\n        \"git_is_dirty\": true,\n        \"git_remotes\": {\n            \"origin\": \"ssh://git@github.com:/example/project.git\"\n        },\n        \"git_working_dir\": \"/Users/chris/Developer/project\"\n    },\n    \"seed\": 106543\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcpbridge%2Fpycrumbs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcpbridge%2Fpycrumbs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcpbridge%2Fpycrumbs/lists"}