{"id":23194182,"url":"https://github.com/anh0ang/kedro-cache","last_synced_at":"2025-08-18T20:34:07.391Z","repository":{"id":61940738,"uuid":"556415484","full_name":"AnH0ang/kedro-cache","owner":"AnH0ang","description":"A kedro-plugin that adds caching to kedro pipelines","archived":false,"fork":false,"pushed_at":"2022-10-23T20:39:33.000Z","size":89,"stargazers_count":6,"open_issues_count":1,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2024-12-13T23:18:22.867Z","etag":null,"topics":["caching","kedro","kedro-plugin"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AnH0ang.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-10-23T19:55:49.000Z","updated_at":"2024-07-15T07:39:54.000Z","dependencies_parsed_at":"2022-10-23T21:45:18.210Z","dependency_job_id":null,"html_url":"https://github.com/AnH0ang/kedro-cache","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AnH0ang%2Fkedro-cache","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AnH0ang%2Fkedro-cache/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AnH0ang%2Fkedro-cache/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AnH0ang%2Fkedro-cache/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AnH0ang","download_url":"https://codeload.github.com/AnH0ang/kedro-cache/tar.gz/refs/heads/master","host":{"name":"GitHub","url"
:"https://github.com","kind":"github","repositories_count":230276309,"owners_count":18201092,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["caching","kedro","kedro-plugin"],"created_at":"2024-12-18T13:13:22.843Z","updated_at":"2024-12-18T13:13:23.320Z","avatar_url":"https://github.com/AnH0ang.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Kedro Cache\n\n\u003e :warning: _This plugin is still under active development and not fully tested. Do not use it in any production system. Please report any issues that you find._\n\n## 📝 Description\n\n`kedro-cache` is a [kedro](https://kedro.org/) plugin that enables caching of data sets.\nThe advantage is that data sets are loaded from the data catalog instead of being recomputed when neither their input data sets nor the code that produces them has changed.\nIf the input data sets or code have changed, the outputs are recomputed and the data catalog is updated.\nThe plugin works out of the box with any kedro project, without requiring changes to the project code.\nThe flowchart below describes the logic that decides whether the cached data set in the catalog is used.\n\n![Caching Flowchart](static/img/caching_flowchart.svg)\n\n**Disclaimer:** _The caching strategy determines whether a node function has changed by looking only at its immediate function body.\nIt ignores other things such as called functions or global variables 
that might also have changed._\n\n## 🏆 Features\n\n- Caching of node outputs in the data catalog\n- No changes to the kedro project needed\n- Integration with the kedro data catalog\n- Configuration via a `cache.yml` file\n\n## 🏗 Installation\n\nThe plugin can be installed with `pip`:\n\n```bash\npip install kedro-cache\n```\n\n## 🚀 Enable Caching\n\nIn the root directory of your kedro project, run\n\n```bash\nkedro cache init\n```\n\nThis creates a new file `cache.yml` in the `conf` directory of your kedro project, in which you can configure the `kedro-cache` plugin.\nThis step is optional, because the plugin comes with a default configuration.\n\nNext, let's assume that you have the following kedro pipeline to which you want to add caching.\nIt has two nodes.\nThe first reads data from an `input` dataset, does some computation, and writes the result to an `intermediate` dataset; the second reads the data from the `intermediate` dataset and writes it to the `output` dataset.\n\n```python\n# pipeline_registry.py\n\nfrom typing import Dict\n\nfrom kedro.pipeline import Pipeline, node, pipeline\n\n\ndef register_pipelines() -\u003e Dict[str, Pipeline]:\n    # Two identity nodes: the first maps `input` to `intermediate`,\n    # the second maps `intermediate` to `output`.\n    default_pipeline = pipeline(\n        [\n            node(\n                func=lambda x: x,\n                inputs=\"input\",\n                outputs=\"intermediate\",\n            ),\n            node(\n                func=lambda x: x,\n                inputs=\"intermediate\",\n                outputs=\"output\",\n            ),\n        ],\n    )\n    return {\"__default__\": default_pipeline}\n```\n\nTo enable caching, we only have to register every data set used by the pipeline in the data catalog.\nIf the first node is to use its cached output instead of recomputing it, that output must be loaded from the data catalog.\nThis is only possible if it was stored there.\n\n```yaml\n# catalog.yml\n\ninput:\n  type: pandas.CSVDataSet\n  filepath: input.csv\n\nintermediate:\n  type: pandas.CSVDataSet\n  filepath: intermediate.csv\n\noutput:\n  type: pandas.CSVDataSet\n  filepath: output.csv\n```\n\nThat's it. 
By adding all data sets to the catalog, you have enabled caching.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fanh0ang%2Fkedro-cache","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fanh0ang%2Fkedro-cache","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fanh0ang%2Fkedro-cache/lists"}