{"id":15046174,"url":"https://github.com/kamilrybacki/diff-cache","last_synced_at":"2026-01-25T20:49:51.784Z","repository":{"id":65437714,"uuid":"584371639","full_name":"kamilrybacki/diff-cache","owner":"kamilrybacki","description":"Finds files that have changed since last code SUCCESSFUL workflow","archived":false,"fork":false,"pushed_at":"2023-07-19T14:06:40.000Z","size":409,"stargazers_count":1,"open_issues_count":2,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-20T04:01:35.060Z","etag":null,"topics":["actions"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kamilrybacki.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-01-02T11:31:42.000Z","updated_at":"2024-01-13T17:43:24.000Z","dependencies_parsed_at":"2024-10-11T03:00:49.253Z","dependency_job_id":null,"html_url":"https://github.com/kamilrybacki/diff-cache","commit_stats":{"total_commits":25,"total_committers":3,"mean_commits":8.333333333333334,"dds":0.52,"last_synced_commit":"9b02d445a0c51242ca8768aa3bd2aa7384dfb613"},"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kamilrybacki%2Fdiff-cache","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kamilrybacki%2Fdiff-cache/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kamilrybacki%2Fdiff-cache/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hos
ts/GitHub/repositories/kamilrybacki%2Fdiff-cache/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kamilrybacki","download_url":"https://codeload.github.com/kamilrybacki/diff-cache/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239885852,"owners_count":19713369,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["actions"],"created_at":"2024-09-24T20:52:48.644Z","updated_at":"2026-01-25T20:49:51.755Z","avatar_url":"https://github.com/kamilrybacki.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DiffCache\n\n## Problem statement\n\nSuppose that you have so many files that checking or processing them in some predefined way\ntakes a considerable amount of time. In the simplest case, this can mean running a linter on a large number of files.\n\nAs the number of files grows, the time it takes to process them grows as well. This is a problem, because\nit may not be possible to process all files in a single run i.e. CI/CD Workflow run, before the next commit is submitted,\nby accident or on purpose, by the developer.\n\nWithin this next commit, previously unchecked files are not present in the `diff` between the current and the previous commit,\nso they are not processed.\n\nIn the case of linting, this means that the developer can submit a commit with code that does not pass the linter checks,\nand the CI/CD pipeline will not be able to catch this.\n\nYou can also easily imagine a situation where a developer can accidentally bypass more serious checks,\nlike security checks, or even tests (if they are meant to fire when certain files are changed).\n\n## Solutions\n\n### Naive solution\n\nJust check all files within the repository, every time. This is a naive solution, because it is not scalable.\n\n### Proposed solution\n\nThis solution is based on the idea of caching the results of the standard `git diff` command, and using them to\ndetermine which files should be checked. If there are no changes in the files, then there is no need to check them.\nIf there are previously cached files that are not present in the current `diff`, then they should be checked as well,\nsince they were not checked in the previous Workflow run, due to the human error described above (or other causes).\n\nResults of this check are stored within some sort of artifact or other storage,\npersistent between CI/CD Workflow runs. This information is used by the subsequent Workflow steps\nto check the necessary files. If all files are checked, then the cache is cleared.\n\n![Visual representation of Diff Cache Workflow](https://github.com/KamilRybacki/diff-cache/blob/media/use_case_diagram.png)\n\n## Usage\n\nThis Action can be used in the following way (as a step in the Workflow):\n\n```yaml\n- uses: KamilRybacki/diff-cache@v[version]\n  with:\n    # REQUIRED: Secret containing the cache. Doesn't have to be prepared beforehand, it will be created if it doesn't exist (see Note below).\n    cache_secret: ${{ secrets.CACHE_SECRET }}\n    # REQUIRED: Github token to use for the API calls. It is required to be able to create the cache secret and to be able to update it (see Note below).\n    token: ${{ secrets.TOKEN }}\n    # OPTIONAL: Set to true if You want to provide manually escaped regexps i.e. turn off automatic escaping of special characters (see Note below).\n    disable_escape: false  # That's the default value\n    # OPTIONAL: Regex to use to match the files to include in the cache\n    include: '.py'\n    # OPTIONAL: Regex to use to match the files to exclude from the cache check.\n    exclude: '.*/dont/check/this/.*'\n```\n\nAfter running this Action, the list of the files to check is available through the `files` output e.g.:\n\n```yaml\n- name: Get changed Python files\n  id: python-files-search\n  uses: KamilRybacki/diff-cache@v[version]\n  with:\n    include: '.py'\n    cache_secret: ${{ secrets.CACHE_SECRET }}\n    token: ${{ secrets.TOKEN }}\n- name: Some step that uses the result of the Diff Cache action\n  env:\n    FILES_TO_CHECK: ${{ steps.python-files-search.outputs.files }}\n  run: mypy ${FILES_TO_CHECK} # Or whatever command really\n```\n\nThis output contains a whitespace-delimited list of the files that were modified in the current commit plus the files stored in the cache.\nIf no `include` or `exclude` regexps are provided, then all files registered by the `git diff` command are stored in the cache.\n\n### Note\n\nThe `token` needs to have the necessary scopes for reading Workflow info and managing repo Secrets.\nThe most secure solution is to use a fine-grained token, with only the necessary scopes. Check the [Github documentation](https://docs.github.com/en/actions/reference/authentication-in-a-workflow#permissions-for-the-github_token) for more information about enabling the necessary scopes.\n\nThe `disable_escape` input is used to disable the automatic escaping of special characters in the provided regexps.\nBy default, the Action escapes the special characters in the regexps, so that the user does not have to worry about it\nand can provide file extensions in a more readable way e.g. `.py` instead of `\\.*\\.py`.\n\nIf set to `true`, the user is responsible for escaping the special characters in the regexps, if they are present,\nsuch as the aforementioned `.` or `*`. This is useful if the user wants to provide the regexps in a more readable way\nor do some other regular expression magic with them. This is also useful if the user wants to provide regexps\nthat target specific files or paths, not just file extensions.\n\nYou can use the [regex101](https://regex101.com/) website to test your regexps and then escape the special characters\nas seen in the official MDN documentation for [RegExp escaping](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#escaping).\nNevertheless, use this feature with caution, as it is a purely no-handlebars zone.\n\n#### **IMPORTANT**\n\nThe secret MUST be created by the user **ONCE** (before using the Action for the 1st time) and then it is updated by the Action **automatically**.\nSadly, there is no way to create a secret from within the Workflow file **if it's not present**, so it has to be created beforehand\n(at least from the side of my implementation, the user can do some magic with API calls and finding the secret etc.).\nIn other words, **CREATE AND FORGET IT** i.e. don't touch it.\nThis means that the secret should not be used for anything else. If there is anything present within it,\nthen the Action will **overwrite its contents**.\n\nThe name of the secret is obtained **by parsing** the Github Workflow file that contains the triggered action.\nThis may sound confusing but the flow of this Action is as follows:\n\n1. The Action is triggered by the Workflow file.\n2. The Action uses the GitHub API to find the location of the Workflow file within the latest commit tree.\n3. The Action reads the Workflow file i.e. parses its contents and tries to find the `cache_secret` input.\n4. Since this input is required, it is guaranteed to be present in the Workflow file, so the GitHub Secret name is read (using Regex magic).\n5. This name is used for subsequent API calls to create or update the secret.\n\nHence, **use one secret per Workflow** (if You plan to trigger this action multiple times in the same Workflow).\nIf this is not followed, the first occurrence of the `cache_secret` input present in the Workflow file will be used.\nIn the future, an additional guard may be added to halt the execution of the Action if multiple secrets are found.\n\n## How is the staged files data stored?\n\nThe cache is structured in the following format:\n\n```json\n{\n  \"[TAG CREATED BY THE ACTION]#1\": \"file1 file2 file3 ...\",\n  \"[TAG CREATED BY THE ACTION]#2\": \"file1 file2 file3 ...\",\n  \"[TAG CREATED BY THE ACTION]#3\": \"file1 file2 file3 ...\"\n}\n```\n\nwhere the `[TAG CREATED BY THE ACTION]` is the tag created by the Action using the combination of `include` and `exclude` inputs:\n\n```js\nconst tag = `${include}\u0026\u0026${exclude}`;\n```\n\nIf no `include` or `exclude` inputs are provided, then the tag is set to `all`:\n\n```js\nconst tag = 'all';\n```\n\nand the cache is structured in the following format:\n\n```json\n{\n  \"all\": \"file1 file2 file3 ...\",  // All files that were modified during the current + last unsuccessful commit\n  \"[TAG CREATED BY THE ACTION]#1\": \"file1 file2 file3 ...\",\n  \"[TAG CREATED BY THE ACTION]#2\": \"file1 file2 file3 ...\",\n  \"[TAG CREATED BY THE ACTION]#3\": \"file1 file2 file3 ...\"\n}\n```\n\nThis structure allows one secret per Workflow to be used with multiple `include` and `exclude` combinations (or no combinations at all i.e. tag `all`),\nwhich may correspond to multiple checks within the Workflow, based on the extensions, locations and so on of the modified files.\nAll of this can be done by correctly setting up the `include` and `exclude` regular expressions.\n\nBefore saving this data to the secret, it is stringified using `JSON.stringify`,\ncompressed using the `lz-string` library (namely the methods from the `LZString` interface)\nand then it is properly encrypted using the `libsodium-wrappers` library and the repo public key.\n\nThis allows for the data to be stored in a small amount of space, and also to be encrypted (as needed by the Github Secrets API).\n\n## The cold hard truth\n\nThis Action is not perfect. The problems that it solves are not the most important ones,\nand the solutions that it provides may not be the best ones.\n\nOne of the motivations for me was to learn how to write a Github Action and navigate the Github API.\nI also wanted to learn how to use the `libsodium-wrappers` library, which I used for the encryption of the data.\nThere may be some cool lessons to learn by studying the `DiffCache` and `ActiveWorkflowFileReader` classes.\n\nIf You have ideas for improvements, feel free to open an issue or a PR. I will be happy to discuss them with You.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkamilrybacki%2Fdiff-cache","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkamilrybacki%2Fdiff-cache","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkamilrybacki%2Fdiff-cache/lists"}