{"id":13809727,"url":"https://github.com/dassencio/parallel-code-scanning","last_synced_at":"2025-05-14T08:33:35.451Z","repository":{"id":47001361,"uuid":"339013195","full_name":"dassencio/parallel-code-scanning","owner":"dassencio","description":"An example of a GitHub Actions workflow showing how code scanning with CodeQL can be parallelized on monorepos.","archived":false,"fork":false,"pushed_at":"2022-12-14T21:08:49.000Z","size":6316,"stargazers_count":11,"open_issues_count":2,"forks_count":5,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-11-19T02:39:03.437Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dassencio.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-02-15T08:37:10.000Z","updated_at":"2023-09-06T19:51:25.000Z","dependencies_parsed_at":"2023-01-29T01:02:28.471Z","dependency_job_id":null,"html_url":"https://github.com/dassencio/parallel-code-scanning","commit_stats":null,"previous_names":[],"tags_count":0,"template":true,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dassencio%2Fparallel-code-scanning","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dassencio%2Fparallel-code-scanning/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dassencio%2Fparallel-code-scanning/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dassencio%2Fparallel-code-scanning/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dassencio","download_url":"https://codeload.github.com/dassencio/parallel-code-scanning/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254104968,"owners_count":22015571,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-04T02:00:35.052Z","updated_at":"2025-05-14T08:33:30.429Z","avatar_url":"https://github.com/dassencio.png","language":"JavaScript","readme":"# Parallel code scanning with CodeQL\n\nIf you have a large repository containing various independent projects (a\n\"monorepo\"), the time taken to scan your code with CodeQL can be significantly\nreduced by splitting the scanning work into several parallel jobs, each of which\nanalyzes only a subset of the files in the repository.\n\nThis repository contains an example of a GitHub Actions\n[workflow](.github/workflows/code-scanning.yml) which does precisely\nthat. However, the strategy implemented here only works for the interpreted\nlanguages supported by CodeQL (e.g. Python and JavaScript). As a first step in\nthis workflow, all non-hidden subdirectories under the repository's root\ndirectory are detected; a parallel code scanning job is then dynamically\ngenerated for each subdirectory containing a software project. The\nsubdirectories [`project-1`](./project-1), [`project-2`](./project-2) and\n[`project-3`](./project-3) here represent three independent software projects\ninside the repository, each of which is scanned in a dedicated job (i.e., three\njobs are generated in total). Adding a new software project to this repository\n(e.g. `project-4`) requires no changes to the workflow file, since a dedicated\ncode scanning job is generated for it automatically when the workflow runs.\n\nThis strategy is possible because GitHub Actions workflows accept JSON input to\ndefine a job matrix, and the JSON contents can be generated while the workflow\nis running. In other words, the job matrix can be defined dynamically.\n\n**NOTE**: Use the approach presented here with care, as accidentally splitting\na single software component across several jobs may reduce CodeQL's ability to\ndetect certain types of vulnerabilities in that component. For instance, CodeQL\nmay not be able to fully trace how data flows through the component and may\ntherefore miss possible attacks against it. Please make sure you understand the\ngeneral capabilities of CodeQL before doing this.\n\n## Answers to common questions\n\n**1.** _Even if files in only one subdirectory of the repository are changed,\ncode scanning jobs will be generated for all subdirectories containing software\nprojects, which is wasteful. Is it possible to limit job generation so that\nonly subdirectories with modified files are scanned?_\n\nYes. The list of subdirectories used as input for the code scanning job matrix\nis produced by a [script](./.github/scripts/list-dirs) which simply outputs all\nsubdirectories under the repository's root directory. You can modify this\nscript in any way you want; for example, you can use [`git diff`](https://stackoverflow.com/questions/50440420/git-diff-only-show-which-directories-changed)\nto build a list containing only subdirectories with modified files and use that\nlist as input for the job matrix.\n\n**2.** _Every code scanning job checks out the repository in parallel. If a\nchange is made to the repository during that time (e.g. a subdirectory is added\nor removed, or a file in a pre-existing subdirectory is modified), you\nessentially have a race condition which is not being properly handled._\n\nThis situation will not occur because the\n[`actions/checkout`](https://github.com/actions/checkout/) action by default\nfetches only a single commit: the one for the ref/SHA which triggered the\nworkflow. As a result, the same code snapshot is checked out in all jobs\ntriggered inside the [`Code scanning`](.github/workflows/code-scanning.yml)\nworkflow.\n\nIf the amount of data fetched in each checkout step is large, you may be able to\nimprove performance by packaging the code as an artifact in the very first job\nexecuted in the workflow and then downloading that artifact in all downstream\njobs. The\n[`actions/upload-artifact`](https://github.com/actions/upload-artifact) and\n[`actions/download-artifact`](https://github.com/actions/download-artifact)\nactions will help you accomplish this.","funding_links":[],"categories":["CodeQL Monorepo Actions Samples"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdassencio%2Fparallel-code-scanning","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdassencio%2Fparallel-code-scanning","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdassencio%2Fparallel-code-scanning/lists"}