{"id":20801954,"url":"https://github.com/philips-software/license-scanner","last_synced_at":"2025-05-07T00:45:49.531Z","repository":{"id":36984033,"uuid":"278601243","full_name":"philips-software/license-scanner","owner":"philips-software","description":"Service to scan licenses from source code","archived":false,"fork":false,"pushed_at":"2023-08-14T02:02:51.000Z","size":14344,"stargazers_count":12,"open_issues_count":10,"forks_count":2,"subscribers_count":1,"default_branch":"develop","last_synced_at":"2025-03-31T04:41:11.898Z","etag":null,"topics":["license-scanning-framework","sbom","software-bill-of-materials"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/philips-software.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":"CODEOWNERS","security":null,"support":null}},"created_at":"2020-07-10T10:04:37.000Z","updated_at":"2024-01-21T13:26:19.000Z","dependencies_parsed_at":"2023-02-19T03:46:16.274Z","dependency_job_id":null,"html_url":"https://github.com/philips-software/license-scanner","commit_stats":null,"previous_names":[],"tags_count":9,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/philips-software%2Flicense-scanner","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/philips-software%2Flicense-scanner/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/philips-software%2Flicense-scanner/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/philips-software%2Flicense-scanner/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/Git
Hub/owners/philips-software","download_url":"https://codeload.github.com/philips-software/license-scanner/tar.gz/refs/heads/develop","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252793564,"owners_count":21805054,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["license-scanning-framework","sbom","software-bill-of-materials"],"created_at":"2024-11-17T18:26:43.005Z","updated_at":"2025-05-07T00:45:49.524Z","avatar_url":"https://github.com/philips-software.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# License Scanner service\n\nBackend service to scan licenses from the source code of (open source) packages.\n\n**Status**: _Experimental research prototype_\n\n\u003e Powered by Philips SWAT Eindhoven\n\n(See the [architecture document](docs/architecture.md) in the `docs` directory.)\n\nTypical usage is the integration with CI/CD build pipeline tools (like\n[SPDX-Builder](https://github.com/philips-software/spdx-builder)) to obtain\nvalidated license information. Prior scan results are provided if the package\nhas been scanned before. Packages that were not yet scanned are automatically\nscheduled for download and scanning.\nA [web user interface](https://github.com/philips-software/license-scanner-ui)\nis provisioned by the service for monitoring the scanning process and\nmanually curating scan results.\n\nClients interact with the service via a REST API to provide the license for the\nspecified package. If the package was scanned before, the license is returned\nimmediately. 
Otherwise, a scan of the package is scheduled if a source code location\nwas provided, so a future request for the package can be answered. When a client\ndetects a mismatch between the license declared by (e.g.) a package manager and\nthe license detected by the scanner, it can \"contest\" the license. This marks\nthe scanned license for human curation. After manual inspection, the (corrected)\nlicense is \"confirmed\" to indicate to the next requesting client that the provided\nlicense is reliable. If the scan failed due to an incorrect source code location\nor other technical issue, the user can manually correct the location and restart\nthe scan.\n\nThe [ScanCode Toolkit version 3.x](https://github.com/nexB/scancode-toolkit)\ncommand line tool performs the actual source code scan. The service schedules\ndownload of package source code in the background, invokes the scanner, and\nmakes detected license information available the next time it is requested by a\nclient via the REST API.\n\nScanCode Toolkit reports detected licenses per source file, which are joined by\nthe service into a single package-level license using the logical \"AND\"\noperator. The API reports licenses as-is, without checking for validity or\ncompatibility. In case of dual licensing, it is left to the client to choose the\nappropriate license.\n\nFor each detected license, the service persists:\n\n- The total number of detections in the code\n- A sample file with the (largest) range of lines that indicated the license\n- The license itself, specified in (where\n  possible) [SPDX identifiers](https://spdx.org/licenses)\n\nManual curation allows for marking individual detections as false positives and\nadjusting the confirmed license accordingly. 
In the user interface, the button\nnext to the source location opens a web URL that is derived from the VCS URI\n(see below) to manually browse the referenced source archive or download the\nsource code archive.\n\nPackages are specified by\ntheir [package URL](https://github.com/package-url/purl-spec)\nto ensure unique identification across package managers.\n\nThe location of source code is specified using a VCS location URI according to\nthe format defined in\nthe [SPDX specification](https://spdx.github.io/spdx-spec/3-package-information/#37-package-download-location):\n\n```\n\u003cvcs_tool\u003e+\u003ctransport\u003e://\u003chost_name\u003e[/\u003cpath_to_repository\u003e][@\u003crevision_tag_or_branch\u003e][#\u003csub_path\u003e]\n```\n\nwhere all fields are URL-encoded to escape reserved characters (like \"@\" to\n\"%40\").\n\nCurrently supported sources for downloading source code:\n\n- Plain web (and file) URL download\n- Installed command-line Git client (version 2.24 or higher)\n\nIn case of a plain download, the downloaded archive is automatically extracted\nbefore starting the scan.\n\nThe Git download assumes the default branch if no explicit version is provided.\nOtherwise, it attempts to check out the source code in the following ways:\n\n1. Branch/tag checkout using the literal version\n2. Branch/tag checkout prepending \"v\" to the literal version\n3. Revision checkout using the literal version as commit hash\n\n## Dependencies\n\nThe service requires a Java 11 (or later) runtime environment.\n\nScan results are persisted to disk in a local H2 database. 
The H2 database\ndriver is part of the application, so no external dependencies are\n(currently) required for persistent storage.\n\n## Installation\n\nThe application is built from source code using the standard Gradle build\ncommand:\n\n```\ngradlew build\n```\n\nThe `build/distributions` directory contains archives for distribution of the\nJava application, including startup scripts.\n\n## Configuration\n\n### ScanCode Toolkit installation\n\nScanCode Toolkit must be invoked on Linux and OSX using an absolute installation\npath\n(see [the ScanCode Toolkit documentation](https://scancode-toolkit.readthedocs.io/en/latest/cli-reference/synopsis.html)).\nWhen not installed using pip or running on Windows, make sure `extractcode`\nand `scancode` can be accessed through a script, without providing the installation\npath.\n\n### Working directory for temporary files\n\nPackage source code gets downloaded to a temporary directory for scanning. The\nbase directory is the `TMPDIR` directory, and can be changed by setting\nthe `LICENSE_DIR` environment variable.\n\n### License detection threshold\n\nThe heuristic processes detecting licenses from source code use a default\ncertainty threshold of 50 (percent) to accept a detected license. This threshold\ncan be overridden by setting the `LICENSE_THRESHOLD` environment variable to a\nvalue between 0 and 100.\n\n## Usage\n\nThe service can be started from the command line using the startup scripts in\nthe `bin` directory of the distribution archive.\n\nAfter starting up, the service exposes on port 8080:\n\n* An API to interact with the scanning service\n* A user interface on [localhost:8080/](http://localhost:8080) to monitor\n  license scanning errors and manually curate scanned licenses. 
(See the\n  separate\n  [license-scanner-ui](https://github.com/philips-software/license-scanner-ui)\n  user interface project.)\n* A simple database management tool\n  on [localhost:8080/h2](http://localhost:8080/h2)\n  with credentials \"user\" and \"password\".\n\nIf migration of the database fails, a stand-alone database management tool can be\nstarted from the command line on Linux or Mac using:\n\n    java -jar ~/.m2/repository/com/h2database/h2/\u003cversion\u003e/h2-\u003cversion\u003e.jar\n\n(Failed migrations can be manually fixed or removed in the\n\"flyway_schema_history\" table.)\n\n### Docker\n\nAfter building the project, you can also run the application with Docker.\n\nUse docker-compose:\n\n```bash\ndocker-compose up -d\n```\n\nUse the image stored on Docker Hub:\n\n```bash\ndocker run -p 8080:8080 philipssoftware/license-scanner:latest\n```\n\nBuild the Docker image:\n\n```bash\ndocker build -f docker/Dockerfile -t license-scanner .\n```\n\nRun the application:\n\n```bash\ndocker run -p 8080:8080 license-scanner\n```\n\n## How to test the software\n\nThe unit test suite can be executed via Gradle:\n\n```\ngradlew test\n```\n\n## Known issues\n\n(Checked items are under development.)\n\nMust-have\n\n- [ ] Authentication of client edits to prevent unauthorized curations.\n\nShould-have\n\n- [ ] Make the number of processes configurable to improve performance on (virtual)\n  machines with fewer cores.\n- [ ] Download by commit hash instead of git clone, because this is much faster.\n- [ ] Detect and return copyright statements.\n- [ ] Production-grade database (e.g. Postgres).\n\n## Contact / Getting help\n\nUse the issue tracker of this project.\n\n## License\n\nSee [LICENSE.md](LICENSE.md).\n\n## Credits and references\n\nThis service could not have been made without the ScanCode Toolkit project. 
See the\n[documentation of ScanCode Toolkit](https://readthedocs.org/projects/scancode-toolkit)\nfor details on its invocation and how it detects licenses in source code files.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphilips-software%2Flicense-scanner","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fphilips-software%2Flicense-scanner","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphilips-software%2Flicense-scanner/lists"}