{"id":31772038,"url":"https://github.com/aboutcode-org/purl-benchmarks","last_synced_at":"2026-02-16T03:34:02.841Z","repository":{"id":312899891,"uuid":"1049210608","full_name":"aboutcode-org/purl-benchmarks","owner":"aboutcode-org","description":"AboutCode PURL Accuracy Benchmarks","archived":false,"fork":false,"pushed_at":"2025-09-03T02:45:52.000Z","size":218,"stargazers_count":1,"open_issues_count":1,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-10-21T22:33:14.158Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aboutcode-org.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.rst","contributing":null,"funding":null,"license":null,"code_of_conduct":"CODE_OF_CONDUCT.rst","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS.rst","dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":"NOTICE","maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":"aboutcode-org","open_collective":"aboutcode","custom":["https://causes.benevity.org/causes/056-5528680976845_a486","Benevity"]}},"created_at":"2025-09-02T16:37:09.000Z","updated_at":"2025-09-03T16:13:26.000Z","dependencies_parsed_at":"2025-09-02T18:32:57.900Z","dependency_job_id":"c147888d-222c-4a70-8b46-233fc185ff3c","html_url":"https://github.com/aboutcode-org/purl-benchmarks","commit_stats":null,"previous_names":["aboutcode-org/purl-benchmarks"],"tags_count":0,"template":false,"template_full_name":"aboutcode-org/skeleton","purl":"pkg:github/aboutcode-org/purl-benchmarks","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aboutcode-org%2Fpurl-benchmarks","tags_url":"https://r
epos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aboutcode-org%2Fpurl-benchmarks/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aboutcode-org%2Fpurl-benchmarks/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aboutcode-org%2Fpurl-benchmarks/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aboutcode-org","download_url":"https://codeload.github.com/aboutcode-org/purl-benchmarks/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aboutcode-org%2Fpurl-benchmarks/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29499615,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-16T02:07:14.481Z","status":"online","status_checked_at":"2026-02-16T02:03:22.852Z","response_time":115,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-10-10T03:55:11.183Z","updated_at":"2026-02-16T03:34:02.826Z","avatar_url":"https://github.com/aboutcode-org.png","language":null,"readme":"# AboutCode PURL Accuracy Benchmark Guide\n\n## Problem\n\nPublic studies of SBOMs have demonstrated that the biggest hurdle to SBOM\neffectiveness, when shared among producers and consumers, is the quality of the\nsoftware identifiers, especially the need for those 
identifiers to be precise\nand machine-processable. The PURL standard supports precise, accurate,\nhigh-quality software identification.\n\n## Objective\n\nProvide guidance for improving the accuracy of PURLs (Package URLs) constructed\nby SCA tools, using a shared, open, public, and agreed-upon baseline list of the\nPURLs that must be found when scanning a given test input codebase.\n\n## Resources\n\nAboutCode PURL benchmarks are reference lists of expected PURLs for a variety\nof public projects. They can be used to demonstrate and validate the detection\nof correct PURL values. Planned tested input projects include PURL lists derived\nfrom:\n\n* A base Red Hat container image (UBI)  \n* A base Debian or Ubuntu container image  \n* A large Go-based web application (with k8s)  \n* A large Python-based web application  \n* A large Java/JS-based application\n\nThe AboutCode PURL benchmark files are in a simple .txt format. Each file is a\nsorted list of the expected PURL values to be found when analyzing an input\ncodebase.\n\nThe AboutCode PURL benchmark files were originally generated using standard\n[ScanCode.io](http://ScanCode.io) pipelines and other open source tools, and\nwere carefully reviewed for correctness by expert analysts. They form a\ncommunity digital commons asset, and we expect careful scrutiny and community\ncontributions to ensure that they represent the ground truth that should be\nreported by any tool.\n\nSee the table in the section \"PURL Benchmark Tested Inputs and Expected Outputs\"\nfor detailed PURL benchmark data.\n\n## PURL Benchmark Accuracy Procedure\n\n* Download (1) a software project that provided the input basis for a PURL\n  benchmark. We call this the tested input codebase.\n\n* Download (2) the corresponding benchmark file with a list of expected PURLs\n  that have been curated for this tested input.\n* Use your selected SCA tool to analyze the downloaded software codebase.  
\n  * If possible, configure your SCA tool to generate an output file that is a simple text list of PURL values; or,\n  * Generate an SBOM with your SCA tool; and,  \n    * Either extract a simple list of PURL values for a manual comparison with the expected PURLs; or\n    * Use the SBOM in SPDX or CycloneDX JSON format with the \"benchmark\\_purls\" ScanCode.io pipeline.\n* Compare your generated output with the PURL benchmark. This comparison can be done by:  \n  * Manually reviewing the two sorted lists of PURLs,  \n  * Using a diff utility, or  \n  * Using the \"benchmark\\_purls\" ScanCode.io pipeline that takes as input a tested SBOM and an expected PURLs list.\n\n* When reviewing the comparison results, pay special attention to:  \n  * The percentage of exact PURL matches between the tested and expected PURLs.  \n  * Mismatches on each of the PURL components for partially matched PURLs\n    (https://github.com/package-url/purl-spec/blob/main/purl-specification.md).\n  * Missing PURLs: PURLs that are in the benchmark but not in your output.  \n  * Extraneous PURLs: PURLs that are in your output but not in the benchmark.  \n  * Consider these matching patterns for each of your PURL components:  \n    * **scheme**: should always have a constant value of \"pkg\". Required.  \n    * **type**: the package \"type\" or package \"protocol\" such as maven, npm, nuget,\n      gem, pypi, etc. Required.  \n    * **namespace**: some name prefix such as a Maven groupid, a Docker image owner,\n      a GitHub user or organization. Optional and type-specific.  \n    * **name**: the name of the package. Required.  \n    * **version**: the version of the package. Optional, but strongly recommended.  \n    * **qualifiers**: extra qualifying data for a package such as an OS,\n      architecture, a distro, etc. Optional and type-specific.  \n    * **subpath**: extra subpath within a package, relative to the package root. Optional.  
\n* Take action on each mismatch to improve your tool capabilities:  \n  * Visit the purl-spec to get detailed guidance about the problem area:\n    https://github.com/package-url/purl-spec/blob/main/purl-specification.md, or  \n  * If you think the AboutCode PURL benchmark value is wrong, or could be\n    improved, please log an issue describing exactly what you found so we can\n    amend the expected PURLs results for a tested input.\n\n## PURL Benchmark Tested Inputs and Expected Outputs\n\nThis table contains the download URLs for the tested input codebases and the\ncorresponding filenames of the expected PURLs lists.\n\nThe expected PURLs files are stored in the `expected-PURLs` directory.\n\n| Input name | Input version | Download URL | Expected PURLs filename | Notes |\n| :---- | :---- | :---- | :---- | :---- |\n| alpine | 3.22.1 | docker://alpine:3.22.1 | alpine-3.22.1-purls.txt | An Alpine container image |\n| debian | trixie | docker://debian:trixie | debian-trixie-purls.txt | A Debian container image |\n| django | 5.2.5 | https://files.pythonhosted.org/packages/py3/d/django/django-5.2.5-py3-none-any.whl | django-5.2.5-whl-purls.txt | A Python package |\n| fiber | 2.52.9 | https://github.com/gofiber/fiber/archive/refs/tags/v2.52.9.tar.gz | fiber-2.52.9-purls.txt | A Go package |\n| gin | 1.10.1 | https://github.com/gin-gonic/gin/archive/refs/tags/v1.10.1.tar.gz | gin-1.10.1-purls.txt | Another Go package |\n| sentry | 25.7.0 | https://github.com/getsentry/sentry/archive/refs/tags/25.7.0.tar.gz | sentry-25.7.0-purls.txt | A Python and Rust application |\n| ubuntu | 24.04 | docker://ubuntu:noble | ubuntu-24.04-purls.txt | An Ubuntu container image |\n\n\n\n## PURL Benchmark Automation in ScanCode.io\n\nUsing the expected PURLs file list and an SBOM, follow these instructions to run a benchmark:\n\nhttps://scancodeio.readthedocs.io/en/latest/built-in-pipelines.html#benchmark-purls-addon\n\n\nTo run a benchmark you can use:\n\n1. 
The expected PURLs files stored in the `expected-PURLs` directory.\n2. Corresponding pre-computed sample CycloneDX SBOMs from the `sample-SBOMs` directory.\n\n\n","funding_links":["https://github.com/sponsors/aboutcode-org","https://opencollective.com/aboutcode","https://causes.benevity.org/causes/056-5528680976845_a486","Benevity"],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faboutcode-org%2Fpurl-benchmarks","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faboutcode-org%2Fpurl-benchmarks","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faboutcode-org%2Fpurl-benchmarks/lists"}