{"id":35554054,"url":"https://github.com/wfrisch/idlib","last_synced_at":"2026-03-15T13:58:38.998Z","repository":{"id":233179417,"uuid":"785094170","full_name":"wfrisch/idlib","owner":"wfrisch","description":"Identify embedded C and C++ libraries","archived":false,"fork":false,"pushed_at":"2026-02-08T07:13:39.000Z","size":117,"stargazers_count":3,"open_issues_count":3,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-02-08T14:51:22.061Z","etag":null,"topics":["code-duplication","git","indexer","sbom"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/wfrisch.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-04-11T07:23:56.000Z","updated_at":"2025-08-12T11:36:48.000Z","dependencies_parsed_at":"2024-04-15T13:28:44.279Z","dependency_job_id":"60784b78-175b-4a14-89ca-ab3120112243","html_url":"https://github.com/wfrisch/idlib","commit_stats":null,"previous_names":["wfrisch/idlib"],"tags_count":134,"template":false,"template_full_name":null,"purl":"pkg:github/wfrisch/idlib","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wfrisch%2Fidlib","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wfrisch%2Fidlib/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wfrisch%2Fidlib/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wf
risch%2Fidlib/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/wfrisch","download_url":"https://codeload.github.com/wfrisch/idlib/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wfrisch%2Fidlib/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29965419,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-01T06:55:38.174Z","status":"ssl_error","status_checked_at":"2026-03-01T06:53:04.810Z","response_time":124,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["code-duplication","git","indexer","sbom"],"created_at":"2026-01-04T08:16:36.964Z","updated_at":"2026-03-01T09:01:49.488Z","avatar_url":"https://github.com/wfrisch.png","language":"Python","readme":"# idlib\n\nidlib finds embedded copies of C/C++ libraries in open-source packages.\n\n## Usage\n```\n# Download the latest database\nwget -O- https://github.com/wfrisch/idlib/releases/latest/download/idlib.sqlite.zst |zstdcat \u003e idlib.sqlite\n\n# Scan a package\n./identify.py -s cmake-3.29.0/\n\n# Output\ncurl curl-8_5_0-248-g066ed4e51\nxz v5.3.1alpha-65-g7136f173\nlibuv v1.44.0-5-gbae2992c\nzlib v1.2.12-30-ga9e14e8\nzstd 0^20210512.c730b8c5a38b9e93efc0c3639e26f18f14b82f95\n```\n\nWe maintain a [weekly scan of 
openSUSE](https://github.com/wfrisch/idlib_results/blob/main/openSUSE.csv).\n\n## Implementation\nAt its core, idlib relies on a lookup table that associates a file's SHA-256\nhash with metadata from the respective library's git repository:\n\n```\nCREATE TABLE IF NOT EXISTS files (\n    sha256      TEXT,\n    library     TEXT,  -- name of the library\n    commit_hash TEXT,  -- git commit that introduced this version\n    commit_time TEXT,  -- git commit timestamp (ISO 8601 format)\n    commit_desc TEXT,  -- git describe for this commit,\n                       -- ... falls back to: 0^{date}.{commit_hash}\n    path        TEXT,  -- file path at the time of the matched commit\n    size        INTEGER\n);\n```\n\n### Indexer (`index.py`)\nThe indexer generates this database from a list of\n[configured libraries](config.py).\n\n```\n$ ./index.py -h\nusage: index.py [-h] [-d DB] [-l LIBRARY] [-p] [-m {sparse,full}] [-v]\n\noptions:\n  -h, --help            show this help message and exit\n  -d DB                 database path. Default: ./idlib.sqlite\n  -l LIBRARY, --library LIBRARY\n                        index only a specific library\n  -p, --prune-only      don't index, only prune database\n  -m {sparse,full}, --mode {sparse,full}\n                        index mode (default: sparse)\n  -v, --verbose\n```\n\nIt has two modes:\n\n* sparse mode (default)\n* full mode\n\n#### Sparse mode\nIn sparse mode, only a hand-picked set of files is considered for indexing. 
The\nidea is to improve the signal/noise ratio by choosing files that a) are unique\nto the library, b) are unlikely to be omitted in a copy.\n\nFor each configured file, run `git log --follow`\n  - For each commit:\n    - store metadata (commit hash, time, `git describe`)\n    - For each file modified by the commit:\n      - store SHA-256(commit:path)\n\nAdvantages:\n- Compact database\n- Low false positive rate\n\nDisadvantages:\n- Less accurate version identification\n\n#### Full mode\nIn full mode, all files in all commits are indexed.\n\nAdvantages:\n- More accurate version identification\n\nDisadvantages:\n- Large database\n- False positives likely, unless the client filters the results, for example by\n  only considering .c/.cpp matches\n\n#### Pruning\nIn both modes the indexer prunes the database after indexing:\n- Remove empty files\n- Remove embedded copies of other libraries, for example libpng embeds zlib.\n  - Remove all hashes in libpng that also exist in zlib.\n- Remove remaining inter-library duplicates, usually stuff like\n  - license files\n  - standard .gitignore files\n  - build system artifacts\n  - etc\n\nThe result is a database where no hash ever points to more than one library.\nDuplicates within a library are kept, though.\n\nIf the sparse mode is configured properly, prune() shouldn't find anything.\n\n### Client\nThe client (`identify.py`) hashes all C/C++ files in a directory and looks up\nthe respective database entries.\n\n```\nusage: identify.py [-h] [-d DB] [-s] directory\n\nIdentify embedded open-source libraries\n\npositional arguments:\n  directory        directory containing the source code to search\n\noptions:\n  -h, --help       show this help message and exit\n  -d DB            database path. 
Default: ./idlib.sqlite\n  -s, --summarize  don't report individual files, just the detected libs and their most probable version respectively.\n```\n\n```\n# ./identify.py cmake-3.29.2\ncurl        curl-8_5_0-125-ge556470         Utilities/cmcurl/lib/connect.c\ncurl        curl-8_5_0-45-g3829759          Utilities/cmcurl/lib/cookie.c\ncurl        curl-8_5_0-138-gcfe7902         Utilities/cmcurl/lib/easy.c\ncurl        curl-8_5_0-238-ga6c9a33         Utilities/cmcurl/lib/file.c\ncurl        curl-8_5_0-45-g3829759          Utilities/cmcurl/lib/formdata.c\ncurl        curl-8_5_0-40-g907eea0          Utilities/cmcurl/lib/hostip.c\ncurl        curl-8_5_0-248-g066ed4e         Utilities/cmcurl/lib/http.c\ncurl        curl-8_5_0-123-ga0f9480         Utilities/cmcurl/lib/http2.c\ncurl        curl-8_5_0-216-gc2d9736         Utilities/cmcurl/lib/imap.c\ncurl        curl-8_5_0-167-gd7b6ce6         Utilities/cmcurl/lib/ldap.c\ncurl        curl-8_5_0-167-gd7b6ce6         Utilities/cmcurl/lib/multi.c\ncurl        curl-8_5_0-216-gc2d9736         Utilities/cmcurl/lib/pop3.c\ncurl        curl-8_5_0-182-g3378d2b         Utilities/cmcurl/lib/sendf.c\ncurl        curl-8_5_0-216-gc2d9736         Utilities/cmcurl/lib/smtp.c\ncurl        curl-8_5_0-227-g0c05b8f         Utilities/cmcurl/lib/telnet.c\ncurl        curl-8_5_0-167-gd7b6ce6         Utilities/cmcurl/lib/tftp.c\ncurl        curl-8_5_0-196-gcdd905a         Utilities/cmcurl/lib/transfer.c\ncurl        curl-8_5_0-178-gc5801a2         Utilities/cmcurl/lib/url.c\ncurl        curl-8_5_0-167-gd7b6ce6         Utilities/cmcurl/lib/urldata.h\ncurl        curl-8_5_0-161-g5d044ad         Utilities/cmcurl/lib/vquic/curl_ngtcp2.c\ncurl        curl-8_5_0-230-g6d85228         Utilities/cmcurl/lib/vssh/libssh2.c\ncurl        curl-8_5_0-50-gaf520ac          Utilities/cmcurl/lib/vtls/gtls.c\ncurl        curl-8_5_0-106-gaff2608         Utilities/cmcurl/lib/vtls/schannel.c\ncurl        curl-8_5_0-158-gdd0f680         
Utilities/cmcurl/lib/vtls/sectransp.c\ncurl        curl-8_5_0-237-g9a90c9d         Utilities/cmcurl/lib/vtls/vtls.c\nlibarchive  v3.6.1-39-gd6248d2              Utilities/cmlibarchive/libarchive/archive_entry.c\nlibarchive  v3.5.2-26-g4b7558e              Utilities/cmlibarchive/libarchive/archive_read.c\nlibarchive  v3.5.2-26-g2a8bb42              Utilities/cmlibarchive/libarchive/archive_read_disk_entry_from_file.c\nlibarchive  v3.6.2-47-g2aa73f8              Utilities/cmlibarchive/libarchive/archive_read_disk_windows.c\nlibarchive  v3.7.0-14-gcc4147e              Utilities/cmlibarchive/libarchive/archive_read_support_format_lha.c\nlibarchive  v3.6.2-33-ge605604              Utilities/cmlibarchive/libarchive/archive_read_support_format_mtree.c\nlibarchive  v3.6.1-84-ge2f7c1d              Utilities/cmlibarchive/libarchive/archive_read_support_format_tar.c\nlibarchive  v3.6.2-7-g0348e24               Utilities/cmlibarchive/libarchive/archive_read_support_format_warc.c\nlibarchive  v3.6.2-45-g35b79b0              Utilities/cmlibarchive/libarchive/archive_string.c\nlibarchive  v3.6.2-48-g9e1081b              Utilities/cmlibarchive/libarchive/archive_windows.c\nlibarchive  v3.6.2-58-g092631c              Utilities/cmlibarchive/libarchive/archive_write_disk_windows.c\nlibarchive  v3.4.2-11-gfe465c0              Utilities/cmlibarchive/libarchive/archive_write_set_format_mtree.c\nlibarchive  v3.7.1-9-g1b4e0d0               Utilities/cmlibarchive/libarchive/archive_write_set_format_pax.c\nlibexpat    R_2_3_0-55-gdf42f93             Utilities/cmexpat/lib/ascii.h\nlibexpat    R_2_3_0-55-gdf42f93             Utilities/cmexpat/lib/asciitab.h\nlibexpat    R_2_4_5-7-g28f7454              Utilities/cmexpat/lib/expat.h\nlibexpat    R_2_3_0-55-gdf42f93             Utilities/cmexpat/lib/iasciitab.h\nlibexpat    R_2_3_0-88-g5dbc857             Utilities/cmexpat/lib/internal.h\nlibexpat    R_2_3_0-55-gdf42f93             Utilities/cmexpat/lib/latin1tab.h\nlibexpat    
R_2_3_0-55-gdf42f93             Utilities/cmexpat/lib/nametab.h\nlibexpat    R_2_3_0-55-gdf42f93             Utilities/cmexpat/lib/utf8tab.h\nlibexpat    R_2_4_5-7-g28f7454              Utilities/cmexpat/lib/xmlparse.c\nlibexpat    R_2_4_4-2-g317c917              Utilities/cmexpat/lib/xmlrole.c\nlibexpat    R_2_3_0-55-gdf42f93             Utilities/cmexpat/lib/xmlrole.h\nlibexpat    R_2_4_4-24-gfdbd69b             Utilities/cmexpat/lib/xmltok.c\nlibexpat    R_2_3_0-55-gdf42f93             Utilities/cmexpat/lib/xmltok.h\nlibexpat    R_2_4_4-24-gfdbd69b             Utilities/cmexpat/lib/xmltok_impl.c\nlibexpat    R_2_3_0-55-gdf42f93             Utilities/cmexpat/lib/xmltok_impl.h\nlibexpat    R_2_4_2-31-g6496a03             Utilities/cmexpat/lib/xmltok_ns.c\nlibuv       v1.44.0-5-gbae2992              Utilities/cmlibuv/src/uv-common.h\nlibuv       v1.43.0-47-gc40f8cb             Utilities/cmlibuv/src/unix/linux-core.c\nnghttp2     v1.47.0-81-gb0fbb93             Utilities/cmnghttp2/lib/nghttp2_frame.c\nnghttp2     v1.47.0-79-gc10a555             Utilities/cmnghttp2/lib/nghttp2_hd.c\nnghttp2     v1.49.0-5-geb06e33              Utilities/cmnghttp2/lib/nghttp2_session.c\nnghttp2     v1.49.0-5-geb06e33              Utilities/cmnghttp2/lib/nghttp2_session.h\nnghttp2     v1.50.0-25-g3f65ab7             Utilities/cmnghttp2/lib/includes/nghttp2/nghttp2.h\nxz          v5.2.4-15-g0d31840              Utilities/cmliblzma/liblzma/api/lzma/block.h\nxz          v5.3.1alpha-22-g2fb0dda         Utilities/cmliblzma/liblzma/api/lzma/block.h\nxz          v5.2.2-33-ge013a33              Utilities/cmliblzma/liblzma/common/block_decoder.c\nxz          v5.2.1-60-gd4a0462              Utilities/cmliblzma/liblzma/common/block_decoder.c\nxz          v5.2.1-60-gd4a0462              Utilities/cmliblzma/liblzma/common/block_encoder.c\nxz          v5.2.2-33-ge013a33              Utilities/cmliblzma/liblzma/common/block_encoder.c\nxz          v5.2.3-2-gef36c63               
Utilities/cmliblzma/liblzma/common/stream_decoder.c\nxz          v5.2.1-64-g84462af              Utilities/cmliblzma/liblzma/common/stream_decoder.c\nxz          v5.2.2-33-ge013a33              Utilities/cmliblzma/liblzma/common/stream_encoder.c\nxz          v5.2.1-60-gd4a0462              Utilities/cmliblzma/liblzma/common/stream_encoder.c\nxz          v5.2.4-37-g0e3c400              Utilities/cmliblzma/liblzma/lz/lz_decoder.c\nxz          v5.3.1alpha-48-g608517b         Utilities/cmliblzma/liblzma/lz/lz_decoder.c\nxz          v5.2.2-33-ge013a33              Utilities/cmliblzma/liblzma/lz/lz_encoder.c\nxz          v5.2.1-60-gd4a0462              Utilities/cmliblzma/liblzma/lz/lz_encoder.c\nxz          v5.2.4-50-g00517d1              Utilities/cmliblzma/liblzma/lzma/lzma_decoder.c\nxz          v5.3.1alpha-65-g7136f17         Utilities/cmliblzma/liblzma/lzma/lzma_decoder.c\nxz          v5.2.4-50-g00517d1              Utilities/cmliblzma/liblzma/lzma/lzma_encoder.c\nxz          v5.3.1alpha-65-g7136f17         Utilities/cmliblzma/liblzma/lzma/lzma_encoder.c\nxz          v5.1.1alpha-70-g1403707         Utilities/cmliblzma/liblzma/rangecoder/range_decoder.h\nzlib        v1.2.9                          Utilities/cmzlib/adler32.c\nzlib        v1.2.12-21-g84c6716             Utilities/cmzlib/compress.c\nzlib        v1.2.12-31-g888b3da             Utilities/cmzlib/crc32.c\nzlib        v1.2.11-38-gf8719f5             Utilities/cmzlib/crc32.h\nzlib        v1.2.12-2-g3df8424              Utilities/cmzlib/deflate.h\nzlib        v1.2.3.9                        Utilities/cmzlib/gzclose.c\nzlib        v1.2.12                         Utilities/cmzlib/gzguts.h\nzlib        v1.2.12-21-g84c6716             Utilities/cmzlib/gzlib.c\nzlib        v1.2.11-11-g60a5ecc             Utilities/cmzlib/inffast.c\nzlib        v1.2.5                          Utilities/cmzlib/inffast.h\nzlib        v1.2.5.1-28-g518ad01            Utilities/cmzlib/inffixed.h\nzlib        v1.2.12-30-ga9e14e8          
   Utilities/cmzlib/inflate.c\nzlib        v1.2.12                         Utilities/cmzlib/inflate.h\nzlib        v1.2.13                         Utilities/cmzlib/inftrees.c\nzlib        v1.2.12-14-g5752b17             Utilities/cmzlib/inftrees.h\nzlib        v1.2.12-21-g84c6716             Utilities/cmzlib/trees.c\nzlib        v1.2.4.5                        Utilities/cmzlib/trees.h\nzlib        v1.2.12-21-g84c6716             Utilities/cmzlib/uncompr.c\nzlib        v1.2.12-25-gd0704a8             Utilities/cmzlib/zutil.c\nzstd        v1.4.7-356-gc730b8c             Utilities/cmzstd/lib/zstd.h\nzstd        v1.4.7-226-ga494308             Utilities/cmzstd/lib/common/fse.h\nzstd        v1.4.7-197-gde9de86             Utilities/cmzstd/lib/common/fse.h\nzstd        v1.4.7-236-g550f76f             Utilities/cmzstd/lib/common/zstd_internal.h\nzstd        v1.4.7-334-g9e94b7c             Utilities/cmzstd/lib/compress/zstd_compress.c\nzstd        v1.4.7-251-g6cee3c2             Utilities/cmzstd/lib/decompress/zstd_decompress.c\n```\n\nIn summarize mode (`-s`), it groups the matches by library and shows the\ndescription of the latest match respectively (sorted by git commit timestamp).\n\n```\n# ./identify.py -s cmake-3.29.2\ncurl curl-8_5_0-248-g066ed4e\nlibarchive v3.7.1-9-g1b4e0d0\nlibexpat R_2_4_5-7-g28f7454\nlibuv v1.44.0-5-gbae2992\nnghttp2 v1.50.0-25-g3f65ab7\nxz v5.2.4-50-g00517d1\nzlib v1.2.13\nzstd v1.4.7-356-gc730b8c\n```\n\n## Adding new libraries\nRough outline\n```\ncd libraries\ngit submodule add https://github.com/abc/libxyz\ncd ..\n./metric.py -n 40 libraries/libxyz/\n\n# add the library definition including the list of sparse files generated by metric.py\n$EDITOR config.py\n\n# test the sparse configuration\n./index.py -m sparse -d idlib.sqlite -l libxyz\n\n# test the full configuration\n# then pay attention to duplicated files:\n# libxyz might embed other indexed libraries. 
adjust config.py accordingly.\n./index.py -m full -d idlib-full.sqlite -l libxyz\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwfrisch%2Fidlib","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwfrisch%2Fidlib","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwfrisch%2Fidlib/lists"}